Data Protection
HCP provides multiple mechanisms for protecting data against loss, corruption, and unauthorized modification. Each mechanism addresses a different concern — understanding when to use each one is key to designing a robust data management strategy.
graph TD
subgraph "What could go wrong?"
DISK["Disk failure"]
SITE["Site goes down"]
HUMAN["Accidental deletion"]
LEGAL["Regulatory audit"]
end
subgraph "HCP Protection Mechanisms"
SP["Service Plans<br/>copy count + placement"]
EC["Erasure Coding<br/>fragment distribution"]
REPL["Replication<br/>geo-redundancy"]
RET["Retention<br/>deletion prevention"]
COMP["Compliance Mode<br/>irrevocable protection"]
end
DISK --> SP
SITE --> EC
SITE --> REPL
HUMAN --> RET
LEGAL --> COMP
See HCP Concepts for the introductory overview of these mechanisms.
Erasure Coding
Erasure coding is HCP's method for distributing data across multiple geographically separated HCP systems with lower storage overhead than full replication.
How It Works
The core idea is simple: instead of keeping complete copies of every object on multiple systems (which doubles or triples storage), erasure coding breaks each object into smaller data fragments and adds parity fragments for redundancy. Any sufficiently large subset of the fragments can reconstruct the original object, so the object survives the loss of a system as long as enough fragments remain on the others.
Think of it like a RAID array, but across geographically separated data centers instead of disks in a single server.
graph LR
subgraph "Traditional Replication (3x storage)"
R_OBJ["100 MB Object"]
R_OBJ --> R1["Full Copy<br/>System A<br/>100 MB"]
R_OBJ --> R2["Full Copy<br/>System B<br/>100 MB"]
R_OBJ --> R3["Full Copy<br/>System C<br/>100 MB"]
end
graph LR
subgraph "Erasure Coding (~1.33x storage)"
E_OBJ["100 MB Object"]
E_OBJ --> EC["Split + Encode"]
EC --> E1["Fragment 1<br/>System A<br/>~33 MB"]
EC --> E2["Fragment 2<br/>System B<br/>~33 MB"]
EC --> E3["Fragment 3<br/>System C<br/>~33 MB"]
EC --> EP["Parity<br/>System D<br/>~33 MB"]
end
The trade-off: reads are slower with erasure coding because fragments must be fetched from multiple systems and reassembled. With replication, any single system that holds a copy can serve the read on its own.
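The storage figures in the two diagrams follow from simple arithmetic. The sketch below is illustrative only: the function names are made up for this example, the fragment counts are taken from the diagram, and the layout HCP actually uses depends on the topology.

```python
def erasure_coding_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw storage used per byte of object data for a k+m fragment layout."""
    if data_fragments < 1 or parity_fragments < 0:
        raise ValueError("need at least one data fragment and a non-negative parity count")
    return (data_fragments + parity_fragments) / data_fragments


def replication_overhead(copies: int) -> float:
    """Raw storage used per byte of object data when keeping full copies."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return float(copies)


object_mb = 100
# Layout from the diagram: 3 data fragments + 1 parity fragment.
ec_total = object_mb * erasure_coding_overhead(3, 1)
# Three full copies, as in the replication diagram.
rep_total = object_mb * replication_overhead(3)
print(f"erasure coding: {ec_total:.0f} MB, replication: {rep_total:.0f} MB")
# prints "erasure coding: 133 MB, replication: 300 MB"
```

A 2+1 layout gives 1.5x under the same formula, which is the upper end of the range quoted in the comparison table later in this section.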
Topologies
Erasure coding topologies define which HCP systems participate and how they connect to each other. The topology determines how fragments are distributed and what happens when systems fail.
Fully Connected
Every system has a direct replication link to every other system. This provides the highest redundancy and fastest recovery, but requires more network connections.
graph TD
A["System A"] <--> B["System B"]
A <--> C["System C"]
A <--> D["System D"]
B <--> C
B <--> D
C <--> D
Ring
Systems form a ring where each connects to exactly two neighbors. This requires fewer network links but provides less redundancy: the ring still connects every system after one link breaks, but a second break can split it into isolated groups.
graph LR
A["System A"] <--> B["System B"]
B <--> C["System C"]
C <--> D["System D"]
D <--> A
Both topologies support 3–6 systems. Choose fully connected when maximum availability matters most; choose ring when minimizing network infrastructure is the priority.
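To make the network cost concrete, the following sketch counts the links each topology needs for the supported system counts. The function names are hypothetical; the counts themselves are plain graph arithmetic (a complete graph versus a cycle).

```python
def fully_connected_links(systems: int) -> int:
    """Every pair of systems shares one bidirectional link: n * (n - 1) / 2."""
    return systems * (systems - 1) // 2


def ring_links(systems: int) -> int:
    """Each system links to its two neighbors, giving one link per system."""
    return systems


# The text above states that both topologies support 3 to 6 systems.
for n in range(3, 7):
    print(f"{n} systems: fully connected={fully_connected_links(n)}, ring={ring_links(n)}")
```

With three systems the two topologies need the same three links; the difference appears at four systems (6 versus 4) and grows to 15 versus 6 at six systems.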
Key Settings
| Setting | Range | Description |
|---|---|---|
| fullCopy | boolean | When true, each system keeps a complete copy plus fragments. When false (default), only fragments are distributed: lower storage but slower recovery. |
| erasureCodingDelay | 0–3,650 days | Days after object creation before fragments are distributed. Allows frequently-accessed new objects to stay local before fragmenting. |
| restorePeriod | 0–180 days | Days to keep a restored object local before re-fragmenting. |
| minimumObjectSize | 4,096–1,048,576 bytes | Only objects above this size are erasure-coded. Small objects aren't worth fragmenting due to the overhead. |
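A client-side check of these ranges before submitting a topology can catch mistakes early. The following is a minimal sketch: the setting names and ranges come from the table above, but `validate_ec_settings` and the plain-dictionary input are assumptions made for this example, not part of any HCP API.

```python
from typing import Dict, List, Tuple

# Inclusive ranges taken from the Key Settings table.
RANGES: Dict[str, Tuple[int, int]] = {
    "erasureCodingDelay": (0, 3650),       # days
    "restorePeriod": (0, 180),             # days
    "minimumObjectSize": (4096, 1048576),  # bytes
}


def validate_ec_settings(settings: Dict[str, object]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    if not isinstance(settings.get("fullCopy", False), bool):
        errors.append("fullCopy must be a boolean")
    for name, (low, high) in RANGES.items():
        if name not in settings:
            continue  # settings left out are not checked in this sketch
        value = settings[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"{name} must be an integer")
        elif not low <= value <= high:
            errors.append(f"{name} must be between {low} and {high}, got {value}")
    return errors


print(validate_ec_settings({"fullCopy": False, "erasureCodingDelay": 30}))
print(validate_ec_settings({"restorePeriod": 181, "minimumObjectSize": 1024}))
```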
Topology Status
stateDiagram-v2
[*] --> HEALTHY: All systems up
HEALTHY --> VULNERABLE: System goes down
VULNERABLE --> HEALTHY: System recovered
VULNERABLE --> BROKEN: Too many systems down
BROKEN --> VULNERABLE: System recovered
HEALTHY --> RETIRING: Admin initiates decommission
RETIRING --> RETIRED: Decommission complete
| Status | Meaning |
|---|---|
| HEALTHY | All systems operational, fragments distributed correctly. |
| VULNERABLE | At least one system down; data still readable but not fully protected. |
| BROKEN | Too many systems down; some objects may be unreadable. |
| RETIRING | Topology is being decommissioned. |
| RETIRED | Topology has been fully decommissioned. |
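The state diagram can be read as a small transition table. The sketch below encodes only the transitions drawn in the diagram; the status names come from the table, while the event names (`system_down`, `too_many_down`, and so on) are labels invented for this example.

```python
# (current status, event) -> next status, mirroring the arrows in the diagram.
TRANSITIONS = {
    ("HEALTHY", "system_down"): "VULNERABLE",
    ("VULNERABLE", "system_recovered"): "HEALTHY",
    ("VULNERABLE", "too_many_down"): "BROKEN",
    ("BROKEN", "system_recovered"): "VULNERABLE",
    ("HEALTHY", "decommission_started"): "RETIRING",
    ("RETIRING", "decommission_complete"): "RETIRED",
}


def next_status(status: str, event: str) -> str:
    """Return the status reached from `status` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(status, event)]
    except KeyError:
        raise ValueError(f"no transition from {status} on {event}") from None


status = "HEALTHY"
for event in ("system_down", "too_many_down", "system_recovered", "system_recovered"):
    status = next_status(status, event)
    print(event, "->", status)
```

The loop walks HEALTHY through VULNERABLE and BROKEN and back to HEALTHY, which shows that recovery from BROKEN passes through VULNERABLE rather than jumping straight to HEALTHY.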
Namespace-Level Control
Erasure coding is enabled per namespace with allowErasureCoding. Two prerequisites must be met:
- The namespace must be cloud-optimized (optimizedFor: CLOUD)
- The tenant must have erasureCodingSelectionEnabled set to true
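The two prerequisites can be checked before attempting to set allowErasureCoding. This is a sketch under stated assumptions: the property names come from the list above, but the dictionaries stand in for whatever representation of namespace and tenant settings a client has, and `missing_prerequisites` is a hypothetical helper.

```python
from typing import Dict, List


def missing_prerequisites(namespace: Dict[str, object], tenant: Dict[str, object]) -> List[str]:
    """List the unmet prerequisites for enabling erasure coding on a namespace."""
    missing: List[str] = []
    if namespace.get("optimizedFor") != "CLOUD":
        missing.append("namespace is not cloud-optimized (optimizedFor: CLOUD)")
    if tenant.get("erasureCodingSelectionEnabled") is not True:
        missing.append("tenant does not have erasureCodingSelectionEnabled set to true")
    return missing


print(missing_prerequisites({"optimizedFor": "CLOUD"}, {"erasureCodingSelectionEnabled": True}))
print(missing_prerequisites({"optimizedFor": "ALL"}, {}))
```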
Erasure Coding vs Replication
| Aspect | Erasure Coding | Replication |
|---|---|---|
| Storage overhead | ~1.3–1.5x | 2–3x |
| Read performance | Slower (assemble fragments) | Fast (read any copy) |
| Write performance | Slower (compute + distribute) | Fast (write + replicate) |
| Recovery speed | Slower (reconstruct) | Fast (copy exists) |
| Best for | Large, infrequently accessed data | Hot data needing fast reads |
Service Plans
Service plans define the data protection and placement strategy for objects in a namespace. They replace the deprecated DPL (Data Protection Level) system.
graph TD
ADMIN["System Administrator"]
ADMIN -->|"defines"| SP1["Service Plan: Default"]
ADMIN -->|"defines"| SP2["Service Plan: Platinum"]
ADMIN -->|"defines"| SP3["Service Plan: Short-Term"]
ADMIN -->|"grants access"| T["Tenant"]
T -->|"assigns to"| NS1["Namespace A<br/>→ Default"]
T -->|"assigns to"| NS2["Namespace B<br/>→ Platinum"]
A service plan specifies:
- Copy count — how many copies of each object to maintain
- Placement — where copies are stored (which nodes, which storage tiers)
- Protection method — whether to use erasure coding or replication
- Performance — I/O priority and tiering behavior
Service plans are defined at the system level by HCP administrators. Tenants can assign available plans to their namespaces if servicePlanSelectionEnabled is true on the tenant. Each tenant sees only the plans the system administrator has made accessible to it.
The legacy dpl property on namespaces now always returns "Dynamic" and isDplDynamic always returns true. All data protection is managed through service plans.
Compliance Modes
HCP namespaces operate in one of two compliance modes. The choice fundamentally affects what operations are allowed on retained objects — and the switch from enterprise to compliance mode is permanent.
Enterprise Mode vs Compliance Mode
graph LR
ENT["Enterprise Mode<br/>(default)"]
COMP["Compliance Mode<br/>(irrevocable)"]
ENT -->|"one-way switch"| COMP
COMP -.-x|"cannot go back"| ENT
| Operation | Enterprise Mode | Compliance Mode |
|---|---|---|
| Privileged delete (remove objects under retention) | Allowed (requires PRIVILEGED permission) | Prohibited |
| Shorten retention period | Allowed | Prohibited |
| Delete retention classes in use | Allowed | Prohibited |
| Extend retention period | Allowed | Allowed |
| Change from this mode to the other | Can switch to Compliance | Cannot switch back |
Enterprise mode is the default for new namespaces. Once a namespace is switched to compliance mode, it cannot be reverted. This ensures that regulatory commitments made with compliance mode are irrevocable — even system administrators cannot circumvent the retention rules.
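The rules in the table can be expressed as a small permission check. The sketch below is an illustration only: the operation names are labels chosen for this example, and `is_allowed` and `switch_mode` are hypothetical helpers, not HCP calls.

```python
from enum import Enum


class Mode(Enum):
    ENTERPRISE = "enterprise"
    COMPLIANCE = "compliance"


# Operations the table lists as allowed only in enterprise mode.
ENTERPRISE_ONLY = {"privileged_delete", "shorten_retention", "delete_retention_class_in_use"}
# Operations the table lists as allowed in both modes.
ALLOWED_IN_BOTH = {"extend_retention"}


def is_allowed(operation: str, mode: Mode, has_privileged_permission: bool = False) -> bool:
    """Apply the table: compliance mode prohibits every enterprise-only operation."""
    if operation in ALLOWED_IN_BOTH:
        return True
    if operation not in ENTERPRISE_ONLY:
        raise ValueError(f"unknown operation: {operation}")
    if mode is Mode.COMPLIANCE:
        return False
    if operation == "privileged_delete":
        return has_privileged_permission  # requires the PRIVILEGED permission
    return True


def switch_mode(current: Mode, target: Mode) -> Mode:
    """Model the one-way switch: compliance can never go back to enterprise."""
    if current is Mode.COMPLIANCE and target is Mode.ENTERPRISE:
        raise ValueError("compliance mode cannot be reverted to enterprise mode")
    return target


print(is_allowed("shorten_retention", Mode.ENTERPRISE))  # True
print(is_allowed("shorten_retention", Mode.COMPLIANCE))  # False
```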
Retention Type
Each namespace has a retentionType that determines which retention mechanism is used:
- HCP: traditional HCP retention with offset values, retention classes, and privileged delete
- S3: S3 Object Lock (Governance and Compliance modes). Choosing this automatically enables versioning, delete markers, and cloud-optimized protocols.
Retention Deep Dive
Retention is HCP's mechanism for preventing premature deletion of objects. Understanding the different ways to specify retention is essential for compliance workflows.
Object Lifecycle
graph LR
INGEST["Object Ingested"]
RETAINED["Under Retention<br/>(cannot delete)"]
EXPIRED["Retention Expired"]
DISP["Disposition<br/>(auto-delete)"]
SHRED["Shredding<br/>(secure overwrite)"]
DELETED["Deleted"]
INGEST -->|"retention set"| RETAINED
RETAINED -->|"time passes"| EXPIRED
EXPIRED -->|"disposition enabled"| DISP
EXPIRED -->|"shredding enabled"| SHRED
EXPIRED -->|"manual delete"| DELETED
DISP --> DELETED
SHRED --> DELETED
Retention Value Formats
HCP supports four formats for specifying retention. Each serves a different use case:
1. Special values — for simple permanent or temporary states:
| Value | Name | Meaning |
|---|---|---|
| 0 | Deletion Allowed | Object can be deleted anytime. |
| -1 | Deletion Prohibited | Object can never be deleted (except privileged delete in enterprise mode). |
| -2 | Initial Unspecified | Prevents deletion but allows retention to be set later. Think of it as a "pending review" state. |
2. Offset values — relative to the object's ingest time, for policies like "keep for 7 years":
A+7y → 7 years after ingest
A+100y → 100 years after ingest
A+2y+1d → 2 years and 1 day after ingest
A+20d-5h → 20 days minus 5 hours after ingest
The format regex is A([+-]\d+y)?([+-]\d+M)?([+-]\d+w)?([+-]\d+d)?([+-]\d+h)?([+-]\d+m)?([+-]\d+s)?$. Units must appear in largest-to-smallest order. Case matters: uppercase M for months, lowercase m for minutes.
3. Retention class reference — C+class-name (e.g., C+HlthReg-107). The actual retention period is defined by the class and can be updated centrally, affecting all objects that reference it.
4. Fixed date — either epoch seconds or ISO 8601 format yyyy-MM-ddThh:mm:ssZ. Note that HCP auto-adjusts invalid dates rather than rejecting them (e.g., November 33 becomes December 3).
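The offset format and the date auto-adjustment can both be tried out locally. The sketch below uses the offset regex exactly as quoted above; `parse_offset` and `adjust_date` are hypothetical helpers, and the roll-forward arithmetic in `adjust_date` is only one plausible reading of how an out-of-range day could be adjusted, not a description of HCP's internal logic.

```python
import re
from datetime import date, timedelta
from typing import List, Tuple

# Offset pattern as given in the text; re.match anchors it at the start.
OFFSET_RE = re.compile(
    r"A([+-]\d+y)?([+-]\d+M)?([+-]\d+w)?([+-]\d+d)?([+-]\d+h)?([+-]\d+m)?([+-]\d+s)?$"
)


def parse_offset(value: str) -> List[Tuple[int, str]]:
    """Split an offset such as 'A+20d-5h' into signed (amount, unit) pairs."""
    match = OFFSET_RE.match(value)
    if match is None:
        raise ValueError(f"not a valid retention offset: {value!r}")
    # Each matched group looks like '+20d': a signed integer followed by a unit letter.
    return [(int(group[:-1]), group[-1]) for group in match.groups() if group]


def adjust_date(year: int, month: int, day: int) -> date:
    """Roll an out-of-range day forward into the following month(s)."""
    return date(year, month, 1) + timedelta(days=day - 1)


print(parse_offset("A+2y+1d"))    # [(2, 'y'), (1, 'd')]
print(parse_offset("A+20d-5h"))   # [(20, 'd'), (-5, 'h')]
print(adjust_date(2024, 11, 33))  # 2024-12-03
```

Because the regex fixes the unit order, a value such as `A+5h+20d` is rejected, and `A+5M` (months) parses differently from `A+5m` (minutes).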
Retention Classes
Retention classes are named policies defined at the namespace level. They centralize retention management — instead of setting individual dates on each object, you assign a class name and update the class when policies change.
<retentionClass>
<name>FN-Std-42</name>
<description>Finance department standard #42 - keep for 10 years</description>
<value>A+10y</value>
<allowDisposition>true</allowDisposition>
</retentionClass>
When allowDisposition is true, objects assigned to this class are automatically deleted when their retention expires (if disposition is enabled on the namespace).
Shredding
When shreddingDefault is enabled on a namespace's compliance settings, objects are shredded (securely erased) after their retention expires rather than simply deleted. The storage areas where the object data resided are overwritten with random data, so the data cannot be recovered even with physical access to the storage media.
Shredding is intended for highly sensitive data where normal deletion (which may leave data recoverable through forensic techniques) is insufficient.
Disposition
Disposition is the automatic deletion of objects after their retention expires. It operates as a background service and requires three conditions:
graph TD
D1["dispositionEnabled = true<br/>on namespace"]
D2["Retention class allows disposition<br/>(allowDisposition = true)"]
D3["Retention period has expired"]
D1 --> CHECK{"All conditions met?"}
D2 --> CHECK
D3 --> CHECK
CHECK -->|"Yes"| AUTO["Auto-delete by<br/>DispositionPolicy service"]
CHECK -->|"No"| KEEP["Object preserved"]
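The three conditions combine as a simple conjunction. The sketch below mirrors the diagram; the function and parameter names are invented for this example and do not describe how the DispositionPolicy service is implemented.

```python
from datetime import datetime, timezone


def eligible_for_disposition(
    disposition_enabled: bool,
    class_allows_disposition: bool,
    retention_expires_at: datetime,
    now: datetime,
) -> bool:
    """True only when all three conditions from the diagram hold."""
    retention_expired = now >= retention_expires_at
    return disposition_enabled and class_allows_disposition and retention_expired


expiry = datetime(2030, 1, 1, tzinfo=timezone.utc)
before = datetime(2029, 12, 31, tzinfo=timezone.utc)
after = datetime(2030, 1, 2, tzinfo=timezone.utc)

print(eligible_for_disposition(True, True, expiry, before))  # False
print(eligible_for_disposition(True, True, expiry, after))   # True
print(eligible_for_disposition(False, True, expiry, after))  # False
```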
Replication Deep Dive
Replication copies data between geographically separated HCP systems for disaster recovery and high availability. Unlike erasure coding (which distributes fragments), replication keeps complete copies of objects on each system.
Link Types
graph LR
subgraph "Active/Active (bidirectional)"
AA1["System A<br/>reads + writes"] <-->|"replicate both ways"| AA2["System B<br/>reads + writes"]
end
graph LR
subgraph "Active/Passive (unidirectional)"
AP1["Primary<br/>reads + writes"] -->|"replicate one way"| AP2["Secondary<br/>read-only"]
end
| Type | Direction | Description |
|---|---|---|
| ACTIVE_ACTIVE | Bidirectional | Both systems accept writes. Changes replicate in both directions. Both sides can read and write simultaneously. |
| ACTIVE_PASSIVE (outbound) | Unidirectional | Local system sends data to remote. Remote is read-only. Used for disaster recovery. |
| ACTIVE_PASSIVE (inbound) | Unidirectional | Remote system sends data to local. Local receives only. |
Active/active links support separate local and remote schedules. Active/passive links have a single schedule. Active/passive links can be chained (the passive end of one link can feed into another), but active/active links cannot be chained.
Replication Schedules
Replication bandwidth is managed through performance levels on time-based schedules. This lets administrators balance replication throughput against production workloads:
| Level | Description |
|---|---|
| HIGH | Maximum bandwidth for replication. |
| MEDIUM | Balanced bandwidth. |
| LOW | Minimal bandwidth (background replication). |
| CUSTOM | Custom bandwidth limit. |
| OFF | No replication during this period (cannot set the entire week to OFF). |
Schedules use transitions: at a specific day and hour (e.g., Sun:00), the performance level changes. A typical schedule might use LOW during business hours (when production traffic is heavy) and HIGH overnight and on weekends (when the network is quiet).
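A transition-based schedule can be evaluated by finding the most recent transition at or before a given hour of the week. The sketch below assumes the `Day:HH` notation shown above (for example `Sun:00`) and a week that starts on Sunday; `level_at` and the list-of-pairs schedule layout are illustrative choices, not the HCP schedule format.

```python
from typing import List, Tuple

DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def hour_of_week(transition: str) -> int:
    """Convert 'Mon:08' into an hour index from 0 (Sun 00:00) to 167 (Sat 23:00)."""
    day, hour = transition.split(":")
    hour_number = int(hour)
    if not 0 <= hour_number <= 23:
        raise ValueError(f"hour out of range in {transition!r}")
    return DAYS.index(day) * 24 + hour_number


def level_at(transitions: List[Tuple[str, str]], moment: str) -> str:
    """Return the performance level in effect at `moment`."""
    if not transitions:
        raise ValueError("schedule needs at least one transition")
    points = sorted((hour_of_week(when), level) for when, level in transitions)
    now = hour_of_week(moment)
    # Before the first transition of the week, the last one from the
    # previous week is still in effect (the schedule wraps around).
    current = points[-1][1]
    for start, level in points:
        if start > now:
            break
        current = level
    return current


# LOW during weekday business hours, HIGH overnight and on weekends.
schedule = []
for day in ["Mon", "Tue", "Wed", "Thu", "Fri"]:
    schedule.append((f"{day}:08", "LOW"))
    schedule.append((f"{day}:18", "HIGH"))

print(level_at(schedule, "Wed:10"))  # LOW
print(level_at(schedule, "Wed:20"))  # HIGH
print(level_at(schedule, "Sun:03"))  # HIGH
```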
Collision Handling
When two sites in an active/active link modify the same object at the same time, a collision occurs. This is inherent to active/active architectures — two users at different sites can edit the same object before replication has time to sync.
sequenceDiagram
participant A as System A
participant B as System B
Note over A,B: Same object exists on both systems
A->>A: User updates object (v2-A)
B->>B: User updates object (v2-B)
A->>B: Replicate v2-A
B->>A: Replicate v2-B
Note over A,B: Collision detected!
A->>A: Deterministic winner selected
B->>B: Same winner selected
Note over A,B: Loser moved to .lost+found<br/>or renamed with .collision suffix
HCP uses a deterministic algorithm to pick the winner — the same version wins on both sides, ensuring consistency across systems. The losing version is preserved so administrators can manually reconcile if needed.
| Setting | Options | Description |
|---|---|---|
| action | MOVE or RENAME | MOVE puts the losing version in .lost+found. RENAME appends .collision to the losing version's name. |
| deleteEnabled | boolean | Whether to auto-delete collision artifacts after a set period. |
| deleteDays | 0–36,500 | Days to keep collision artifacts before auto-deleting them. |
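The key property of a deterministic winner rule is that both systems reach the same answer no matter which order they see the two versions in. The sketch below demonstrates that property with a stand-in rule (latest modification time, ties broken by system name). This rule is an assumption chosen for illustration, since the text does not state HCP's actual algorithm, and the exact `.lost+found/` path layout is likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    system: str      # system where the change was made
    modified: float  # modification time, in epoch seconds
    name: str        # object name


def pick_winner(a: Version, b: Version) -> Version:
    """Stand-in rule: newest change wins, system name breaks ties."""
    return max(a, b, key=lambda v: (v.modified, v.system))


def loser_name(loser: Version, action: str) -> str:
    """Where the losing version ends up for each collision action."""
    if action == "RENAME":
        return loser.name + ".collision"
    if action == "MOVE":
        return ".lost+found/" + loser.name  # assumed layout
    raise ValueError(f"unknown action: {action}")


v2_a = Version(system="A", modified=1700000000.0, name="report.pdf")
v2_b = Version(system="B", modified=1700000005.0, name="report.pdf")

# Both systems evaluate the same pair, possibly in a different order.
winner_on_a = pick_winner(v2_a, v2_b)
winner_on_b = pick_winner(v2_b, v2_a)
print(winner_on_a == winner_on_b)      # True
print(loser_name(v2_a, "RENAME"))      # report.pdf.collision
```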
Failover and Recovery
When a system becomes unreachable, HCP can automatically redirect traffic to the surviving system.
sequenceDiagram
participant C as Clients
participant A as System A (Primary)
participant B as System B (Replica)
C->>A: Normal operations
Note over A: System A goes down
Note over A,B: autoFailoverMinutes passes (default: 120)
B->>B: DNS updated to redirect traffic
C->>B: Traffic redirected to System B
Note over A: System A recovers
A->>B: Failback: sync changes made during outage
B->>A: Replicate back
C->>A: Traffic restored to System A
| Setting | Description |
|---|---|
| autoFailover | Automatically redirect DNS when the remote system becomes unreachable. |
| autoFailoverMinutes | How long to wait before triggering failover (7–9,999 minutes, default: 120). |
| autoCompleteRecovery | For active/passive links, automatically complete recovery when the primary comes back. |
| autoCompleteRecoveryMinutes | Time threshold for auto-recovery. |
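The failover trigger amounts to a threshold check on how long the remote system has been unreachable. The sketch below uses the range and default from the table; `should_fail_over` is a hypothetical helper, and treating the threshold as inclusive is an assumption.

```python
def should_fail_over(
    auto_failover: bool,
    unreachable_minutes: float,
    auto_failover_minutes: int = 120,
) -> bool:
    """Decide whether automatic failover would trigger under these settings."""
    # Range from the settings table: 7 to 9,999 minutes.
    if not 7 <= auto_failover_minutes <= 9999:
        raise ValueError("autoFailoverMinutes must be between 7 and 9999")
    return auto_failover and unreachable_minutes >= auto_failover_minutes


print(should_fail_over(True, 45))        # False: still inside the 120-minute wait
print(should_fail_over(True, 120))       # True
print(should_fail_over(False, 600))      # False: automatic failover is disabled
print(should_fail_over(True, 10, 7))     # True: shortest allowed wait
```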
Read from Replica
When readFromReplica is enabled on a namespace, read requests can be served from any replica — not just the system the client connected to. This is transparent to the client and improves read performance by serving data from the geographically nearest copy.
graph TD
CLIENT["Client in Stockholm"]
SYS_A["System A<br/>Stockholm"]
SYS_B["System B<br/>Frankfurt"]
CLIENT -->|"request object"| SYS_A
SYS_A -->|"object found locally"| CLIENT
CLIENT2["Client in Munich"]
CLIENT2 -->|"request same object"| SYS_B
SYS_B -->|"served from local replica"| CLIENT2