Deployment¶
Containerization¶
The project uses Dagger for reproducible container builds and CI/CD pipelines (see dagger.json and .dagger/). A docker-compose.yml is also provided for local multi-service development:
docker compose -f .docker/docker-compose.yml up
This starts the backend, frontend, and Redis together with health checks and automatic service linking.
Mock Server¶
For development without an HCP system, the backend includes a mock server:
graph LR
subgraph "Mock Server"
DISP["mapi_state.py<br/>Request dispatcher"]
FIX["fixtures.py<br/>Seed data"]
STATE["In-memory state<br/>Dict-based storage"]
end
API3["FastAPI"] -->|"same interface as<br/>MapiService"| DISP
FIX -->|"initial data"| STATE
DISP -->|"CRUD operations"| STATE
The mock server implements the same interface as the real MAPI service, allowing the frontend to be developed and tested independently. Start it with make run-api-mock.
Publishing Container Images¶
The project uses a Dagger pipeline (.dagger/publish.go) to build and push images to Docker Hub. Three Make targets are available:
# Publish both backend and frontend
make publish TAG=v0.1.0
# Publish individually
make publish-backend TAG=v0.1.0
make publish-frontend TAG=v0.1.0
Credentials are read from .env:
| Variable | Description |
|---|---|
DOCKER_USERNAME |
Docker Hub username |
DOCKER_PASSWORD |
Docker Hub password or access token |
Published images:
| Image | Default repository |
|---|---|
| Backend | riksarkivet/ra-hcp |
| Frontend | riksarkivet/ra-hcp-frontend |
Helm Chart¶
A Helm chart is provided in charts/helm-ra-hcp-v0.1.0/ for Kubernetes deployment. Install with:
helm install ra-hcp charts/helm-ra-hcp-v0.1.0 \
--set env.HCP_DOMAIN=hcp.example.com \
--set secret.API_SECRET_KEY=your-secret-key
Key configuration values (see charts/helm-ra-hcp-v0.1.0/values.yaml for the full reference):
| Value | Default | Description |
|---|---|---|
image.repository |
riksarkivet/ra-hcp |
Backend image |
image.tag |
"" (uses appVersion) |
Image tag |
backend.workers |
1 |
Gunicorn worker processes per pod |
replicaCount |
2 |
Number of backend pods |
service.type |
NodePort |
Backend service type |
service.port |
8000 |
Backend service port |
service.nodePort |
30081 |
Backend NodePort |
frontend.enabled |
false |
Enable frontend deployment |
frontend.service.nodePort |
30517 |
Frontend NodePort |
redis.enabled |
false |
Enable Redis sidecar |
opentelemetry.enabled |
false |
Enable OTEL export |
Enable the frontend and Redis:
helm install ra-hcp charts/helm-ra-hcp-v0.1.0 \
--set frontend.enabled=true \
--set redis.enabled=true \
--set env.HCP_DOMAIN=hcp.example.com
Production Architecture¶
A production deployment consists of a frontend, a backend, an optional Redis cache, and the HCP system it manages. A load balancer or ingress controller sits in front and routes traffic.
graph TB
USERS["Users"]
subgraph "Kubernetes Cluster"
ING["Ingress / Load Balancer"]
subgraph "Frontend Pods"
FE1["Frontend #1<br/>SvelteKit + Deno"]
FE2["Frontend #2<br/>SvelteKit + Deno"]
end
subgraph "Backend Pods"
BE1["Backend #1<br/>FastAPI + gunicorn"]
BE2["Backend #2<br/>FastAPI + gunicorn"]
BE3["Backend #3<br/>FastAPI + gunicorn"]
end
REDIS["Redis<br/>Shared cache"]
end
HCP["HCP System<br/>MAPI :9090 + S3 :443"]
USERS --> ING
ING -->|"/ (UI traffic)"| FE1
ING -->|"/ (UI traffic)"| FE2
ING -->|"/api/* (API traffic)"| BE1
ING -->|"/api/* (API traffic)"| BE2
ING -->|"/api/* (API traffic)"| BE3
FE1 & FE2 --> BE1 & BE2 & BE3
BE1 & BE2 & BE3 <--> REDIS
BE1 & BE2 & BE3 --> HCP
| Component | Technology | Port |
|---|---|---|
| Frontend | SvelteKit 2 + Svelte 5, Deno | 5173 (dev), 8000 (container) |
| Backend | FastAPI, Python 3.13+, uv | 8000 |
| Storage adapters | HcpStorage (aioboto3) — pluggable via StorageProtocol | — |
| Cache | Redis 7+ (optional) | 6379 |
| HCP MAPI | Hitachi Content Platform | 9090 |
| S3 endpoint | S3-compatible endpoint (HCP, MinIO, Ceph, AWS) | 443 |
Scaling¶
There are two ways to scale the backend. They are independent and can be combined.
Vertical scaling — gunicorn workers (processes per pod)¶
Each backend pod runs gunicorn with uvicorn worker processes. Gunicorn's primary benefit is reliability — automatic worker restarts, memory leak protection, and graceful reloads — not speed. A single async uvicorn worker already handles hundreds of requests/second. The default is 1 worker per pod with 2 replicas — scaling is done via pod replicas so each gets independent liveness/readiness probes:
Pod (1 replica, 1 worker):
┌──────────────────────────────────────────────┐
│ gunicorn (master) │
│ └─ uvicorn worker 1 ─→ handles requests │
└──────────────────────────────────────────────┘
Gunicorn manages the worker processes (restarts crashed workers, graceful reloads, pre-fork model). Uvicorn handles the async requests inside each worker. This is the recommended production setup from both FastAPI and uvicorn.
The Dockerfile runs:
CMD ["gunicorn", "app.main:app", \
"--worker-class", "uvicorn.workers.UvicornWorker", \
"--bind", "0.0.0.0:8000", \
"--workers", "1", \
"--max-requests", "10000", \
"--max-requests-jitter", "1000", \
"--timeout", "120", \
"--keep-alive", "5", \
"--access-logfile", "-"]
| Flag | Value | Why |
|---|---|---|
--workers |
1 | One worker per pod — scale with replicas in Kubernetes (configurable via Helm) |
--max-requests |
10000 | Recycle workers after 10K requests to prevent memory leaks |
--max-requests-jitter |
1000 | Randomize recycling so workers don't restart simultaneously |
--timeout |
120 | Kill workers that hang for 2 minutes (covers slow HCP responses) |
--keep-alive |
5 | Keep idle HTTP connections open for 5 seconds (reduces handshake overhead) |
--access-logfile |
- |
Log access requests to stdout (picked up by Kubernetes logging) |
The number of workers is configurable via the Helm chart:
# values.yaml
backend:
workers: 1 # 1 worker per pod — scale with replicaCount instead
The Helm deployment template passes this value to gunicorn's --workers flag.
Why 1 worker per pod?
In Kubernetes, each pod has independent liveness and readiness probes. If a pod loses HCP connectivity, Kubernetes removes it from the load balancer while healthy pods keep serving. With multiple workers inside one pod, an unhealthy worker is invisible to probes — the pod stays in rotation even if half its capacity is broken. One worker per pod gives Kubernetes full visibility.
If you run the server outside Kubernetes (bare-metal, Docker Compose), increase --workers to 2 × CPU cores + 1 — there are no probes to leverage, so in-process scaling makes sense. Each worker uses ~50-100 MB RAM.
Horizontal scaling — replica count (pods)¶
Adding replicas creates multiple independent pods, each running their own set of workers. Kubernetes load-balances requests across them:
Default (2 replicas × 1 worker): Scaled (4 replicas × 1 worker):
┌──────────────────────────┐ ┌──────────────────────────────┐
│ Pod 1 │ │ Pod 1 │
│ gunicorn → 1 worker │ │ gunicorn → 1 worker │
├──────────────────────────┤ ├──────────────────────────────┤
│ Pod 2 │ │ Pod 2 │
│ gunicorn → 1 worker │ │ gunicorn → 1 worker │
└──────────────────────────┘ ├──────────────────────────────┤
= 2 processes total, each with │ Pod 3 │
independent health probes │ gunicorn → 1 worker │
├──────────────────────────────┤
│ Pod 4 │
│ gunicorn → 1 worker │
└──────────────────────────────┘
= 4 processes total
Both frontend and backend are stateless and can be horizontally scaled:
- Frontend: Each replica runs SvelteKit with SSR. No shared state — any request can go to any replica. Scale when you have many concurrent browser sessions.
- Backend: Each replica runs FastAPI. All replicas connect to the same Redis and HCP system. Scale when API throughput needs increase or HCP response times are high.
- Redis: Runs as a single instance. All backend replicas share it, so a cache fill from one replica is available to all others. For most deployments, a single Redis instance is sufficient.
# Scale backend to 5 replicas
kubectl scale deployment ra-hcp --replicas=5
# Scale frontend to 3 replicas
kubectl scale deployment ra-hcp-frontend --replicas=3
Or via the Helm chart:
# values.yaml
replicaCount: 4 # 4 pods, each with independent probes
backend:
workers: 1 # 1 worker per pod (default)
Autoscaling is also supported — set autoscaling.enabled: true to let Kubernetes scale replicas based on CPU utilization (see values.yaml for thresholds).
Which scaling approach to use?¶
The default (2 replicas, 1 worker each) is enough for most use cases
A single async uvicorn worker handles hundreds of requests/second. The SDK sends ~2 presign requests/second during bulk transfers. More replicas help with multi-user concurrency and fault tolerance, not single-user transfer speed. Transfer speed is limited by network bandwidth, not the API server.
| Scenario | Recommendation |
|---|---|
| 1-2 users doing bulk transfers | Default (2 replicas, 1 worker) — more than enough |
| Multiple concurrent users or API clients | Horizontal (replicaCount: 4) — each pod gets independent probes |
| High availability requirement | Horizontal (replicaCount: 3+ with podDisruptionBudget) — survives node failures |
| Running outside Kubernetes | Increase --workers to 2 × CPU cores + 1 — no probes, so scale in-process |
SDK bulk_workers vs server workers
The SDK's bulk_workers setting controls how many files are transferred in parallel on your machine. Server workers (gunicorn/replicas) control how many API requests the backend handles in parallel. These are completely different — SDK workers talk directly to HCP S3 for file data. The backend is only involved for presigning URLs (~2 requests/second during bulk transfers). See Performance tuning in the SDK docs for details.
Health Probes¶
The backend exposes health endpoints for Kubernetes liveness and readiness probes:
| Endpoint | Purpose | Checks |
|---|---|---|
GET /liveness |
Liveness probe | Always returns 200 — the process is alive |
GET /readiness |
Readiness probe | Checks HCP MAPI reachability and Redis connectivity |
GET /health |
Legacy | Returns cache status |
The Helm chart configures these probes automatically. A backend pod that can't reach HCP is marked unready and removed from the load balancer until connectivity is restored.
Environment Isolation¶
One Stack Per HCP Domain¶
Each environment (development, acceptance, production) gets its own isolated deployment: its own frontend, backend, Redis, and HCP domain configuration. Environments never share components or cross-connect.
graph TB
subgraph "Development (dev.hcp.example.com)"
direction LR
FE_D["Frontend<br/>1 replica"] --> BE_D["Backend<br/>1 replica"]
BE_D --> RD_D["Redis"]
BE_D --> HCP_D["HCP Dev"]
end
subgraph "Acceptance (acc.hcp.example.com)"
direction LR
FE_A["Frontend<br/>1 replica"] --> BE_A["Backend<br/>2 replicas"]
BE_A --> RD_A["Redis"]
BE_A --> HCP_A["HCP Acc"]
end
subgraph "Production (hcp.example.com)"
direction LR
FE_P["Frontend<br/>2 replicas"] --> BE_P["Backend<br/>5 replicas"]
BE_P --> RD_P["Redis"]
BE_P --> HCP_P["HCP Prod"]
end
style Development fill:#e8f4e8,stroke:#4a4
style Acceptance fill:#fff3e0,stroke:#f90
style Production fill:#fce4ec,stroke:#c33
The 1:1 relationship between a deployment and an HCP domain is enforced by design: the HCP_DOMAIN environment variable is set at startup and determines which HCP system the backend communicates with. There is no runtime domain switching.
Why Isolate?¶
| Concern | How isolation helps |
|---|---|
| Data safety | A dev frontend can never reach prod data — the backend only knows its configured domain |
| Independent lifecycle | Upgrade acceptance while production stays on the current version |
| Independent scaling | Production runs 5 backend replicas; development runs 1 |
| Blast radius | A misconfiguration in dev cannot affect prod |
| Compliance | Clear audit trail — each environment has its own logs, traces, and cache |
Deploying Multiple Environments¶
Use separate Helm releases with environment-specific values files:
# Development — minimal resources, mock-friendly
helm install hcp-dev charts/helm-ra-hcp-v0.1.0 \
-n hcp-dev --create-namespace \
-f values-dev.yaml \
--set env.HCP_DOMAIN=dev.hcp.example.com \
--set secret.API_SECRET_KEY=$(python -c "import secrets; print(secrets.token_urlsafe(64))")
# Acceptance — moderate resources, SSL enabled
helm install hcp-acc charts/helm-ra-hcp-v0.1.0 \
-n hcp-acc --create-namespace \
-f values-acc.yaml \
--set env.HCP_DOMAIN=acc.hcp.example.com \
--set env.HCP_VERIFY_SSL=true \
--set secret.API_SECRET_KEY=$(python -c "import secrets; print(secrets.token_urlsafe(64))")
# Production — full resources, all security features
helm install hcp-prod charts/helm-ra-hcp-v0.1.0 \
-n hcp-prod --create-namespace \
-f values-prod.yaml \
--set env.HCP_DOMAIN=hcp.example.com \
--set env.HCP_VERIFY_SSL=true \
--set env.CORS_ORIGINS=https://hcp-ui.example.com \
--set secret.API_SECRET_KEY=$(python -c "import secrets; print(secrets.token_urlsafe(64))")
Use Kubernetes namespaces
Deploy each environment to its own namespace (hcp-dev, hcp-acc, hcp-prod). This provides network isolation, independent RBAC, and clean resource accounting.
Example: Production values file¶
# values-prod.yaml
replicaCount: 5
backend:
workers: 1 # 5 pods × 1 worker = 5 processes, each with independent probes
frontend:
enabled: true
replicaCount: 2
redis:
enabled: true
ingress:
enabled: true
className: nginx
hosts:
- host: hcp-ui.example.com
paths:
- path: /api
pathType: Prefix
backend: api
- path: /
pathType: Prefix
backend: frontend
tls:
- secretName: hcp-tls
hosts:
- hcp-ui.example.com
env:
HCP_DOMAIN: hcp.example.com
HCP_VERIFY_SSL: "true"
CORS_ORIGINS: "https://hcp-ui.example.com"
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "1000m"
Security Hardening Checklist¶
Before going to production, verify these settings:
| Item | What to check | Risk if skipped |
|---|---|---|
API_SECRET_KEY |
Set to a unique, random 64+ character value per environment | JWTs can be forged — full admin access |
HCP_VERIFY_SSL |
Set to true in production |
Man-in-the-middle attacks on HCP communication |
CORS_ORIGINS |
Set to your specific frontend URL(s) | Cross-origin requests from malicious sites |
| Container security | Verify runAsNonRoot: true, readOnlyRootFilesystem: true, drop: ALL |
Container escape or privilege escalation |
| Redis network | Redis should only be accessible from backend pods (ClusterIP service) | Cache data exposure |
| Kubernetes namespace | Each environment in its own namespace with network policies | Cross-environment access |
| Ingress TLS | Terminate TLS at the ingress with a valid certificate | Traffic interception |
| HCP credentials | Use dedicated service accounts, not personal admin accounts | Over-privileged access, no audit trail |
The default API_SECRET_KEY is change-me-in-production
This is intentionally insecure for local development. You must change it in any non-local deployment. Generate a secure key with:
python -c "import secrets; print(secrets.token_urlsafe(64))"