The VM is the pod.
The cloud is the control plane.
Skiff gives teams Kubernetes-class operational leverage without Kubernetes-class cost or complexity. With Skiff you define a service and compile it into cloud-native primitives, run managed operations such as canary rollouts and database restores, debug with built-in observability, and give agents a harness for safe investigations and repairs. There are no managed control planes, overlay networks, hidden controller state machines, Kubernetes/cloud impedance mismatches, or thousands of lines of YAML.
Terraform describes shape. Kubernetes reconciles clusters. Production needs journeys.
Real operations are not just resources. They are restores, key rotations, canaries, failovers, migrations, approvals, compensations, and evidence. Skiff gives those journeys a first-class operational substrate.
Terraform is stable state
Excellent for declaring what should exist. Weak for live, multi-step operations with rollback gates and partial failure.
- ✓ Great for infrastructure shape
- × Not a runtime operations engine
- × Deploys become plan/apply choreography
Kubernetes is a parallel cloud
Powerful, but it recreates scheduling, networking, identity, secrets, and health as cluster primitives.
- ✓ Great ecosystem and abstractions
- × Cluster operations become the job
- × Operators hide procedural complexity
Skiff is explicit operations
Compile simple specs to cloud primitives, store signed desired state in object storage, and run operational sagas on demand.
- ✓ Object-storage-backed operating ledger
- ✓ Control documents coordinated with compare-and-swap (CAS)
- ✓ Runbooks as typed, resumable graphs
One bucket, cloud IAM, and a tiny runner.
Skiff deploys through the same path it recovers from: write durable object state first, then move visible cloud primitives, while skiffd remains a rebuildable facade over the ledger.
Durable operating ledger
Release manifests, operation intents, plans, SBOMs, provenance, audit entries, and events are immutable objects in the state bucket.
CAS coordination
Service state, operation state, leases, and member state live in narrow control documents updated with compare-and-swap semantics.
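To make the compare-and-swap idea concrete, here is a minimal sketch of a versioned control document with a read-modify-write retry loop. Everything in it is an assumption for illustration: the `Store`, `Doc`, `CompareAndSwap`, and `Update` names are hypothetical, and the in-memory map stands in for an object store that supports conditional writes. It is not Skiff's storage layer.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrVersionMismatch is returned when the caller's expected version is stale.
var ErrVersionMismatch = errors.New("version mismatch")

// Doc is a hypothetical control document: a body plus a version that
// increases on every successful write.
type Doc struct {
	Version int
	Body    string
}

// Store is an in-memory stand-in for an object store with conditional
// writes. It is an illustration only.
type Store struct {
	mu   sync.Mutex
	docs map[string]Doc
}

func NewStore() *Store { return &Store{docs: make(map[string]Doc)} }

// Get returns the current document and whether it exists.
func (s *Store) Get(key string) (Doc, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	d, ok := s.docs[key]
	return d, ok
}

// CompareAndSwap writes body only if the stored version equals expected.
// A missing document is treated as version 0.
func (s *Store) CompareAndSwap(key string, expected int, body string) (Doc, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur := s.docs[key] // zero value (Version 0) when absent
	if cur.Version != expected {
		return cur, ErrVersionMismatch
	}
	next := Doc{Version: cur.Version + 1, Body: body}
	s.docs[key] = next
	return next, nil
}

// Update re-reads and retries until the swap succeeds or attempts run out.
func Update(s *Store, key string, attempts int, mutate func(string) string) (Doc, error) {
	for i := 0; i < attempts; i++ {
		cur, _ := s.Get(key)
		next, err := s.CompareAndSwap(key, cur.Version, mutate(cur.Body))
		if err == nil {
			return next, nil
		}
		if !errors.Is(err, ErrVersionMismatch) {
			return Doc{}, err
		}
	}
	return Doc{}, fmt.Errorf("update %q: %w after %d attempts", key, ErrVersionMismatch, attempts)
}

func main() {
	s := NewStore()
	// Two writers start from the same version; only the first swap wins.
	_, errA := s.CompareAndSwap("services/payments-api", 0, "release=A")
	_, errB := s.CompareAndSwap("services/payments-api", 0, "release=B")
	fmt.Println(errA, errB)
	// The loser re-reads the document and retries against the new version.
	doc, err := Update(s, "services/payments-api", 3, func(string) string { return "release=B" })
	fmt.Println(doc.Version, doc.Body, err)
}
```

The point of the narrow document is that a stale writer fails loudly instead of silently overwriting a newer state, so two operators or agents cannot both believe they own the same rollout.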
Visible cloud primitives
ASGs, target groups, IAM roles, launch templates, logs, and provider IDs stay visible instead of being hidden behind cluster abstractions.
Deploy payments-api
A normal service deploy becomes a typed, auditable operation with traceable release evidence.
Runbooks as typed, resumable operation graphs.
Database restores, key rotations, canary deploys, regional failovers, migrations, and repairs are not hidden controllers. They are explicit sagas: planned, approved, executed, paused, resumed, compensated, and audited.
Canary deploy
Start at 5%, bake, evaluate health and metrics, advance, pause, or compensate by rolling back.
Database restore
Restore to a new database, smoke test it, pass an approval gate, cut over secrets, roll out services, and retain the old database.
Key rotation
Create new versions, canary consumers, promote aliases, roll services, delay destructive cleanup.
Regional failover
Verify replica lag, freeze writes, promote, shift traffic, verify, and clearly mark irreversible steps.
Kubernetes cutover
Deploy a shadow Skiff service, shift traffic by weight, compare metrics, and retire the old service safely.
Incident repair
Collect evidence, recommend safe actions, run reversible remediation, append every event.
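The saga pattern behind these runbooks can be sketched in a few lines: run steps in order, append an event for each, and on failure compensate the completed steps in reverse. This is a simplified illustration under stated assumptions. `SagaStep`, `Event`, `RunSaga`, and the status strings are hypothetical names, not Skiff's API, and a real runner would persist the event log rather than keep it in memory.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnhealthy is a sample failure used by the demo in main.
var ErrUnhealthy = errors.New("health check failed")

// SagaStep is a hypothetical, simplified step: a forward action plus an
// optional compensation. A nil Compensate marks a step with no undo.
type SagaStep struct {
	Name       string
	Run        func() error
	Compensate func() error
}

// Event is one append-only log entry produced while the saga runs.
type Event struct {
	Step   string
	Status string
}

// RunSaga executes steps in order starting at index from, so a paused saga
// can resume where it stopped. On failure it compensates every earlier
// step in reverse order and returns the original error with the event log.
func RunSaga(steps []SagaStep, from int, log []Event) ([]Event, error) {
	for i := from; i < len(steps); i++ {
		if err := steps[i].Run(); err != nil {
			log = append(log, Event{steps[i].Name, "failed"})
			for j := i - 1; j >= 0; j-- {
				log = append(log, undo(steps[j]))
			}
			return log, fmt.Errorf("step %q: %w", steps[i].Name, err)
		}
		log = append(log, Event{steps[i].Name, "done"})
	}
	return log, nil
}

// undo runs a step's compensation and reports what happened as an event.
func undo(s SagaStep) Event {
	if s.Compensate == nil {
		return Event{s.Name, "not_compensable"}
	}
	if err := s.Compensate(); err != nil {
		return Event{s.Name, "compensation_failed"}
	}
	return Event{s.Name, "compensated"}
}

func main() {
	weight := 0 // stand-in for the traffic share of the new release
	steps := []SagaStep{
		{Name: "shift-5", Run: func() error { weight = 5; return nil }, Compensate: func() error { weight = 0; return nil }},
		{Name: "bake", Run: func() error { return ErrUnhealthy }},
		{Name: "shift-100", Run: func() error { weight = 100; return nil }},
	}
	log, err := RunSaga(steps, 0, nil)
	fmt.Println(err, weight)
	for _, e := range log {
		fmt.Printf("%s: %s\n", e.Step, e.Status)
	}
}
```

In the demo, the bake step fails, so the 5% shift is rolled back and the final step never runs. Steps with no compensation are recorded as such, which is one way to make irreversible actions visible in the log.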
Humans get clarity. Agents get structure.
Every Skiff command has deterministic JSON output, explicit risk, recommended next actions, idempotency keys, failure taxonomy, and safety classification. An agent armed with the CLI can diagnose, deploy, repair, roll back, and resume without scraping logs or guessing state.
{
  "ok": false,
  "code": "CANARY_FAILED",
  "summary": "new release failed readiness",
  "facts": [
    "rollout paused at 10%",
    "new targets return 500 on /healthz",
    "previous stable release is healthy"
  ],
  "recommended_actions": [
    {
      "id": "rollback",
      "command": "skiff saga start rollback --service payments-api --to previous-stable --yes --format json",
      "mutating": true,
      "safety": "reversible",
      "confidence": 0.91
    }
  ]
}
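As a sketch of how an agent might consume the JSON result above, the following Go program decodes it and filters the recommended actions. The struct fields mirror the keys shown in the sample. The `AutoRunnable` policy (run mutating actions unattended only when they are marked reversible and meet a confidence threshold) is an assumption made for illustration, not documented Skiff behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sample is the result document from the text above.
const sample = `{
  "ok": false,
  "code": "CANARY_FAILED",
  "summary": "new release failed readiness",
  "facts": [
    "rollout paused at 10%",
    "new targets return 500 on /healthz",
    "previous stable release is healthy"
  ],
  "recommended_actions": [
    {
      "id": "rollback",
      "command": "skiff saga start rollback --service payments-api --to previous-stable --yes --format json",
      "mutating": true,
      "safety": "reversible",
      "confidence": 0.91
    }
  ]
}`

// Action mirrors one entry of "recommended_actions".
type Action struct {
	ID         string  `json:"id"`
	Command    string  `json:"command"`
	Mutating   bool    `json:"mutating"`
	Safety     string  `json:"safety"`
	Confidence float64 `json:"confidence"`
}

// Result mirrors the top-level keys of the sample output.
type Result struct {
	OK      bool     `json:"ok"`
	Code    string   `json:"code"`
	Summary string   `json:"summary"`
	Facts   []string `json:"facts"`
	Actions []Action `json:"recommended_actions"`
}

// ParseResult decodes a command result.
func ParseResult(data []byte) (Result, error) {
	var r Result
	if err := json.Unmarshal(data, &r); err != nil {
		return Result{}, fmt.Errorf("parse result: %w", err)
	}
	return r, nil
}

// AutoRunnable applies a hypothetical agent policy: read-only actions are
// always allowed, mutating ones only when reversible and confident enough.
func AutoRunnable(r Result, minConfidence float64) []Action {
	var out []Action
	for _, a := range r.Actions {
		if a.Mutating && (a.Safety != "reversible" || a.Confidence < minConfidence) {
			continue
		}
		out = append(out, a)
	}
	return out
}

func main() {
	r, err := ParseResult([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(r.Code, "-", r.Summary)
	for _, a := range AutoRunnable(r, 0.9) {
		fmt.Println("would run:", a.Command)
	}
}
```

Because the safety class and confidence are fields rather than prose, the decision to act can be a plain policy check instead of log scraping.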
A beautiful cockpit for deployments, sagas, logs, metrics, and recovery.
The TUI is a frontend over the same deterministic API and object-state model. There is no separate magic path for humans.
payments-api
release 2026.05.16.1 · 6/6 healthy · p95 91ms · error 0.2% · cpu 48%
Saga: restore payments-db
Facts: restored DB is available, shadow API passed, current DB snapshot exists. Action: approve cutover or reject saga.
Do not make users become security experts to get secure operations.
Skiff's defaults are intentionally conservative: signed releases, digest-pinned artifacts, least-privilege IAM, encrypted state, conditional writes, no SSH ingress, managed sessions, KMS, secret references, and explicit approval for risky sagas.
Start small. Do not rewrite your world.
Skiff should be easy to try from AWS, Terraform, or Kubernetes. The happy path is direct apply, but Terraform generation and Kubernetes migration are first-class bridges.
Direct AWS mode
The Skiff CLI or a stateless skiffd writes object state and calls AWS APIs directly. This is the fastest path: no Terraform state, object-state native.
- ✓ Best default
- ✓ Disaster-recovery friendly
Terraform bridge
Generate or adopt Terraform for stable infrastructure shape, then let Skiff own release pointers, rollouts, sagas, and diagnostics.
- ✓ Enterprise review friendly
- ✓ No deploy-by-plan/apply requirement
Kubernetes migration
Import Deployment/Service/Ingress, deploy shadow Skiff services, then cut traffic over through a weighted migration saga.
- ✓ No cliff jump
- ✓ Unsupported features are explicit
Recipes for the operations teams actually run.
Skiff should ship with opinionated, understandable recipes. Users can inspect the plan, run it, pause it, approve it, or let agents execute low-risk paths.
API server + managed database
Deploy an API, create a managed database, wire secrets, emit logs/metrics, and get default restore, rotate, and canary sagas.
$ skiff init stack api-db payments
$ skiff deploy
$ skiff restore database payments-db --to latest
Multi-region API + regional database
Run services in two regions, maintain database replication, test failover, and promote through explicit high-risk sagas.
$ skiff failover stack payments --to us-east-1
! replica promotion is irreversible after new writes
Queue worker with autoscaling
Scale worker VMs from queue depth or age, keep logs and metrics normalized, and debug failures without clusters.
$ skiff init worker invoice-worker
$ skiff metrics invoice-worker queue-lag
Kubernetes service migration
Import, deploy shadow, compare health, shift traffic, and decommission the old service only after evidence says it is safe.
$ skiff import kube ./k8s --out skiff.yaml
$ skiff deploy --shadow
$ skiff saga start traffic-cutover --steps 5,25,50,100
Go core, provider plugins, saga steps, and object-state discipline.
Skiff is composable without becoming an operator framework. Plugins register provider capabilities, runtime addons, saga step kinds, diagnostics, and recipes.
skiff/
  cmd/
    skiff/         # CLI/TUI
    skiffd/        # stateless API server
    skiff-runner/  # VM runner
    skiff-worker/  # optional saga/index worker
  internal/
    compiler/ ir/ provider/aws/
    state/ objstore/ release/
    saga/ saga/steps/
    doctor/ policy/ plugins/
    tui/ observability/
  pkg/
    spec/ pluginapi/ sagaapi/ sdk/
  examples/
    api-db/ worker/ mtls/
    multiregion-db/
type Step interface {
	Kind() string
	Plan(ctx context.Context, req StepRequest) (*StepPlan, error)
	Run(ctx context.Context, req StepRequest) (*StepResult, error)
	Resume(ctx context.Context, req StepRequest) (*StepResult, error)
	Compensate(ctx context.Context, req StepRequest, result StepResult) (*StepResult, error)
	Doctor(ctx context.Context, req StepRequest) ([]Finding, error)
}
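To show what implementing this interface might look like, here is a hedged sketch of a traffic-shift step. The interface is repeated so the file compiles on its own. The fields of `StepRequest`, `StepPlan`, `StepResult`, and `Finding` are not shown in the source, so the ones below are placeholders invented for the example, as are `TrafficShift` and `Demo`. The in-memory map stands in for a load balancer's weighted routing state.

```go
package main

import (
	"context"
	"fmt"
)

// Hypothetical placeholder types: the source names them but does not
// show their fields.
type StepRequest struct {
	Service  string
	From, To int // traffic percentage before and after the step
}
type StepPlan struct{ Summary string }
type StepResult struct {
	Status string
	Weight int
}
type Finding struct{ Message string }

// Step repeats the interface from the text so this file compiles alone.
type Step interface {
	Kind() string
	Plan(ctx context.Context, req StepRequest) (*StepPlan, error)
	Run(ctx context.Context, req StepRequest) (*StepResult, error)
	Resume(ctx context.Context, req StepRequest) (*StepResult, error)
	Compensate(ctx context.Context, req StepRequest, result StepResult) (*StepResult, error)
	Doctor(ctx context.Context, req StepRequest) ([]Finding, error)
}

// TrafficShift is an illustrative step kind backed by an in-memory map.
type TrafficShift struct{ weights map[string]int }

var _ Step = (*TrafficShift)(nil) // compile-time interface check

func (t *TrafficShift) Kind() string { return "traffic-shift" }

// Plan describes the change without applying it.
func (t *TrafficShift) Plan(ctx context.Context, req StepRequest) (*StepPlan, error) {
	return &StepPlan{Summary: fmt.Sprintf("%s: %d%% -> %d%%", req.Service, req.From, req.To)}, nil
}

// Run sets an absolute weight, so repeating it is harmless.
func (t *TrafficShift) Run(ctx context.Context, req StepRequest) (*StepResult, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	t.weights[req.Service] = req.To
	return &StepResult{Status: "done", Weight: req.To}, nil
}

// Resume re-applies the same absolute weight after an interruption.
func (t *TrafficShift) Resume(ctx context.Context, req StepRequest) (*StepResult, error) {
	return t.Run(ctx, req)
}

// Compensate restores the starting weight recorded in the request.
func (t *TrafficShift) Compensate(ctx context.Context, req StepRequest, result StepResult) (*StepResult, error) {
	t.weights[req.Service] = req.From
	return &StepResult{Status: "compensated", Weight: req.From}, nil
}

// Doctor reports a finding when the live weight matches neither endpoint.
func (t *TrafficShift) Doctor(ctx context.Context, req StepRequest) ([]Finding, error) {
	w := t.weights[req.Service]
	if w != req.From && w != req.To {
		return []Finding{{Message: fmt.Sprintf("%s is at unexpected weight %d%%", req.Service, w)}}, nil
	}
	return nil, nil
}

// Demo runs and then compensates one shift, returning the weight after each.
func Demo() (afterRun, afterUndo int, err error) {
	ctx := context.Background()
	step := &TrafficShift{weights: map[string]int{"payments-api": 0}}
	req := StepRequest{Service: "payments-api", From: 0, To: 5}
	res, err := step.Run(ctx, req)
	if err != nil {
		return 0, 0, err
	}
	afterRun = step.weights[req.Service]
	if _, err = step.Compensate(ctx, req, *res); err != nil {
		return afterRun, 0, err
	}
	return afterRun, step.weights[req.Service], nil
}

func main() {
	fmt.Println(Demo())
}
```

Writing absolute target values rather than deltas is one simple way to make `Run` and `Resume` idempotent, which matters if a step is interrupted and retried.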
Build the core until the five commands feel magical.
The first version should be narrow and excellent: AWS, stateless services, object state, signed releases, runner, logs, doctor, rollback, and sagas.
Stateless AWS service
Service spec, compiler IR, S3 state, signed release ledger, ASG/ALB/IAM, runner, CloudWatch logs, status and rollback.
Doctor and sagas
Structured diagnostics, canary saga, rollback saga, operation leases, append-only events, agent-safe JSON.
Managed dependencies
API + managed database recipe, restore saga, secret rotation saga, database smoke tests, cost and shape advisor.
Adoption bridges
Terraform generate/adopt, Kubernetes import, shadow deploy, weighted traffic cutover, TUI, hot skiffd indexes.
Plugins and multi-region
mTLS plugin, provider conformance, stateful recipes, regional failover sagas, GCP/Azure provider work.