Ship a release
Compile spec, sign release, canary traffic, watch health, promote or pause.
Skiff gives teams one operating path for stateless and stateful services: least-privilege permissions, signed releases, canary deploys, runbooks as code, agent-readable command output, and native cloud primitives without a cluster control plane.
Kubernetes can deliver good operational patterns, but teams often pay for them with controllers, CRDs, clusters, custom policy layers, observability wiring, bespoke rollout scripts, and enough YAML to bury the customer journey.
The first win is safer defaults. Secure bootstrap, signing, identity, logs, and deploy shape are ready before the first service ships.
Deploys become a known procedure. The operator sees the release, traffic state, health gates, and the business-safe next step instead of bouncing across unrelated consoles.
Daily operations are designed before the incident. Every long-running operation is resumable and every repair has explicit risk and reversibility.
Changes are easy to review. Mutating production operations record who acted, what changed, where it ran, and how risky it was.
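A minimal Python sketch of what one such record might hold, assuming an append-only log of JSON lines. The `AuditEvent` type, its field names, and the `record` helper are hypothetical illustrations, not Skiff's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

RISK_LEVELS = ("no", "low", "medium", "high")


@dataclass(frozen=True)
class AuditEvent:
    # Hypothetical schema: one record per mutating production operation.
    actor: str      # who acted (a human or agent identity)
    change: str     # what changed
    target: str     # where it ran (service, environment, or resource)
    risk: str       # how risky it was, one of RISK_LEVELS
    trace_id: str   # links the record to logs and other events
    at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        # Reject records whose risk label is not one of the known levels.
        if self.risk not in RISK_LEVELS:
            raise ValueError(f"unknown risk level: {self.risk!r}")


def record(log: list[str], event: AuditEvent) -> None:
    # Append-only: events are serialized once and never edited in place.
    log.append(json.dumps(asdict(event), sort_keys=True))
```

Rejecting unknown risk labels at construction time keeps every stored record classifiable later.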
Stateless web APIs, workers, queues, databases, and stateful members need different runbooks. They still need the same operational contract: secure identity, safe deploys, health checks, scoped recovery commands, and recorded changes.
Identity starts narrow. Workloads get only the cloud permissions they need. Operators get scoped deploy and recovery permissions, not a blanket cluster-admin escape hatch.
Deploy paths match the workload. A web API can canary through target groups while a stateful member uses explicit approval and preflight checks.
Observability is attached to the journey. Status, logs, health, backups, and cloud resources show up in the same operational context.
Recovery is an executable runbook. Rollback, failover, drain, restore, and resume are typed actions with risk and reversibility.
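To make "same contract, different deploy path" concrete, here is a small Python sketch. The `Workload` and `DeployPlan` types, the strategy strings, and the preflight check names are all hypothetical; the source only says that a web API canaries through target groups and a stateful member needs explicit approval and preflight checks.

```python
from dataclasses import dataclass
from enum import Enum


class Workload(Enum):
    WEB_API = "web_api"
    STATEFUL_MEMBER = "stateful_member"


@dataclass(frozen=True)
class DeployPlan:
    strategy: str                      # hypothetical strategy label
    requires_approval: bool
    preflight_checks: tuple[str, ...] = ()


def plan_for(workload: Workload) -> DeployPlan:
    # Both plans share one contract (identity, health checks, recorded
    # changes); only the rollout path differs.
    if workload is Workload.WEB_API:
        # Shift a slice of traffic through load-balancer target groups.
        return DeployPlan(strategy="canary_target_groups", requires_approval=False)
    # Stateful members wait for explicit approval and pass preflight checks
    # first (check names here are illustrative assumptions).
    return DeployPlan(
        strategy="gated_member_update",
        requires_approval=True,
        preflight_checks=("replica_in_sync", "backup_is_fresh"),
    )
```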
Skiff puts the production checklist into the workflow: least privilege, release signatures, immutable history, secret references, approval gates, and audit records. Teams do not have to remember the secure version of the command under pressure.
Access starts from the operation. The service gets only what it needs, and high-risk actions require explicit approval.
The release is immutable, not a pointer. Skiff verifies both the signed release manifest and the runtime manifest before rollout.
Runtime safety does not rely on memory. The runner can read durable state directly and verify the release before starting the workload.
History survives handoffs. Humans and agents get the same traceable event stream when continuing an operation.
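The check before rollout has two parts: the manifest must be authentic, and the artifact must match the digest the manifest pins. The Python sketch below assumes a JSON manifest with a `digest` field and uses HMAC-SHA256 as a symmetric stand-in for a real release signature; a production signer would more likely use asymmetric keys. `sign_manifest` and `verify_release` are hypothetical names.

```python
import hashlib
import hmac
import json


def sign_manifest(manifest: dict, key: bytes) -> str:
    # Stand-in signature: HMAC over a canonical (sorted-key) JSON encoding.
    payload = json.dumps(manifest, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_release(artifact: bytes, manifest: dict, signature: str, key: bytes) -> bool:
    # 1. The manifest itself must carry a valid signature.
    if not hmac.compare_digest(sign_manifest(manifest, key), signature):
        return False
    # 2. The artifact bytes must match the digest pinned in the signed
    #    manifest, so a mutable tag or pointer cannot swap in other bytes.
    actual = "sha256:" + hashlib.sha256(artifact).hexdigest()
    return hmac.compare_digest(actual, manifest["digest"])
```

Pinning by digest is what makes the release immutable rather than a pointer: changing either the bytes or the manifest fails verification.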
During an outage, a wiki runbook is too easy to misread or skip. In Skiff, canaries, drains, restores, failovers, rotations, and repairs are explicit sagas with typed steps, stored progress, compensation where possible, and clear events.
Canary deploys become an operational contract. Skiff advances only when health, logs, and SLO checks support promotion.
Restores follow a recorded plan. The runbook proves backup freshness, isolates blast radius, and records the cutover plan.
Secret rotation has a safe middle. Stage new references, verify workloads, then revoke old credentials after consumers move.
Failover is explicit and reviewable. Skiff separates plan, approval, route changes, validation, and compensation.
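A saga with typed steps, stored progress, and compensation can be sketched in a few lines of Python. This is an illustration of the general saga pattern under stated assumptions, not Skiff's engine: `Step`, `Progress`, and `run_saga` are hypothetical, and progress lives in memory here where a real system would persist it after every step.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    run: Callable[[], None]
    # None marks a step that cannot be undone automatically.
    compensate: Optional[Callable[[], None]] = None


@dataclass
class Progress:
    # In a real system this would be persisted durably after every step.
    completed: list[str] = field(default_factory=list)
    status: str = "running"


def run_saga(steps: list[Step], progress: Progress, events: list[str]) -> Progress:
    for step in steps:
        if step.name in progress.completed:
            continue  # resume: skip work already recorded as done
        try:
            step.run()
        except Exception as exc:
            events.append(f"failed:{step.name}:{exc}")
            progress.status = "compensated"
            # Undo completed steps in reverse order where possible.
            for done in reversed([s for s in steps if s.name in progress.completed]):
                if done.compensate is None:
                    events.append(f"needs_repair:{done.name}")
                    progress.status = "needs_repair"
                else:
                    done.compensate()
                    events.append(f"compensated:{done.name}")
            return progress
        progress.completed.append(step.name)
        events.append(f"done:{step.name}")
    progress.status = "succeeded"
    return progress
```

Passing a `Progress` that already lists finished steps is how resume works in this sketch: the runner skips them instead of relying on someone's memory of where the runbook stopped.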
The deploy journey connects release signing, rollout traffic, target health, SLO signals, logs, and recorded changes. The animation below shows traffic moving only as checks pass.
Before traffic moves, the release is checked. The runner verifies signatures and digests before serving, and the operation starts with a trace ID.
Small traffic proves the runtime path. Skiff watches target health, logs, and SLO signals before increasing exposure.
Promotion is gated by live signals. Skiff advances traffic when signals are good, pauses when they are ambiguous, and recommends repair when they are unsafe.
Completion records the change. The service control is updated only after durable state is written, so the operator can explain exactly what changed.
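The advance, pause, or repair decision can be sketched as a pure function over live signals. The thresholds, the `warn_fraction` margin, and the traffic ladder below are assumptions chosen for illustration; the names `CanarySignals`, `gate`, and `next_weight` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ADVANCE = "advance"
    PAUSE = "pause"
    RECOMMEND_REPAIR = "recommend_repair"


@dataclass(frozen=True)
class CanarySignals:
    healthy_targets: int
    total_targets: int
    error_rate: float       # observed fraction of failing requests
    slo_error_rate: float   # highest error rate the SLO tolerates
    new_log_errors: bool    # new error signatures since the canary started


# Hypothetical traffic ladder, in percent of requests sent to the canary.
TRAFFIC_STEPS = (1, 10, 50, 100)


def gate(s: CanarySignals, warn_fraction: float = 0.5) -> Decision:
    if s.total_targets == 0:
        return Decision.PAUSE  # no evidence yet is ambiguous, not good
    if s.healthy_targets < s.total_targets or s.error_rate > s.slo_error_rate:
        return Decision.RECOMMEND_REPAIR  # unsafe
    if s.new_log_errors or s.error_rate > s.slo_error_rate * warn_fraction:
        return Decision.PAUSE  # within SLO but noisy or close to the limit
    return Decision.ADVANCE


def next_weight(current: int, decision: Decision) -> int:
    # Traffic only moves forward on ADVANCE; PAUSE and RECOMMEND_REPAIR hold
    # the current weight and leave the next step to the operator.
    if decision is Decision.ADVANCE:
        return next((w for w in TRAFFIC_STEPS if w > current), 100)
    return current
```

Treating "no data" as a pause rather than a pass is the design choice that keeps an empty canary from promoting itself.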
Skiff keeps operators from assembling context from scratch. Service status is tied to the operation, underlying cloud resources, trace ID, target health, logs, findings, and recommended commands. The same context is available as JSON for agents.
Detection starts from the service journey. The operator sees rollout state and customer traffic before chasing raw telemetry.
Correlation is built into the command output. Trace IDs connect target health, logs, events, and cloud resource IDs.
Recommendations are structured. Skiff separates facts from hypotheses and labels each action as no, low, medium, or high risk.
Resume is a first-class operation. After the fix, the same operation can continue without reconstructing state from memory.
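One way to model structured recommendations is to keep facts and hypotheses in separate fields and attach a risk level to every suggested command. In the Python sketch below, `Risk`, `classify`, `Recommendation`, and `DoctorReport` are hypothetical, and the classification rule (read-only is no risk, irreversible mutation is high) is an assumption for illustration.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Risk(IntEnum):
    NO = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def classify(mutates: bool, reversible: bool, production: bool) -> Risk:
    # Hypothetical rule: read-only is no risk; irreversible mutation is high.
    if not mutates:
        return Risk.NO
    if not reversible:
        return Risk.HIGH
    return Risk.MEDIUM if production else Risk.LOW


@dataclass(frozen=True)
class Recommendation:
    command: str
    risk: Risk
    reversible: bool


@dataclass
class DoctorReport:
    facts: list[str] = field(default_factory=list)        # observed
    hypotheses: list[str] = field(default_factory=list)   # inferred, unconfirmed
    recommendations: list[Recommendation] = field(default_factory=list)

    def safest_first(self) -> list[Recommendation]:
        # Lower risk first; among equal risk, reversible actions first.
        return sorted(self.recommendations, key=lambda r: (r.risk, not r.reversible))
```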
Every command supports --format=json. Skiff packages the facts, trace IDs, operation IDs, recent events, risk labels, and approval requirements an agent needs to help without becoming an unreviewable controller.
Agents do not scrape prose. JSON mode is a stable interface for status, doctor output, recommendations, and errors.
Skiff manages the context packet. The agent gets the service, operation, trace, cloud resources, recent events, and next commands together.
Risk is explicit. Commands are classified before they run, including whether they mutate state and how reversible they are.
Escalation is part of the flow. Agent escalations to humans and two-party authorization are built into high-risk operations.
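The Python sketch below shows one possible shape for that context packet and a two-party check for high-risk commands. The field names and the `may_run` rule are assumptions for illustration; the actual JSON emitted by `--format=json` is not specified here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ContextPacket:
    # Hypothetical shape of the packet an agent receives in JSON mode.
    service: str
    operation_id: str
    trace_id: str
    cloud_resources: list[str]
    recent_events: list[str]
    next_commands: list[dict]  # each: command, risk, mutates, needs_approval

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True, indent=2)


def may_run(risk: str, requester: str, approvers: set[str]) -> bool:
    # In this sketch, only high-risk commands need an extra approval.
    if risk != "high":
        return True
    # Two-party rule: someone other than the requester must approve, so an
    # agent (or a human) cannot approve its own high-risk action.
    return bool(approvers - {requester})
```

Because the packet is plain data, an agent can reason over it, but executing anything high-risk still routes through a second party.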
Skiff is AWS-first and works against existing cloud accounts. Import known cloud shape, keep native cloud resources in the model, and add paved operational journeys one service at a time.
Discovery makes the cloud shape legible. Skiff does not hide existing primitives or require teams to pretend the cloud is a cluster.
Bootstrap installs secure defaults. The environment gets state, signing, IAM, logs, and context without exposing low-level IDs on the happy path.
The first service proves the path. Operators get canary deploys, release verification, status, logs, doctor output, and a CLI fallback.
Expansion is additive. Each new service adds typed operations instead of another pile of bespoke YAML.
Skiff is designed around operational jobs: ship safely, respond to degraded service, recover data, rotate credentials, and hand work to an agent or another human with JSON context, risk labels, and enough history to continue.
Compile spec, sign release, canary traffic, watch health, promote or pause.
Pull JSON context, inspect logs, classify risk, escalate or run bounded repair.
Verify backup, isolate target, approve cutover, validate health, record the result.
Stage new reference, roll workloads, confirm consumers, revoke old access.
Shipping stays in one path. The operator sees release, rollout state, traffic, health, logs, and the next safe action together.
Repair begins with observed facts. Doctor output recommends commands, labels risk, and asks for human approval when an agent should not act alone.
State recovery gets first-class treatment. Restore work includes backup freshness, risk classification, cutover, validation, and traceable results.
Credential rotation is deliberate. Skiff stages the change, verifies workloads, revokes old access, and records what changed.
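The ordering that makes rotation safe (stage, confirm every consumer, only then revoke) can be enforced by a small state object. In this Python sketch, `Rotation` and its methods are hypothetical, and "references" are plain strings standing in for secret references.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rotation:
    # Hypothetical model of one staged credential rotation.
    old_ref: str
    consumers: frozenset[str]
    new_ref: Optional[str] = None
    moved: set[str] = field(default_factory=set)
    revoked: bool = False

    def stage(self, new_ref: str) -> None:
        # Safe middle: the new reference exists while the old one still works.
        self.new_ref = new_ref

    def confirm(self, consumer: str) -> None:
        # Record that a workload has been verified on the new reference.
        if self.new_ref is None:
            raise RuntimeError("stage a new reference before confirming consumers")
        if consumer not in self.consumers:
            raise KeyError(consumer)
        self.moved.add(consumer)

    def revoke_old(self) -> None:
        # Refuse to revoke while nothing is staged or any consumer lags behind.
        if self.new_ref is None:
            raise RuntimeError("nothing staged; revoking would leave no valid reference")
        pending = self.consumers - self.moved
        if pending:
            raise RuntimeError(f"still on old reference: {sorted(pending)}")
        self.revoked = True

    def valid_refs(self) -> set[str]:
        refs = set() if self.revoked else {self.old_ref}
        if self.new_ref is not None:
            refs.add(self.new_ref)
        return refs
```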
The operator-facing promise rests on durable object state, a stateless facade, a direct CLI fallback, immutable history, compare-and-swap (CAS) controls, and typed sagas.
skiff deploy payments-api --canary
  -> write operation intent
  -> create signed release manifest
  -> CAS service control
  -> watch target health
  -> append audit event

skiff --direct status payments-api
  -> read object state directly
  -> rebuild enough view to recover
Durable state comes first. Mutating operations write object storage before updating in-memory views.
The facade can fail without taking truth with it. skiffd powers normal UX, but the CLI can still read object state directly.
The VM is the workload boundary. Runners verify signed releases and report state transitions without relying on a cluster control plane.
Audit is part of the contract. Every mutating production operation is traceable, resumable when long-running, and explicit about risk.
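The "durable state first, CAS on the service control, direct read as fallback" pattern can be sketched in Python with an in-memory stand-in for an object store that supports conditional writes. `ObjectStore`, `Facade`, `direct_status`, and the `services/<name>/control.json` key layout are all hypothetical.

```python
import json
import threading
from typing import Optional


class ObjectStore:
    """In-memory stand-in for an object store that supports conditional writes."""

    def __init__(self) -> None:
        self._objects: dict[str, tuple[int, bytes]] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> Optional[tuple[int, bytes]]:
        return self._objects.get(key)  # (version, body), or None if absent

    def put_if_version(self, key: str, body: bytes, expected: int) -> bool:
        # Compare-and-swap: write only if nobody changed the object since we read it.
        with self._lock:
            current = self._objects.get(key)
            version = current[0] if current else 0
            if version != expected:
                return False
            self._objects[key] = (version + 1, body)
            return True


def control_key(service: str) -> str:
    return f"services/{service}/control.json"  # hypothetical key layout


class Facade:
    """Hypothetical stateless facade: its view is only a cache of object state."""

    def __init__(self, store: ObjectStore) -> None:
        self.store = store
        self.view: dict[str, str] = {}

    def promote(self, service: str, release: str) -> None:
        key = control_key(service)
        current = self.store.get(key)
        version = current[0] if current else 0
        body = json.dumps({"release": release}).encode()
        if not self.store.put_if_version(key, body, version):
            raise RuntimeError("service control changed concurrently; re-read and retry")
        # Durable state is written first; only then is the in-memory view updated.
        self.view[service] = release


def direct_status(store: ObjectStore, service: str) -> Optional[str]:
    # CLI fallback: read object state directly, with no facade involved.
    obj = store.get(control_key(service))
    return None if obj is None else json.loads(obj[1])["release"]
```

If the facade process disappears, its view is lost but nothing true is lost: `direct_status` rebuilds the answer from the store, and a stale writer's CAS simply fails instead of overwriting a newer release.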