Workspaces
A workspace is one isolated runtime for an agent. Each workspace gets its own Kubernetes pod with its own sandbox filesystem, its own capability token, and its own audit trail. Tasks you submit run inside a workspace; every audit event and every running agent points back at one.
A workspace also outlives any single task. The pod stays warm across multiple submissions, a session can span tasks, and stopping a workspace captures its filesystem to a snapshot you can resume later. The workspace is the unit of continuity for your agent’s work, not just the unit of isolation.
Anatomy
The workspace is an isolated pod running the agent, a Python
service that drives the model dispatch + bash tool loop with
/workspace as its working directory. Everything outside that
loop — policy enforcement, credential injection, TLS
interception, DNS, audit emission — runs off-pod, on a
separate proxy. The agent has no policy code, no secrets, and no
direct network egress.
A few things worth knowing about how that boundary works:
- Interception is transparent. The agent makes plain HTTPS calls (curl, httpx, whatever) without any client configuration. Every outbound connection is redirected to the proxy under the hood; the agent code doesn’t have to know the proxy exists.
- Each connection carries a per-task capability token. The control plane mints a short-lived JWT when a task is submitted; it rides on every outbound connection while that task runs. The proxy uses it to identify which workspace is calling, which policy applies, and which credentials it’s allowed to inject. The agent treats the token as an opaque string: it never sees, decodes, or transmits it directly in its requests.
- The proxy terminates TLS using a platform CA the agent trusts. That gives the proxy full L7 visibility — method, path, headers, body — so it can evaluate policy, inject the real upstream credentials (Authorization headers, OAuth tokens, etc.) at request time, and audit what actually went on the wire. The agent never holds the upstream’s real credentials.
- DNS denials are handled by routing, not by DNS errors. An
allowed domain resolves to its real IP; a denied domain
resolves to a synthetic IP, so the agent’s TCP connect succeeds
and the request reaches the HTTP gate. The agent sees a clean
HTTP 403 with the deny reason attached, not a confusing DNS
failure that an LLM driving curl might try to “fix” by guessing
alternate hostnames.
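The DNS-routing behaviour described above can be pictured with a toy model. This is only a sketch: the synthetic address, the allow-list, and the function names `resolve` and `http_gate` are all hypothetical, and the real proxy's data structures are not described in this page.

```python
import ipaddress

# All values below are hypothetical and exist only for illustration.
SYNTHETIC_IP = ipaddress.ip_address("198.18.0.1")  # assumed stand-in for the deny address
ALLOWED = {"api.example.com": ipaddress.ip_address("203.0.113.10")}  # domain -> "real" IP


def resolve(domain: str):
    """Allowed domains get their real IP; denied domains get the synthetic one."""
    return ALLOWED.get(domain, SYNTHETIC_IP)


def http_gate(domain: str):
    """Return None to forward the request upstream, or a 403 response dict."""
    if domain in ALLOWED:
        return None
    return {"status": 403, "reason": f"domain {domain} is not allowed by policy"}
```

The point of the design is visible in the denied case: resolution still succeeds, so the caller reaches `http_gate` and receives a 403 with a reason instead of a resolver error.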
Notable paths inside /workspace:
- /workspace/packs/<pack-name>/: read-only OCI image volumes of the content packs attached to the role. One image volume per pack, mounted at pod create.
- /workspace/uploads/<task_id>/: files attached to a task at submission time. Customer-uploaded files originate at the control plane and ride the multipart task delivery into the pod at this fixed path.
- /workspace/.session_messages.jsonl: incremental transcript of the conversation. The agent service writes it as messages flow, restores from it on crash, and the control plane captures it into the snapshot when the workspace stops.
- /workspace/.session_handoff.md: a summary the model writes via bash before wrapping up a session (prompted by the wrap-up warning), so a follow-on session in the same workspace can pick up context. Surfaced inline on the session API alongside the transcript.
- /workspace/.agent.log: the agent service’s log file. Surfaced inline on the session API (tail-truncated for large logs) so operators can inspect what the service did without a separate kubectl path.
- /workspace/.progress/: progress files the model writes via bash during long tasks, one file per discrete step plus a current.json for the live “what’s happening right now” status. The agent service tails this directory and emits step_status events (per step) and progress_status events (for current.json) as files change; the console’s progress tracker renders both.
- /workspace/scratch/: the model’s working directory. The agent service creates it at startup, every bash command runs with this as cwd, and the agent service diffs the directory pre- and post-task so the task_end event’s files_created / files_modified fields reflect what the task actually produced. The console’s session view surfaces these files inline.
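The pre- and post-task diff of the scratch directory could look roughly like the following. This is a minimal sketch under assumptions: the page does not say how the agent service detects changes, so content hashing is a stand-in technique, and `snapshot_dir` and `diff_snapshots` are hypothetical names. Only the `files_created` and `files_modified` keys come from the text.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot_dir(root: Path) -> dict:
    """Map each file's path (relative to root) to a SHA-256 digest of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def diff_snapshots(before: dict, after: dict) -> dict:
    """Compare two snapshots and report new files and files whose contents changed."""
    created = sorted(set(after) - set(before))
    modified = sorted(k for k in after if k in before and after[k] != before[k])
    return {"files_created": created, "files_modified": modified}


# Demo against a temporary directory standing in for /workspace/scratch/.
with tempfile.TemporaryDirectory() as tmp:
    scratch = Path(tmp)
    (scratch / "notes.txt").write_text("draft")
    before = snapshot_dir(scratch)
    (scratch / "notes.txt").write_text("final")      # modified during the task
    (scratch / "report.md").write_text("# Report")   # created during the task
    result = diff_snapshots(before, snapshot_dir(scratch))

print(result)
```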
Workspace fields
The customer-visible fields on a workspace row:
| Field | Purpose |
|---|---|
| id | Stable workspace identifier. Used in URLs, audit rows, CLI commands. |
| org_id | Tenant scope. |
| role_id | The agent role the workspace was created under. Determines provider, model, capabilities, pool policy. |
| environment | Free-form label inherited from the role at create time, surfaced in audit rows. |
| created_by_user_id | The user who created the workspace, or system for API-key-driven creates. Drives :own permission narrowing and audit attribution. |
| status | Closed enum tracking the workspace’s lifecycle phase. See Lifecycle. |
| created_at, updated_at | Bookkeeping. |
| archived_at | Soft-delete tombstone. See Archive. |
| snapshot_ref | Opaque reference to the most recent saved snapshot. Empty until the first stop. See Snapshots. |
| last_snapshot_error | Populated when a snapshot save failed against a pod that had already gone away (eviction, OOM). Cleared on the next successful save. |
| budget_snapshot | The proxy’s final budget-counter state at the most recent stop, persisted so resume can hand it back to a freshly activated proxy. See Budget counters. |
| parent_workspace_id, fork_source_snapshot_ref, fork_point_task_count | Fork lineage, set only on workspaces created via fork. See Fork. |
| pack_status | Derived runtime field: current (pod’s mounted packs match the role’s current pack set), stale (role packs changed since the pod was created; restart to apply), or unknown (no live pod). |
Task fields
A task row is smaller; most of the rich state lives on its paired activity events rather than on the row itself:
| Field | Purpose |
|---|---|
| id | Stable task identifier. |
| workspace_id, org_id | Which workspace the task ran in, and the owning org. |
| status | Closed enum tracking the task’s lifecycle phase. See Tasks. |
| description | The prompt the user submitted with the task. |
| reason | Terminal reason on failed / cancelled / delivery_failed (e.g. timeout, manual, workspace_stop). Empty on the happy path. |
| duration_seconds | Computed at read time from completed_at - created_at. Not persisted as a column. |
| created_at, completed_at | Bookkeeping. completed_at is set on the terminal transition. |
Iteration count, token usage, files created/modified, and the
agent’s last response don’t live on the task row — they ride
the paired task_end activity event detail and are fetched
lazily when an operator expands a task in the console.
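As an illustration of a read-time field such as duration_seconds, here is a minimal sketch. The function name and the choice to return `None` for a task with no completed_at are assumptions; the page only states that the value is computed from completed_at - created_at and is not stored.

```python
from datetime import datetime, timezone
from typing import Optional


def duration_seconds(created_at: datetime, completed_at: Optional[datetime]) -> Optional[float]:
    """Derive the duration when the row is read, rather than storing it."""
    if completed_at is None:
        return None  # assumption: a task that has not reached a terminal state has no duration yet
    return (completed_at - created_at).total_seconds()


created = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
completed = datetime(2024, 1, 1, 12, 1, 30, tzinfo=timezone.utc)
print(duration_seconds(created, completed))  # 90.0
```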
Lifecycle
Workspace status is a closed enum. Transitions are validated server-side; an illegal transition fails the request.
| Status | Meaning |
|---|---|
| provisioning | A pod is being claimed (from the warm pool) or cold-started. The workspace exists but isn’t ready for tasks. |
| idle | Pod is up and the external proxy is activated. Ready for tasks. |
| busy | A task is currently executing. |
| stopping | Stop is in flight: any running task is cancelled, the agent flushes its write queues, a snapshot is captured. |
| stopped | Pod is destroyed; the filesystem lives on as a snapshot. Submitting another task, or hitting Resume, revives it. |
| failed | Provisioning, snapshot save, or proxy activation hit an unrecoverable error. The row stays for inspection. |
Warm pool vs cold start
When a workspace enters provisioning, the platform first tries
to claim a pre-warmed pod from the role’s pool. A pool pod is
a workspace pod that’s already been scheduled and started on a
node with the role’s configured packs, sitting idle with no
activated proxy state. Claiming one is sub-second — the pod is
already up, the platform just hands it ownership of the new
workspace and activates the proxy.
If no pool pod is available — pool empty, or the role has no pool policy configured — the workspace cold-starts: the platform creates a new pod from scratch. That’s tens of seconds (image pull, scheduler, init container, agent boot) but works the same way once it’s running.
Whether a role keeps a warm pool, and how big, is part of the role’s pool policy. Workspaces themselves don’t choose; they take whatever’s available.
Surprising transitions
Three transitions are worth calling out because they can surprise you:
- busy → provisioning is the force-restart path. When a role’s configuration changes (packs swapped, capabilities edited, model changed) and you restart an active workspace, it transitions through provisioning again so the new pod can come up under the new config.
- stopping → idle is the Stop-aborted path. If the snapshot save fails but the pod is still alive, Stop bails back to idle rather than wedging the workspace in stopping. You can retry.
- failed only transitions back via stopping. A failed workspace can’t be resumed directly; you stop it first (which cleans up partial state), then create a fresh workspace.
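Server-side transition validation of a closed enum is commonly implemented as a lookup table. The sketch below is not the platform's actual table: the `LEGAL` map is inferred from the statuses and transitions this page mentions and may be incomplete or differ from the real one, and `validate_transition` is a hypothetical name.

```python
# Inferred from the prose on this page; treat every entry as an assumption.
LEGAL = {
    "provisioning": {"idle", "failed"},
    "idle": {"busy", "stopping"},
    "busy": {"idle", "provisioning", "stopping"},  # busy -> provisioning: force-restart
    "stopping": {"stopped", "idle", "failed"},     # stopping -> idle: Stop aborted
    "stopped": {"provisioning"},                   # resume
    "failed": {"stopping"},                        # failed only leaves via stopping
}


def validate_transition(current: str, target: str) -> None:
    """Raise ValueError when the requested transition is not in the table."""
    if target not in LEGAL.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")


validate_transition("busy", "provisioning")  # accepted
try:
    validate_transition("failed", "idle")    # rejected
except ValueError as exc:
    print(exc)
```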
Archive
A workspace has an archived_at flag that hides it from default
listings without deleting it.
- Archive is only valid when the workspace is stopped or failed. Active states (provisioning / busy / idle / stopping) return 409 invalid_state; stop the workspace first.
- Unarchive is blocked if the workspace’s role is archived. The chain has to come back top-down: unarchive the provider, then the role, then the workspace. The reverse direction also cascades: archiving a role archives every stopped workspace under it in the same transaction, and archiving a provider chains further into archiving its roles + their stopped workspaces.
- Archived workspaces show up with --include-archived on evershell ps, or via the Archived filter on the console Workspaces page.
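The two archive guards above can be sketched as simple predicates. Only the status names and the 409 invalid_state response come from the text; the function names, the tuple return shape, and the 200 success code are assumptions for illustration.

```python
ARCHIVABLE = {"stopped", "failed"}


def archive_check(status: str) -> tuple:
    """Return an (http_status, code) pair for an archive request."""
    if status in ARCHIVABLE:
        return 200, "ok"             # assumed success shape
    return 409, "invalid_state"      # active states must be stopped first


def can_unarchive(role_archived: bool) -> bool:
    """Unarchive has to proceed top-down, so an archived role blocks it."""
    return not role_archived


print(archive_check("busy"))     # (409, 'invalid_state')
print(archive_check("stopped"))  # (200, 'ok')
```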
Timers
Each role declares two timers the platform enforces:
- Idle timeout — how long a workspace can sit with no task before being stopped automatically (snapshot + pod teardown). Keeps resources from being pinned indefinitely.
- Task timeout — how long any single task may run. The control plane warns the agent at ~80% of the timeout so the agent can wrap up gracefully; at 100% the task is cancelled.
Both re-arm on the relevant state changes, and both are tuned per role.
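To make the task-timeout arithmetic concrete, here is a small sketch. The 0.8 fraction mirrors the "~80%" in the text, but the exact value, the function name, and the use of plain seconds are assumptions.

```python
WRAP_UP_FRACTION = 0.8  # the text says "~80%"; the exact fraction is an assumption


def task_deadlines(started_at: float, timeout_s: float) -> tuple:
    """Return (warn_at, cancel_at) as seconds on the same clock as started_at."""
    warn_at = started_at + WRAP_UP_FRACTION * timeout_s
    cancel_at = started_at + timeout_s
    return warn_at, cancel_at


warn_at, cancel_at = task_deadlines(started_at=0.0, timeout_s=600.0)
print(warn_at, cancel_at)
```

With a ten-minute timeout, the warning lands at roughly the eight-minute mark, leaving about two minutes for the agent to wrap up before cancellation.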
Tasks
A task is one unit of work submitted to a workspace. Two paths create one:
- POST /v1/tasks atomically creates a workspace and submits its first task. This is what evershell run <role> "<description>" does.
- POST /v1/workspaces/{id}/tasks submits a task to an existing workspace. The workspace doesn’t have to be idle; submitting to a stopped workspace transparently resumes it first.
Task status is a closed enum, also state-machine validated:
| Status | Meaning |
|---|---|
| submitted | Row created; CP is about to hand the task to the agent. |
| delivered | Agent acknowledged receipt. |
| running | Agent’s task loop is executing. |
| compacting | The agent is summarising older conversation turns to make room and will return to running. |
| completed | Terminal: happy path. |
| failed | Terminal: agent or provider error. |
| cancelled | Terminal: task timeout, manual cancel, or workspace stop. |
| delivery_failed | Terminal: control plane couldn’t reach the agent at all. |
Two close events per task
Every task in the audit stream ends with at least one terminal event. On the happy path the activity feed shows two:
- task_end: emitted by the agent inside the pod when its task loop finishes. Carries iteration count, the wrap-up reason if any, files created, and a token-usage rollup.
- task_completed: emitted by the control plane when it acknowledges the close (or has to force one).
The console suppresses task_completed on the happy path so the
activity feed doesn’t show a duplicate close. But both exist
because the agent may never get to emit task_end: if a task
hits its timeout, or the workspace is stopped mid-task, or the
pod is evicted, the agent doesn’t report back. In those cases
the control plane emits task_completed itself with the
appropriate cancellation reason — so every task is guaranteed at
least one terminal event regardless of how it ended.
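A consumer of the audit stream that wants a single close event per task could apply the same preference the console does. This is a sketch only: the event dictionaries and their `type` and `reason` keys are a hypothetical shape, and `closing_event` is not a platform API.

```python
from typing import Optional


def closing_event(events: list) -> Optional[dict]:
    """Prefer the agent's task_end; fall back to the control plane's task_completed."""
    by_type = {e["type"]: e for e in events}
    return by_type.get("task_end") or by_type.get("task_completed")


happy = [{"type": "task_end", "iterations": 4}, {"type": "task_completed"}]
timed_out = [{"type": "task_completed", "reason": "timeout"}]

print(closing_event(happy)["type"])      # task_end
print(closing_event(timed_out)["type"])  # task_completed
```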
Sessions, iterations, compaction
These are agent-internal events that surface on the activity stream alongside the task lifecycle. Three things worth knowing:
- Sessions are bigger than tasks. A session is the agent’s ongoing conversation with the model. One session can span multiple tasks: submitting a follow-up task to an idle workspace continues the existing session rather than starting a new one.
  - A session ends only via wrap-up. When a limit approaches (a budget counter approaching its cap, the task duration nearing its timeout, or the context window filling after the role’s max_continuations compaction rounds have been exhausted; see below), the agent gets a wrap-up warning, transitions to wrapping_up, and gets a few more turns to write a .session_handoff.md and finish cleanly. When the agent loop exits in wrapping_up, the session transitions to completed. A task ending on its own doesn’t end the session; the session keeps going until one of the wrap-up triggers fires.
  - The next task after a completed session starts fresh, but reads the handoff. A new session is created and the agent’s system prompt picks up .session_handoff.md so the model can continue where the previous session left off.
  - You can force a reset by submitting a task with reset_session=true. That archives the current session’s transcript, deletes the handoff file, and starts a clean session with no inherited context. Use it when you want a true blank slate rather than a handoff continuation.
- Iterations are smaller than tasks. Each model turn (one prompt sent, one response received) is an iteration. iteration_end carries a token-usage map (input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens, thinking_output_tokens); the console renders these as chips.
- Compaction keeps the session going. When the conversation approaches the role’s max_context_tokens, the agent summarises older turns and continues. compaction_start and compaction_end bracket the operation; continuation on the detail tracks which round it is. The role’s max_continuations caps how many compaction rounds a single task may chain before the agent wraps up instead.
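The compact-or-wrap-up decision described in the list above can be sketched as a small function. The names max_context_tokens and max_continuations come from the role settings in the text; the 0.9 threshold for "approaches", the function name, and the returned strings are assumptions.

```python
def next_action(context_tokens: int, max_context_tokens: int,
                continuation: int, max_continuations: int,
                threshold: float = 0.9) -> str:
    """Decide what to do before the next model turn.

    threshold is an assumed stand-in for "approaches the limit".
    """
    if context_tokens < threshold * max_context_tokens:
        return "continue"
    if continuation < max_continuations:
        return "compact"   # summarise older turns and keep going
    return "wrap_up"       # compaction rounds exhausted


print(next_action(50_000, 200_000, continuation=0, max_continuations=2))   # continue
print(next_action(190_000, 200_000, continuation=1, max_continuations=2))  # compact
print(next_action(190_000, 200_000, continuation=2, max_continuations=2))  # wrap_up
```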
Session lifecycle events: session_start, session_wrap_up
(with a closed-enum reason: context, budget, or task),
and session_end. All ride as category=activity audit rows
and surface on GET /v1/workspaces/{id}/activity (and the
evershell logs stream).
Snapshots
When a workspace stops, the control plane captures /workspace
as a tar.gz archive and stores it in the platform’s snapshot
store. The agent’s JSONL transcript, the handoff summary it
wrote before wrapping up, anything else under /workspace — all
of it rides in the snapshot. /workspace/packs/ is excluded:
packs are independently mounted from their image-volume sources,
so resume re-mounts them from the registry rather than baking
stale copies into the archive.
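Capturing a directory as a tar.gz while leaving out packs/ can be done with the standard library's `tarfile` filter hook. This is a sketch of the idea, not the control plane's implementation: `capture_snapshot` is a hypothetical name, and the archive is written to a local path rather than to the platform's snapshot store.

```python
import tarfile
from pathlib import Path


def capture_snapshot(workspace: Path, archive: Path) -> None:
    """Write workspace to a gzip-compressed tar, skipping the top-level packs/ tree."""

    def skip_packs(info: tarfile.TarInfo):
        # Member names look like "./packs/..." because arcname is "."; Path drops the "./".
        if Path(info.name).parts[:1] == ("packs",):
            return None  # returning None excludes this member and everything below it
        return info

    # The archive path should live outside the workspace so it does not include itself.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(workspace, arcname=".", filter=skip_packs)
```

Because the filter returns `None` for the packs directory entry itself, `tarfile` never descends into it, so large pack contents are not read at all.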
Workspace state is captured at multiple points so it survives both planned shutdowns and unplanned pod loss:
- Every Stop snapshots before tearing the pod down — whether the stop was explicit (operator stop, agent shutdown) or implicit (idle-timer or task-timeout fired).
- Restart snapshots before destroying the old pod, then resume restores into the new one — the operator just sees “restarting” continuously, but underneath it’s a snapshot round-trip.
- Fork snapshots when its parent is currently idle and has a live pod; otherwise it reuses the parent’s existing snapshot.
- Periodic snapshots fire on a timer while a task is active, so a long-running task doesn’t lose progress if something hits the pod between explicit save points.
- Pod-termination watch: when Kubernetes sets DeletionTimestamp on the pod (node drain, scale-down, manual kubectl delete), the platform races a snapshot against the grace period before SIGKILL lands, so state isn’t lost to an involuntary eviction.
Snapshot save is the step everything else depends on. Every
path that captures one guards against failure: if the pod is
still alive, the operation bails back to the previous status so
you can retry; if the pod is already gone (eviction, OOM), the
workspace settles into stopped with a last_snapshot_error
recorded so the UI can flag “stopped, snapshot lost” and the
operator can decide whether to resume against the older snapshot
or start empty.
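The failure guard described above reduces to one branch on whether the pod is still alive. A minimal sketch, assuming a hypothetical function name and a dictionary of row updates as the return shape; only the status values and last_snapshot_error come from the text.

```python
def after_snapshot_failure(previous_status: str, pod_alive: bool, error: str) -> dict:
    """Return the workspace-row updates to apply when a snapshot save fails."""
    if pod_alive:
        # Bail back so the operator can retry the operation.
        return {"status": previous_status}
    # Pod is already gone (eviction, OOM): settle and record what was lost.
    return {"status": "stopped", "last_snapshot_error": error}


print(after_snapshot_failure("idle", pod_alive=True, error="upload failed"))
print(after_snapshot_failure("busy", pod_alive=False, error="pod evicted"))
```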
Resume claims a fresh pod (pool or cold-start), extracts the
snapshot into /workspace, reactivates the external proxy with
the role’s current policy, and transitions back to idle.
Budget counters
Capability budget counters are tracked and enforced at the external proxy, not inside the workspace pod. The customer-relevant link between budgets and workspaces is recovery: when a workspace stops, the proxy reports its final counter state, and that state is persisted onto the workspace row. On resume, those counters are sent back to the freshly reactivated proxy so in-flight reservations and the current TTL windows pick up where they left off. From a workspace’s perspective, the counters are “persisted with the rest of my state.” The actual semantics of counters (what they meter, how reservations work, how floors fire) belong to capabilities; see the caps.yaml reference.
Fork
A fork creates a new workspace from a parent’s snapshot, with the parent’s conversation rewound to a specific task boundary. The child is fully independent: its own pod, its own budget counters (not inherited), its own status, its own audit trail. The parent is untouched.
The parent must be idle, stopped, or failed to be forkable.
A busy, provisioning, or stopping parent has no consistent
snapshot to fork from. When the parent is idle, fork drains its
write queue and snapshots fresh; when the parent is stopped or
failed, fork uses the parent’s existing snapshot.
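The forkable-status rule and the choice of snapshot source can be sketched as follows. The status names come from the text; the function name and the "fresh" / "existing" return values are hypothetical labels for the two behaviours.

```python
def fork_snapshot_source(parent_status: str) -> str:
    """Decide where a fork gets its snapshot, or reject the fork."""
    if parent_status == "idle":
        return "fresh"      # drain the write queue, then snapshot the live pod
    if parent_status in {"stopped", "failed"}:
        return "existing"   # reuse the parent's most recent snapshot
    raise ValueError(f"parent in status {parent_status!r} is not forkable")


print(fork_snapshot_source("idle"))     # fresh
print(fork_snapshot_source("stopped"))  # existing
```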
Fork records three things on the child:
- The parent workspace id, so the child knows where it came from.
- The snapshot reference used at fork time — for display and audit, not a live pointer. The parent’s future stops may overwrite that blob; the child holds its own independent extraction from when fork ran.
- The fork-point task count, recording how many of the parent’s tasks were included before the conversation was truncated.
Ownership flows to the forker, not the parent’s creator. A fork is a fresh working session for whoever ran it; that’s who shows up on per-user visibility filters and audit attribution.
In the console you can fork from two places:
- The task list on a workspace’s detail page — click Fork on any task row when the workspace is in a forkable state.
- The activity panel: the Fork button shows on task_end rows for completed tasks, gated on the same forkable-status rule.
Live Agents
Live Agents, opened via the Watch Live button on the Workspaces page, is the console’s real-time view of the org’s running agents. It’s the single screen for “what are my agents doing right now, and what are they reaching out to.”
Layout
- Left panel: a compact list of every workspace, grouped by role. A filter input narrows the list and an Active/All toggle decides whether stopped workspaces are included or hidden. Each group header shows the role name, the count of workspaces in the group, and a small network icon that jumps to that role’s Role Topology view. Each row shows the workspace ID and a status badge; clicking a row focuses that workspace on the canvas (Cmd/Ctrl-click toggles multi-select for focusing several at once).
- Right canvas: a graph view of the same workspaces and what they’re connected to, with pan / zoom / fullscreen controls and a Graph search input that narrows the canvas to nodes and edges matching the query (role names, domains, capabilities). Clicking an agent node focuses it the same way the left panel does.
What’s on the graph
The canvas has two kinds of nodes:
- Agent nodes — one per running workspace. Each renders the role, the workspace’s current status, a visual cue for the agent’s current phase (thinking, responding, executing a bash command), and a one-line summary of the active task. Right-clicking an agent node opens a context menu with the actions available against that workspace, gated on the caller’s scopes: Send task, Cancel task (when the workspace is busy), View details (workspace detail page), Edit role, View role topology (jumps to the topology view focused on the role), and Stop workspace. Right-clicking empty canvas instead offers New task and View all workspaces.
- API-service nodes — one per upstream domain the workspace’s capabilities authorise traffic to, plus any domain an agent tried to reach and got denied. Denied nodes are visually distinct.
The edges between them are the policy:
- Capability edges — agent → API. Each edge carries the capability name, the allowed HTTP methods and path patterns, the credentials and secrets the proxy injects on the request, and the budget counters that meter calls along this edge. The edge is, in effect, “what is this agent allowed to do against this service, and how is it metered.”
- Deny edges — render separately when an agent attempts a domain or method the policy doesn’t admit.
When the proxy decides on a request, the matching edge animates the request flow in real time — allow vs deny show as different animations. So you don’t just see the static allowed-paths graph; you see traffic moving across it as it happens.
Stats and scope
Every edge accumulates counters: iterations, bash calls and bash errors on the agent side, HTTP allowed and denied on the outbound side. A stats scope toggle (task / session / workspace) controls the window the counters aggregate over: “just the current task,” “everything in the current session,” or “everything this workspace has ever done.” Stats backfill from the workspace’s activity and audit history on open and stay updated live from the same SSE feed that drives the animations.
Live data sources
The view subscribes to a multiplexed SSE feed of activity
events and policy_decision audit rows. A connection-status
indicator shows whether the feed is live; if it disconnects, the
view marks itself offline rather than showing stale numbers as
fresh. An event-type filter lets you narrow which event kinds
drive the visualisation (handy when a workspace is chatty and
the animations get noisy).
The view is for the live picture — currently active workspaces and what they’re touching. For after-the-fact inspection of any single workspace’s history, use that workspace’s detail page (activity panel + audit tab).