Workspaces

A workspace is one isolated runtime for an agent. Each workspace gets its own Kubernetes pod with its own sandbox filesystem, its own capability token, and its own audit trail. Tasks you submit run inside a workspace; every audit event and every running agent points back at one.

A workspace also outlives any single task. The pod stays warm across multiple submissions, a session can span tasks, and stopping a workspace captures its filesystem to a snapshot you can resume later. The workspace is the unit of continuity for your agent’s work, not just the unit of isolation.

Anatomy

The workspace is an isolated pod running the agent: a Python service that drives the loop of model calls and bash tool executions, rooted in /workspace. Everything outside that loop (policy enforcement, credential injection, TLS interception, DNS, audit emission) runs off-pod, on a separate proxy. The agent has no policy code, no secrets, and no direct network egress.

A few things worth knowing about how that boundary works:

Interception is transparent. The agent makes plain HTTPS calls — curl, httpx, whatever — without any client configuration. Every outbound connection is redirected to the proxy under the hood; the agent code doesn’t have to know the proxy exists.
Each connection carries a per-task capability token. The control plane mints a short-lived JWT when a task is submitted, and that token rides on every outbound connection while the task runs. The proxy uses it to identify which workspace is calling, which policy applies, and which credentials it’s allowed to inject. From the agent’s side the token is opaque: the agent code never decodes it and never places it in its own requests.
The proxy terminates TLS using a platform CA the agent trusts. That gives the proxy full L7 visibility — method, path, headers, body — so it can evaluate policy, inject the real upstream credentials (Authorization headers, OAuth tokens, etc.) at request time, and audit what actually went on the wire. The agent never holds the upstream’s real credentials.
DNS denials are handled by routing, not by DNS errors. An allowed domain resolves to its real IP; a denied domain resolves to a synthetic IP, so the agent’s TCP connect succeeds and the request reaches the HTTP gate. The agent sees a clean HTTP 403 with the deny reason attached — not a confusing DNS failure that an LLM driving curl might try to “fix” by guessing alternate hostnames.
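A minimal sketch of the deny-by-routing idea, assuming a plain allowlist of exact domain names. `Policy`, `SYNTHETIC_DENY_IP`, `resolve`, and `http_gate` are hypothetical names for illustration; the real proxy evaluates full capability policy, not just a domain set.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sinkhole address; the platform's actual synthetic IP is not documented here.
SYNTHETIC_DENY_IP = "198.18.0.1"


@dataclass
class Policy:
    allowed_domains: set = field(default_factory=set)


def resolve(domain: str, policy: Policy, real_lookup: Callable[[str], str]) -> str:
    """Return the IP the agent's resolver hands back for `domain`."""
    if domain in policy.allowed_domains:
        return real_lookup(domain)
    # Denied domains still resolve, so the agent's TCP connect succeeds
    # and the request reaches the HTTP gate instead of failing at DNS.
    return SYNTHETIC_DENY_IP


def http_gate(domain: str, policy: Policy) -> tuple:
    """Return (status, body) as the agent would see it at the HTTP layer."""
    if domain not in policy.allowed_domains:
        return 403, f"denied: {domain} is not allowed by this workspace's policy"
    return 200, "forwarded upstream"


policy = Policy({"api.github.com"})
print(resolve("pastebin.example", policy, lambda d: "140.82.112.5"))  # the synthetic IP
print(http_gate("pastebin.example", policy))                          # status 403 plus the deny reason
```

Because the deny surfaces as an HTTP 403 with a readable reason, a model driving curl gets an explanation instead of a resolution failure it might try to route around.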

Notable paths inside /workspace:

/workspace/packs/<pack-name>/ — read-only OCI image volumes of the content packs attached to the role. One image volume per pack, mounted at pod create.
/workspace/uploads/<task_id>/ — files attached to a task at submission time. Customer-uploaded files originate at the control plane and ride the multipart task delivery into the pod at this fixed path.
/workspace/.session_messages.jsonl — incremental transcript of the conversation. The agent service writes it as messages flow, restores from it on crash, and the control plane captures it into the snapshot when the workspace stops.
/workspace/.session_handoff.md — a summary the model writes via bash before wrapping up a session (prompted by the wrap-up warning), so a follow-on session in the same workspace can pick up context. Surfaced inline on the session API alongside the transcript.
/workspace/.agent.log — the agent service’s log file. Surfaced inline on the session API (tail-truncated for large logs) so operators can inspect what the service did without a separate kubectl path.
/workspace/.progress/ — progress files the model writes via bash during long tasks: one file per discrete step plus a current.json for the live “what’s happening right now” status. The agent service tails this directory and emits step_status events (per step) and progress_status events (for current.json) as files change; the console’s progress tracker renders both.
/workspace/scratch/ — the model’s working directory. The agent service creates it at startup, every bash command runs with this as cwd, and the agent service diffs the directory pre- and post-task so the task_end event’s files_created / files_modified fields reflect what the task actually produced. The console’s session view surfaces these files inline.
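The pre/post diff that feeds `files_created` and `files_modified` could look roughly like this sketch. It assumes a SHA-256 content hash per file as the comparison key; the agent service may compare timestamps or sizes instead, and `fingerprint` and `diff_scratch` are hypothetical names.

```python
import hashlib
from pathlib import Path


def fingerprint(root: Path) -> dict:
    """Map each file path (relative to root) to a SHA-256 of its contents."""
    result = {}
    for path in root.rglob("*"):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            result[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return result


def diff_scratch(before: dict, after: dict) -> tuple:
    """Return (files_created, files_modified) between two fingerprints."""
    created = sorted(p for p in after if p not in before)
    modified = sorted(p for p in after if p in before and after[p] != before[p])
    return created, modified
```

Usage: take `before = fingerprint(Path("/workspace/scratch"))` when the task starts and `after` the same way when it ends, then call `diff_scratch(before, after)`. Deleted files drop out silently here, since the task_end fields only cover created and modified files.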

Workspace fields

The customer-visible fields on a workspace row:

Field	Purpose
`id`	Stable workspace identifier. Used in URLs, audit rows, CLI commands.
`org_id`	Tenant scope.
`role_id`	The agent role the workspace was created under. Determines provider, model, capabilities, pool policy.
`environment`	Free-form label inherited from the role at create time, surfaced in audit rows.
`created_by_user_id`	The user who created the workspace, or `system` for API-key-driven creates. Drives `:own` permission narrowing and audit attribution.
`status`	Closed enum tracking the workspace’s lifecycle phase — see Lifecycle.
`created_at`, `updated_at`	Bookkeeping.
`archived_at`	Soft-delete tombstone — see Archive.
`snapshot_ref`	Opaque reference to the most recent saved snapshot. Empty until the first stop — see Snapshots.
`last_snapshot_error`	Populated when a snapshot save failed against a pod that had already gone away (eviction, OOM). Cleared on the next successful save.
`budget_snapshot`	The proxy’s final budget-counter state at the most recent stop, persisted so resume can hand it back to a freshly activated proxy — see Budget counters.
`parent_workspace_id`, `fork_source_snapshot_ref`, `fork_point_task_count`	Fork lineage, set only on workspaces created via fork — see Fork.
`pack_status`	Derived runtime field: `current` (pod’s mounted packs match the role’s current pack set), `stale` (role packs changed since the pod was created — restart to apply), or `unknown` (no live pod).
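The derived `pack_status` field reduces to comparing the packs mounted on the live pod with the role’s current pack set. A minimal sketch, assuming each pack is identified by a string such as a name plus image digest (a hypothetical encoding):

```python
from typing import Optional


def pack_status(mounted_packs: Optional[frozenset], role_packs: frozenset) -> str:
    """Derive pack_status; mounted_packs is None when there is no live pod."""
    if mounted_packs is None:
        return "unknown"
    # Any added, removed, or re-versioned pack leaves the pod stale until it restarts.
    return "current" if mounted_packs == role_packs else "stale"
```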

Task fields

A task row is smaller — most of the rich state lives on its paired activity events rather than on the row itself:

Field	Purpose
`id`	Stable task identifier.
`workspace_id`, `org_id`	Which workspace the task ran in, and the owning org.
`status`	Closed enum tracking the task’s lifecycle phase — see Tasks.
`description`	The prompt the user submitted with the task.
`reason`	Terminal reason on `failed` / `cancelled` / `delivery_failed` (e.g. `timeout`, `manual`, `workspace_stop`). Empty on the happy path.
`duration_seconds`	Computed at read time from `completed_at - created_at`. Not persisted as a column.
`created_at`, `completed_at`	Bookkeeping. `completed_at` is set on the terminal transition.

Iteration count, token usage, files created/modified, and the agent’s last response don’t live on the task row — they ride the paired task_end activity event detail and are fetched lazily when an operator expands a task in the console.

Lifecycle

Workspace status is a closed enum. Transitions are validated server-side; an illegal transition fails the request.

Status	Meaning
`provisioning`	A pod is being claimed (from the warm pool) or cold-started. The workspace exists but isn’t ready for tasks.
`idle`	Pod is up and the external proxy is activated. Ready for tasks.
`busy`	A task is currently executing.
`stopping`	Stop is in flight: any running task is cancelled, the agent flushes its write queues, a snapshot is captured.
`stopped`	Pod is destroyed; the filesystem lives on as a snapshot. Submitting another task — or hitting Resume — revives it.
`failed`	Provisioning, snapshot save, or proxy activation hit an unrecoverable error. The row stays for inspection.

Warm pool vs cold start

When a workspace enters provisioning, the platform first tries to claim a pre-warmed pod from the role’s pool. A pool pod is a workspace pod that’s already been scheduled and started on a node with the role’s configured packs, sitting idle with no activated proxy state. Claiming one is sub-second — the pod is already up, the platform just hands it ownership of the new workspace and activates the proxy.

If no pool pod is available — pool empty, or the role has no pool policy configured — the workspace cold-starts: the platform creates a new pod from scratch. That’s tens of seconds (image pull, scheduler, init container, agent boot) but works the same way once it’s running.

Whether a role keeps a warm pool, and how big, is part of the role’s pool policy. Workspaces themselves don’t choose; they take whatever’s available.
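The claim-or-cold-start decision, as a sketch. `Pod`, `RolePool`, and `provision` are hypothetical stand-ins for the platform’s internals; real claiming also involves Kubernetes scheduling and proxy activation, which are reduced to comments here.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pod:
    name: str
    warm: bool = True              # True if it came from the role's pool
    owner: Optional[str] = None    # workspace id, set when claimed


@dataclass
class RolePool:
    idle_pods: deque = field(default_factory=deque)


def provision(workspace_id: str, pool: Optional[RolePool]) -> Pod:
    """Claim a warm pod if one is available, otherwise cold-start a new one."""
    if pool is not None and pool.idle_pods:
        pod = pool.idle_pods.popleft()  # already scheduled and running: sub-second
    else:
        # No pool policy, or the pool is empty: create from scratch (tens of seconds).
        pod = Pod(name=f"cold-{workspace_id}", warm=False)
    pod.owner = workspace_id
    # Proxy activation for the new workspace would follow here in either case.
    return pod
```

Both branches yield the same kind of pod, which is why a workspace behaves identically once running regardless of how it started.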

Surprising transitions

Three transitions are worth calling out because they can surprise you:

busy → provisioning is the force-restart path. When a role’s configuration changes (packs swapped, capabilities edited, model changed) and you restart an active workspace, it transitions through provisioning again so the new pod can come up under the new config.
stopping → idle is the Stop-aborted path. If the snapshot save fails but the pod is still alive, Stop bails back to idle rather than wedging the workspace in stopping. You can retry.
failed can only transition to stopping. A failed workspace can’t be resumed directly; you stop it first (which cleans up partial state), then create a fresh workspace.
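Server-side validation of these transitions amounts to a lookup table. The table below is inferred only from the transitions this page mentions, so treat it as illustrative rather than the platform’s authoritative set; `ALLOWED`, `IllegalTransition`, and `transition` are hypothetical names.

```python
# Inferred from this page; the real server-side table may differ.
ALLOWED = {
    "provisioning": {"idle", "failed"},
    "idle": {"busy", "stopping"},
    "busy": {"idle", "stopping", "provisioning"},  # busy -> provisioning: force restart
    "stopping": {"stopped", "idle", "failed"},     # stopping -> idle: Stop aborted
    "stopped": {"provisioning"},                   # resume claims a fresh pod
    "failed": {"stopping"},                        # the only way out of failed
}


class IllegalTransition(ValueError):
    """Raised when a request asks for a transition the state machine forbids."""


def transition(current: str, target: str) -> str:
    if target not in ALLOWED.get(current, set()):
        raise IllegalTransition(f"illegal workspace transition: {current} -> {target}")
    return target
```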

Timers

Each role declares two timers the platform enforces:

Idle timeout — how long a workspace can sit with no task before being stopped automatically (snapshot + pod teardown). Keeps resources from being pinned indefinitely.
Task timeout — how long any single task may run. The control plane warns the agent at ~80% of the timeout so the agent can wrap up gracefully; at 100% the task is cancelled.

Both timers re-arm on the relevant state changes (the idle timer each time the workspace goes idle, the task timer with each new task), and both are tuned per role.
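The task-timeout behaviour can be expressed as a small decision function. This is a sketch assuming the warning fires once at a fixed fraction of the timeout; `task_timer_action` and `warn_fraction` are hypothetical, and the page only says the warning lands at roughly 80%.

```python
def task_timer_action(elapsed_s: float, timeout_s: float, warned: bool,
                      warn_fraction: float = 0.8) -> str:
    """Return 'cancel', 'warn', or 'none' for a running task."""
    if elapsed_s >= timeout_s:
        return "cancel"                      # hard stop at 100%
    if not warned and elapsed_s >= warn_fraction * timeout_s:
        return "warn"                        # one-time wrap-up warning
    return "none"
```

With a 600-second task timeout, the warning is due at 480 seconds, leaving the agent about two minutes to write its handoff and finish.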

Tasks

A task is one unit of work submitted to a workspace. Two paths create one:

POST /v1/tasks atomically creates a workspace and submits its first task. This is what evershell run <role> "<description>" does.
POST /v1/workspaces/{id}/tasks submits a task to an existing workspace. The workspace doesn’t have to be idle — submitting to a stopped workspace transparently resumes it first.
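A sketch of the second path using only the standard library. The JSON body shape, the bearer-token header, and `BASE_URL` are assumptions for illustration: the page names `reset_session` as a task option but doesn’t document the request schema or auth scheme, so check the API reference before relying on them.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical; use your deployment's API host


def submit_task_request(workspace_id: str, description: str, token: str,
                        reset_session: bool = False) -> urllib.request.Request:
    """Build (but don't send) a POST /v1/workspaces/{id}/tasks request."""
    body = {"description": description}     # assumed field name
    if reset_session:
        body["reset_session"] = True
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/workspaces/{workspace_id}/tasks",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},  # assumed auth scheme
        method="POST",
    )
```

`urllib.request.urlopen(req)` would send it. Sending the same request to a stopped workspace resumes it before the task is delivered.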

Task status is a closed enum, also state-machine validated:

Status	Meaning
`submitted`	Row created; the control plane is about to hand the task to the agent.
`delivered`	Agent acknowledged receipt.
`running`	Agent’s task loop is executing.
`compacting`	The agent is summarising older conversation turns to make room and will return to `running`.
`completed`	Terminal — happy path.
`failed`	Terminal — agent or provider error.
`cancelled`	Terminal — task timeout, manual cancel, or workspace stop.
`delivery_failed`	Terminal — control plane couldn’t reach the agent at all.
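The task enum splits into live and terminal statuses, with compacting as the one status that loops back. A sketch of that split; the transition map is inferred from the meanings in the table, not taken from the platform, and the names are hypothetical.

```python
TERMINAL = frozenset({"completed", "failed", "cancelled", "delivery_failed"})

# Inferred from the status meanings on this page; illustrative only.
TASK_TRANSITIONS = {
    "submitted": {"delivered", "delivery_failed", "cancelled"},
    "delivered": {"running", "failed", "cancelled"},
    "running": {"compacting", "completed", "failed", "cancelled"},
    "compacting": {"running", "failed", "cancelled"},  # compaction returns to running
}


def is_terminal(status: str) -> bool:
    return status in TERMINAL


def can_transition(current: str, target: str) -> bool:
    # Terminal statuses have no outgoing edges, so they are absent from the map.
    return target in TASK_TRANSITIONS.get(current, set())
```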

Two close events per task

Every task in the audit stream ends with at least one terminal event. On the happy path there are two:

task_end — emitted by the agent inside the pod when its task loop finishes. Carries iteration count, the wrap-up reason if any, files created, and a token-usage rollup.
task_completed — emitted by the control plane when it acknowledges the close (or has to force one).

The console suppresses task_completed on the happy path so the activity feed doesn’t show a duplicate close. But both exist because the agent may never get to emit task_end: if a task hits its timeout, or the workspace is stopped mid-task, or the pod is evicted, the agent doesn’t report back. In those cases the control plane emits task_completed itself with the appropriate cancellation reason — so every task is guaranteed at least one terminal event regardless of how it ended.
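The suppression rule is short in code. This sketch assumes events are dicts with a `type` key (a hypothetical shape) and treats the presence of a task_end as the happy-path signal.

```python
def visible_close_events(events: list) -> list:
    """Drop task_completed when the agent's own task_end is present."""
    has_task_end = any(e["type"] == "task_end" for e in events)
    return [e for e in events
            if not (has_task_end and e["type"] == "task_completed")]


happy = [{"type": "task_end"}, {"type": "task_completed"}]
timed_out = [{"type": "task_completed", "reason": "timeout"}]
print(visible_close_events(happy))      # only the task_end row
print(visible_close_events(timed_out))  # the control plane's close stays visible
```

Either way at least one terminal event survives the filter, which is the guarantee the two-event design exists to provide.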

Sessions, iterations, compaction

These are agent-internal events that surface on the activity stream alongside the task lifecycle. Three things worth knowing:

Sessions are bigger than tasks. A session is the agent’s ongoing conversation with the model. One session can span multiple tasks — submitting a follow-up task to an idle workspace continues the existing session rather than starting a new one.
A session ends only via wrap-up. When a limit approaches (a budget counter nearing its cap, the task duration nearing its timeout, or the context window filling after the role’s max_continuations compaction rounds are used up; see below), the agent gets a wrap-up warning, transitions to wrapping_up, and gets a few more turns to write a .session_handoff.md and finish cleanly. When the agent loop exits in wrapping_up, the session transitions to completed. A task ending on its own doesn’t end the session; the session keeps going until one of the wrap-up triggers fires.
The next task after a completed session starts fresh, but reads the handoff. A new session is created and the agent’s system prompt picks up .session_handoff.md so the model can continue where the previous session left off.
You can force a reset by submitting a task with reset_session=true. That archives the current session’s transcript, deletes the handoff file, and starts a clean session with no inherited context. Use it when you want a true blank slate rather than a handoff continuation.
Iterations are smaller than tasks. Each model turn — one prompt sent, one response received — is an iteration. iteration_end carries a token-usage map (input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens, thinking_output_tokens); the console renders these as chips.
Compaction keeps the session going. When the conversation approaches the role’s max_context_tokens, the agent summarises older turns and continues. compaction_start and compaction_end bracket the operation; continuation on the detail tracks which round it is. The role’s max_continuations caps how many compaction rounds a single task may chain before the agent wraps up instead.
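The interplay between `max_context_tokens`, compaction, and `max_continuations` can be sketched as one decision per turn. The 90% trigger threshold is a hypothetical stand-in for "approaches", and `next_context_step` is an illustrative name, not the agent’s code.

```python
def next_context_step(context_tokens: int, max_context_tokens: int,
                      continuations_used: int, max_continuations: int,
                      threshold: float = 0.9) -> str:
    """Decide whether to keep going, compact, or wrap up."""
    if context_tokens < threshold * max_context_tokens:
        return "continue"
    if continuations_used < max_continuations:
        # Bracketed by compaction_start / compaction_end; task status goes
        # running -> compacting -> running.
        return "compact"
    # Compaction rounds exhausted: session_wrap_up with reason=context.
    return "wrap_up"
```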

Session lifecycle events: session_start, session_wrap_up (with a closed-enum reason: context, budget, or task), and session_end. All ride as category=activity audit rows and surface on GET /v1/workspaces/{id}/activity (and the evershell logs stream).

Snapshots

When a workspace stops, the control plane captures /workspace as a tar.gz archive and stores it in the platform’s snapshot store. The agent’s JSONL transcript, the handoff summary it wrote before wrapping up, anything else under /workspace — all of it rides in the snapshot. /workspace/packs/ is excluded: packs are independently mounted from their image-volume sources, so resume re-mounts them from the registry rather than baking stale copies into the archive.
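A local approximation of the archive step with Python’s `tarfile`, excluding `packs/` through a member filter. How the control plane actually streams the archive out of the pod isn’t described here, so treat `snapshot_workspace` and `exclude_packs` as hypothetical.

```python
import tarfile
from pathlib import Path
from typing import Optional


def exclude_packs(info: tarfile.TarInfo) -> Optional[tarfile.TarInfo]:
    """tarfile filter: drop packs/ and everything beneath it."""
    if info.name == "packs" or info.name.startswith("packs/"):
        return None  # None skips the member (and, for a directory, its contents)
    return info


def snapshot_workspace(root: str, out_path: str) -> None:
    """Write root's contents (dotfiles included) to out_path as tar.gz."""
    with tarfile.open(out_path, "w:gz") as tar:
        for child in sorted(Path(root).iterdir()):
            tar.add(str(child), arcname=child.name, filter=exclude_packs)
```

Calling `snapshot_workspace("/workspace", "snap.tar.gz")` captures .session_messages.jsonl, .session_handoff.md, scratch/ and the rest, while packs are left for resume to re-mount from their image volumes.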

Workspace state is captured at multiple points so it survives both planned shutdowns and unplanned pod loss:

Every Stop snapshots before tearing the pod down — whether the stop was explicit (operator stop, agent shutdown) or implicit (idle-timer or task-timeout fired).
Restart snapshots before destroying the old pod, then resume restores into the new one — the operator just sees “restarting” continuously, but underneath it’s a snapshot round-trip.
Fork snapshots when its parent is currently idle and has a live pod; otherwise it reuses the parent’s existing snapshot.
Periodic snapshots fire on a timer while a task is active, so a long-running task doesn’t lose progress if something hits the pod between explicit save points.
Pod-termination watch — when Kubernetes sets DeletionTimestamp on the pod (node drain, scale-down, manual kubectl delete), the platform races a snapshot against the grace period before SIGKILL lands, so state isn’t lost to an involuntary eviction.

Snapshot save is the step everything else depends on. Every path that captures one guards against failure: if the pod is still alive, the operation bails back to the previous status so you can retry; if the pod is already gone (eviction, OOM), the workspace settles into stopped with a last_snapshot_error recorded so the UI can flag “stopped, snapshot lost” and the operator can decide whether to resume against the older snapshot or start empty.
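The three outcomes of a save can be sketched as follows. `Workspace` mirrors only the fields this logic touches, and `settle_after_snapshot` is a hypothetical name; on success the caller continues with whatever transition it was performing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workspace:
    status: str
    snapshot_ref: str = ""
    last_snapshot_error: str = ""


def settle_after_snapshot(ws: Workspace, previous_status: str, ok: bool,
                          pod_alive: bool, new_ref: Optional[str] = None,
                          error: str = "") -> Workspace:
    if ok:
        ws.snapshot_ref = new_ref or ws.snapshot_ref
        ws.last_snapshot_error = ""       # cleared on the next successful save
    elif pod_alive:
        ws.status = previous_status       # bail back; the operator can retry
    else:
        ws.status = "stopped"             # pod is gone: settle rather than wedge
        ws.last_snapshot_error = error    # the older snapshot_ref is left untouched
    return ws
```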

Resume claims a fresh pod (pool or cold-start), extracts the snapshot into /workspace, reactivates the external proxy with the role’s current policy, and transitions back to idle.

Budget counters

Capability budget counters are tracked and enforced at the external proxy, not inside the workspace pod. The customer-relevant link between budgets and workspaces is recovery: when a workspace stops, the proxy reports its final counter state, and that state is persisted onto the workspace row. On resume, those counters are sent back to the freshly reactivated proxy so in-flight reservations and the current TTL windows pick up where they left off. From a workspace’s perspective, the counters are “persisted with the rest of my state.” The actual semantics of counters (what they meter, how reservations work, how floors fire) belong to capabilities; see the caps.yaml reference.

Fork

A fork creates a new workspace from a parent’s snapshot, with the parent’s conversation rewound to a specific task boundary. The child is fully independent: its own pod, its own budget counters (not inherited), its own status, its own audit trail. The parent is untouched.

The parent must be idle, stopped, or failed to be forkable. A busy, provisioning, or stopping parent has no consistent snapshot to fork from. When the parent is idle, fork drains its write queue and snapshots fresh; when the parent is stopped or failed, fork uses the parent’s existing snapshot.

Fork records three things on the child:

The parent workspace id, so the child knows where it came from.
The snapshot reference used at fork time — for display and audit, not a live pointer. The parent’s future stops may overwrite that blob; the child holds its own independent extraction from when fork ran.
The fork-point task count, recording how many of the parent’s tasks were included before the conversation was truncated.

Ownership flows to the forker, not the parent’s creator. A fork is a fresh working session for whoever ran it; that’s who shows up on per-user visibility filters and audit attribution.
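Putting the fork rules together: the forkable-status check, the choice between a fresh and an existing snapshot, and the lineage fields. The field names match the workspace fields table; `plan_fork` and the `take_fresh_snapshot` callback are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

FORKABLE = {"idle", "stopped", "failed"}


@dataclass
class ForkChild:
    parent_workspace_id: str
    fork_source_snapshot_ref: str   # recorded for display/audit, not a live pointer
    fork_point_task_count: int
    created_by_user_id: str         # the forker, not the parent's creator


def plan_fork(parent_id: str, parent_status: str, parent_snapshot_ref: str,
              take_fresh_snapshot: Callable[[], str], task_count: int,
              forker_user_id: str) -> ForkChild:
    if parent_status not in FORKABLE:
        raise ValueError(f"cannot fork a workspace in status {parent_status!r}")
    if parent_status == "idle":
        # The callback is expected to drain the write queue, then snapshot.
        source = take_fresh_snapshot()
    else:
        source = parent_snapshot_ref    # stopped/failed: reuse the existing snapshot
    # Budget counters are deliberately not copied: the child starts its own.
    return ForkChild(parent_id, source, task_count, forker_user_id)
```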

In the console you can fork from two places:

The task list on a workspace’s detail page — click Fork on any task row when the workspace is in a forkable state.
The activity panel — the Fork button shows on task_end rows for completed tasks, gated on the same forkable-status rule.

Live Agents

Live Agents — opened via the Watch Live button on the Workspaces page — is the console’s real-time view of the org’s running agents. It’s the single screen for “what are my agents doing right now, and what are they reaching out to.”

Layout

Left panel: a compact list of every workspace, grouped by role. A filter input narrows the list and an Active/All toggle decides whether stopped workspaces are included or hidden. Each group header shows the role name, the count of workspaces in the group, and a small network icon that jumps to that role’s Role Topology view. Each row shows the workspace ID and a status badge; clicking a row focuses that workspace on the canvas (Cmd/Ctrl-click toggles multi-select for focusing several at once).
Right canvas: a graph view of the same workspaces and what they’re connected to, with pan / zoom / fullscreen controls and a Graph search input that narrows the canvas to nodes and edges matching the query (role names, domains, capabilities). Clicking an agent node focuses it the same way the left panel does.

What’s on the graph

The canvas has two kinds of nodes:

Agent nodes — one per running workspace. Each renders the role, the workspace’s current status, a visual cue for the agent’s current phase (thinking, responding, executing a bash command), and a one-line summary of the active task. Right-clicking an agent node opens a context menu with the actions available against that workspace, gated on the caller’s scopes: Send task, Cancel task (when the workspace is busy), View details (workspace detail page), Edit role, View role topology (jumps to the topology view focused on the role), and Stop workspace. Right-clicking empty canvas instead offers New task and View all workspaces.
API-service nodes — one per upstream domain the workspace’s capabilities authorise traffic to, plus any domain an agent tried to reach and got denied. Denied nodes are visually distinct.

The edges between them are the policy:

Capability edges — agent → API. Each edge carries the capability name, the allowed HTTP methods and path patterns, the credentials and secrets the proxy injects on the request, and the budget counters that meter calls along this edge. The edge is, in effect, “what is this agent allowed to do against this service, and how is it metered.”
Deny edges — render separately when an agent attempts a domain or method the policy doesn’t admit.
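Conceptually, an allow animation means some capability edge admits the request, and a deny edge appears when none does. This sketch uses shell-style globs via `fnmatch` for path patterns, which is an assumption; the actual pattern syntax belongs to the caps.yaml reference. `CapabilityEdge` and `decide` are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class CapabilityEdge:
    capability: str
    domain: str
    methods: frozenset
    path_patterns: tuple


def decide(edges, domain: str, method: str, path: str) -> str:
    """Return 'allow' if any edge admits the request, else 'deny'."""
    for edge in edges:
        if (edge.domain == domain
                and method.upper() in edge.methods
                # Note: with fnmatch, '*' also matches '/' characters.
                and any(fnmatchcase(path, p) for p in edge.path_patterns)):
            return "allow"
    return "deny"
```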

When the proxy decides on a request, the matching edge animates the request flow in real time — allow vs deny show as different animations. So you don’t just see the static allowed-paths graph; you see traffic moving across it as it happens.

Stats and scope

Every edge accumulates counters: iterations, bash calls and bash errors on the agent side, HTTP allowed and denied on the outbound side. A stats scope toggle (task / session / workspace) controls the window the counters aggregate over — “just the current task,” “everything in the current session,” or “everything this workspace has ever done.” Stats backfill from the workspace’s activity and audit history on open and stay updated live from the same SSE feed that drives the animations.
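The scope toggle is effectively a filter applied before counting. A sketch, assuming each event carries `task_id`, `session_id`, and a `kind` such as `bash_error` or `http_denied` (hypothetical field and kind names), and that the list is already narrowed to one workspace:

```python
from collections import Counter


def edge_stats(events: list, scope: str, task_id: str, session_id: str) -> Counter:
    """Count event kinds within the chosen stats scope."""
    if scope == "task":
        selected = [e for e in events if e["task_id"] == task_id]
    elif scope == "session":
        selected = [e for e in events if e["session_id"] == session_id]
    elif scope == "workspace":
        selected = events                  # everything this workspace has done
    else:
        raise ValueError(f"unknown scope: {scope}")
    return Counter(e["kind"] for e in selected)
```

Backfill and live updates can share this function: count over historical events on open, then over the growing list as feed events arrive.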

Live data sources

The view subscribes to a multiplexed SSE feed of activity events and policy_decision audit rows. A connection-status indicator shows whether the feed is live; if it disconnects, the view marks itself offline rather than showing stale numbers as fresh. An event-type filter lets you narrow which event kinds drive the visualisation (handy when a workspace is chatty and the animations get noisy).
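For scripting against a feed like this, the Server-Sent Events wire format is simple enough to parse by hand. The sketch follows the SSE field rules (`event:` and `data:` lines, a blank line dispatches, lines starting with `:` are comments) and then applies an event-type filter. Using `activity` and `policy_decision` as SSE event types, with JSON payloads, is an assumption about this particular feed.

```python
import json
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_type, data) pairs from raw SSE lines."""
    event_type, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                 # blank line: dispatch the buffered event
            if data:
                yield event_type, "\n".join(data)
            event_type, data = "message", []
            continue
        if line.startswith(":"):       # comment, often used as a keep-alive
            continue
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if name == "event":
            event_type = value
        elif name == "data":
            data.append(value)


def filter_events(lines: Iterable[str], wanted: set) -> Iterator[Tuple[str, dict]]:
    """Keep only the event types the viewer asked for, decoding JSON payloads."""
    for event_type, payload in parse_sse(lines):
        if event_type in wanted:
            yield event_type, json.loads(payload)
```

A reader loop that stops receiving lines, or hits a connection error, is the natural place to flip an offline indicator instead of continuing to show the last counts as live.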

The view is for the live picture — currently active workspaces and what they’re touching. For after-the-fact inspection of any single workspace’s history, use that workspace’s detail page (activity panel + audit tab).