secure-agent/SPEC.md
2026-03-04 14:35:19 -07:00

Coding Agent — Project Specification

Audience: Junior developer onboarding to this project.
Stack: Python · UV · microsandbox (MCP) · Textual (TUI) · pytest
Goal: A local coding agent with a TUI that can later be served as a web UI for remote access.


What We're Building

A coding agent that:

  • Accepts user prompts via a terminal UI
  • Uses Claude (via the Anthropic SDK) as the LLM
  • Executes all file and shell operations inside a microsandbox microVM
  • Exposes those operations via MCP so the tool layer is swappable
  • Can later be served over HTTP for remote/web access without rewriting core logic

Project File Structure

coding-agent/
│
├── pyproject.toml              # UV project manifest — dependencies, scripts, tool config
├── .python-version             # Pins Python version for UV
├── .env.example                # Template for required env vars (copy to .env)
├── README.md
│
├── agent/                      # Core agent logic — no UI concerns here
│   ├── __init__.py
│   ├── loop.py                 # The agentic loop: send message → get response → handle tool calls → repeat
│   ├── tools.py                # Tool definitions (schemas Claude sees) and dispatch table
│   ├── history.py              # Conversation history management
│   └── config.py               # Settings loaded from env vars (API keys, model name, safedir path)
│
├── sandbox/                    # All microsandbox interaction lives here
│   ├── __init__.py
│   ├── session.py              # Creates/destroys the sandbox session, exposes run(), holds lifecycle
│   └── mcp_client.py           # Connects to microsandbox's MCP server, wraps tool calls
│
├── tools/                      # Individual tool implementations; each calls into sandbox/ (session.py first, mcp_client.py later)
│   ├── __init__.py
│   ├── bash.py                 # run_bash(command) → str
│   ├── read.py                 # read_file(path) → str
│   ├── write.py                # write_file(path, content) → str
│   ├── list_dir.py             # list_dir(path) → str
│   └── search.py               # search_files(pattern) → str
│
├── ui/
│   ├── __init__.py
│   ├── tui/
│   │   ├── __init__.py
│   │   └── app.py              # Textual app — renders chat, captures input, calls agent/loop.py
│   └── web/                    # Stubbed out — implemented later
│       └── __init__.py         # Placeholder — see Web UI section below
│
├── tests/
│   ├── conftest.py             # Shared pytest fixtures (mock sandbox session, sample history, etc.)
│   ├── test_loop.py            # Unit tests for agentic loop logic
│   ├── test_tools.py           # Unit tests for each tool (mock the sandbox)
│   ├── test_history.py         # Tests for conversation history management
│   └── test_sandbox.py         # Integration tests for sandbox session (require msb server running)
│
└── scripts/
    └── start_sandbox_server.sh # Convenience: runs `msb server start --dev`

Dependency Overview

Add these to pyproject.toml as the dependencies array under [project] (uv add writes them there for you):

Package            Purpose                                           Docs
anthropic          Anthropic SDK: LLM calls and MCP client support   https://docs.anthropic.com
microsandbox       Python SDK for microsandbox VM sessions           https://github.com/zerocore-ai/microsandbox
textual            TUI framework for the terminal interface          https://textual.textualize.io
python-dotenv      Load .env file into environment                   https://pypi.org/project/python-dotenv
pydantic           Settings validation and tool schema modeling      https://docs.pydantic.dev
pydantic-settings  BaseSettings class used by agent/config.py        https://docs.pydantic.dev/latest/concepts/pydantic_settings/

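Dev dependencies (uv add --dev records these in a dev dependency group, [dependency-groups] in current UV releases, rather than under [project.optional-dependencies]):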

Package         Purpose
pytest          Test runner
pytest-asyncio  Async test support (needed because most code is async)
pytest-mock     Mocking sandbox calls in unit tests

UV Quickstart

# Install UV if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create project
uv init coding-agent
cd coding-agent

# Add dependencies
uv add anthropic microsandbox textual python-dotenv pydantic pydantic-settings
uv add --dev pytest pytest-asyncio pytest-mock

# Run the TUI
uv run python -m ui.tui.app

# Run tests
uv run pytest

Architecture: Concerns and Boundaries

The most important rule: each layer only talks to the layer directly below it.

┌─────────────────────────────────────┐
│         UI Layer (ui/)              │  Renders output, captures input.
│     Textual TUI  |  Web (later)     │  No LLM calls. No sandbox calls.
└──────────────┬──────────────────────┘
               │ calls
┌──────────────▼──────────────────────┐
│       Agent Layer (agent/)          │  Owns the loop. Talks to Anthropic API.
│  loop.py · tools.py · history.py    │  Decides which tools to call.
└──────────────┬──────────────────────┘
               │ calls
┌──────────────▼──────────────────────┐
│       Tools Layer (tools/)          │  One file per tool. Pure functions.
│ bash · read · write · list · search │  No LLM knowledge. No UI knowledge.
└──────────────┬──────────────────────┘
               │ calls
┌──────────────▼──────────────────────┐
│      Sandbox Layer (sandbox/)       │  Owns the VM session and MCP connection.
│   session.py · mcp_client.py        │  Everything executes in here.
└─────────────────────────────────────┘
               │
         microVM (isolated)
         safedir mounted in

Why this matters: When you swap the TUI for a web UI, you only touch ui/. When you swap microsandbox for a different execution backend, you only touch sandbox/. The agent loop doesn't change.
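
One way to keep that boundary honest is to have the tools layer depend on a narrow interface instead of on microsandbox itself. The following is a minimal sketch, not the project's actual code: CommandRunner and FakeRunner are hypothetical names introduced here, and the runner-first signature for run_bash and read_file is an assumption.

```python
import asyncio
import shlex
from typing import Protocol


class CommandRunner(Protocol):
    """Hypothetical interface: anything with an async run(command) -> str."""

    async def run(self, command: str) -> str: ...


async def run_bash(runner: CommandRunner, command: str) -> str:
    # tools/bash.py: pass the command through unchanged
    return await runner.run(command)


async def read_file(runner: CommandRunner, path: str) -> str:
    # tools/read.py: quote the path so spaces and shell metacharacters are safe
    return await runner.run(f"cat {shlex.quote(path)}")


class FakeRunner:
    """Stand-in backend for demos and unit tests; records what it was asked to run."""

    def __init__(self) -> None:
        self.commands: list[str] = []

    async def run(self, command: str) -> str:
        self.commands.append(command)
        return f"ran: {command}"


if __name__ == "__main__":
    runner = FakeRunner()
    print(asyncio.run(read_file(runner, "notes/my file.txt")))
```

Because the tools only need a run() method, a SandboxSession, a future MCP-backed client, or a fake like the one above can be passed in without touching tools/ or agent/.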


Key Implementation Notes

1. The Agentic Loop (agent/loop.py)

This is the heart of the project. The pattern is:

  1. Add user message to history
  2. Send full history to Claude
  3. If response contains tool calls → execute them → add results to history → go to 2
  4. If response is plain text → return it to the UI
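
# Rough shape of loop.py (call_claude and execute_tools are helpers you write)
async def run_turn(user_message: str, history: list, sandbox) -> str:
    history.append({"role": "user", "content": user_message})

    while True:
        response = await call_claude(history)
        # Always record the assistant turn, including the final one,
        # so the next user message is sent with complete context.
        history.append({"role": "assistant", "content": response.content})

        if response.stop_reason == "tool_use":
            # The SDK returns a list of content blocks; tool calls have type "tool_use"
            tool_calls = [b for b in response.content if b.type == "tool_use"]
            tool_results = await execute_tools(tool_calls, sandbox)
            history.append({"role": "user", "content": tool_results})
            continue  # send the results back to Claude

        # end_turn, max_tokens, etc.: return the text instead of looping forever
        return "".join(b.text for b in response.content if b.type == "text")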

Reference: https://docs.anthropic.com/en/docs/build-with-claude/tool-use
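
Since the loop sends the full history on every call, agent/history.py will eventually need trimming. A naive "keep the last N messages" cut can separate a tool result from the tool call it answers, which the API rejects. The sketch below shows one possible approach under stated assumptions: the History class, its append(role, content) signature, and the max_messages default are all hypothetical. It only cuts at a plain-text user prompt.

```python
class History:
    """Minimal wrapper around the message list sent to Claude (illustrative only)."""

    def __init__(self, max_messages: int = 40) -> None:
        self.max_messages = max_messages  # hypothetical default, tune for your model
        self.messages: list[dict] = []

    def append(self, role: str, content) -> None:
        self.messages.append({"role": role, "content": content})
        self._trim()

    def _trim(self) -> None:
        if len(self.messages) <= self.max_messages:
            return
        # Search forward from the oldest message we would like to keep for a
        # user message whose content is a plain string. Tool results are user
        # messages with list content, so they are never chosen as a cut point.
        start = len(self.messages) - self.max_messages
        for i in range(start, len(self.messages)):
            msg = self.messages[i]
            if msg["role"] == "user" and isinstance(msg["content"], str):
                self.messages = self.messages[i:]
                return
        # No safe cut point: keep everything rather than break a tool exchange
```

The trade-off is that a single very long tool-calling turn can exceed max_messages. That is deliberate here, since dropping half of a tool exchange fails harder than sending a few extra messages.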

2. Tool Definitions (agent/tools.py)

Claude needs two things for tools: a JSON schema describing each tool, and a dispatch function that routes tool calls to the right implementation.

# tools.py exports two things:
TOOL_SCHEMAS = [
    {
        "name": "bash",
        "description": "Run a shell command in the sandbox",
        "input_schema": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "The shell command to run"}
            },
            "required": ["command"]
        }
    },
    # ... one entry per tool
]

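async def dispatch(tool_name: str, tool_input: dict, sandbox) -> str:
    # Routes to tools/bash.py, tools/read.py etc. The handler signature
    # (sandbox first, then the tool's inputs) and the "read_file" tool name
    # are assumptions; adjust them to match your tool modules.
    from tools.bash import run_bash
    from tools.read import read_file

    handlers = {"bash": run_bash, "read_file": read_file}
    if tool_name not in handlers:
        # Return the error as text so Claude can see it and recover
        return f"Error: unknown tool {tool_name!r}"
    return await handlers[tool_name](sandbox, **tool_input)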
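
The loop also needs a helper that turns Claude's tool_use blocks into the tool_result blocks sent back in the next user message. A minimal sketch of that helper follows. The dispatch function here is a stand-in so the example runs on its own, and the SimpleNamespace objects imitate the id, name and input attributes of the SDK's tool_use blocks.

```python
import asyncio
from types import SimpleNamespace


async def dispatch(tool_name: str, tool_input: dict, sandbox) -> str:
    """Stand-in for agent/tools.py dispatch so this sketch is self-contained."""
    if tool_name == "bash":
        return f"(pretend output of {tool_input['command']})"
    raise ValueError(f"unknown tool: {tool_name}")


async def execute_tools(tool_calls, sandbox) -> list[dict]:
    """Run each tool_use block and build the matching tool_result blocks."""
    results = []
    for call in tool_calls:
        try:
            output = await dispatch(call.name, call.input, sandbox)
            is_error = False
        except Exception as exc:
            # Report failures to Claude instead of crashing the loop
            output = f"{type(exc).__name__}: {exc}"
            is_error = True
        results.append(
            {
                "type": "tool_result",
                "tool_use_id": call.id,  # must echo the id of the tool_use block
                "content": output,
                "is_error": is_error,
            }
        )
    return results


if __name__ == "__main__":
    calls = [SimpleNamespace(id="t1", name="bash", input={"command": "ls"})]
    print(asyncio.run(execute_tools(calls, sandbox=None)))
```

Returning errors as tool results (with is_error set) lets the model read the failure and try something else, which is usually preferable to aborting the turn.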

3. Sandbox Session (sandbox/session.py)

The sandbox session should be created once at agent startup and reused for the entire conversation. This preserves state between tool calls (installed packages, created files, env vars).

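# sandbox/session.py
from contextlib import AsyncExitStack

from microsandbox import PythonSandbox

class SandboxSession:
    async def __aenter__(self):
        # The SDK README uses PythonSandbox.create() as an async context manager,
        # so enter it through an exit stack rather than awaiting it directly.
        self._stack = AsyncExitStack()
        self._sb = await self._stack.enter_async_context(
            PythonSandbox.create(name="coding-agent")
        )
        return self

    async def run(self, command: str) -> str:
        # sb.run() executes Python code; shell commands go through sb.command.run().
        # Verify the exact signature against the SDK README for your installed version.
        execution = await self._sb.command.run("bash", ["-c", command])
        return await execution.output()

    async def __aexit__(self, *args):
        # Leaving the SDK's context stops the sandbox
        await self._stack.aclose()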

Reference: https://github.com/zerocore-ai/microsandbox/blob/main/sdk/README.md

4. MCP vs Direct SDK

microsandbox supports two integration patterns:

  • Direct SDK (PythonSandbox.create()) — simpler, Python-native, recommended to start with
  • MCP server — microsandbox exposes an MCP server; the Anthropic SDK can connect to it directly, and tool definitions come from the server automatically

Start with the direct SDK (sandbox/session.py). The sandbox/mcp_client.py file is stubbed for later when you want to switch to the MCP path. The MCP approach reduces boilerplate but adds a moving part.

MCP reference: https://github.com/zerocore-ai/microsandbox/blob/main/MCP.md
Anthropic MCP docs: https://docs.anthropic.com/en/docs/build-with-claude/mcp

5. The TUI (ui/tui/app.py)

Use Textual for the TUI. It's async-native, which fits well since the agent loop is async.

A minimal Textual app has:

  • A RichLog or Markdown widget for displaying conversation
  • An Input widget for capturing user messages
  • An on_input_submitted handler that calls agent/loop.py and appends the result

Reference: https://textual.textualize.io/guide/

6. Configuration (agent/config.py)

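Use pydantic-settings to load from .env. It is a separate package from pydantic, so add it with uv add pydantic-settings:

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Reads .env; variables in the file that are not fields here are ignored
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    anthropic_api_key: str
    model: str = "claude-sonnet-4-5-20250929"
    safedir: str = "./workspace"
    max_tokens: int = 8096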

Environment Variables

Copy .env.example to .env and fill in:

ANTHROPIC_API_KEY=sk-ant-...
MODEL=claude-sonnet-4-5-20250929
SAFEDIR=./workspace

Testing Strategy

Unit tests (no sandbox required — mock everything):

  • test_loop.py — mock Claude responses, verify tool calls are dispatched correctly
  • test_tools.py — mock SandboxSession.run(), verify each tool formats input/output correctly
  • test_history.py — verify history trimming, message formatting
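
As an illustration of the unit-test style described above, the sketch below mocks the sandbox with AsyncMock from the standard library and checks both the command sent and the value returned. The list_dir function is a stand-in for tools/list_dir.py and its signature is an assumption.

```python
import asyncio
import shlex
from unittest.mock import AsyncMock


async def list_dir(sandbox, path: str) -> str:
    """Stand-in for tools/list_dir.py; the real signature may differ."""
    return await sandbox.run(f"ls -la {shlex.quote(path)}")


def test_list_dir_builds_ls_command():
    # AsyncMock makes sandbox.run awaitable without starting a VM
    sandbox = AsyncMock()
    sandbox.run.return_value = "total 0\n"

    output = asyncio.run(list_dir(sandbox, "/workspace"))

    sandbox.run.assert_awaited_once_with("ls -la /workspace")
    assert output == "total 0\n"
```

With pytest-asyncio installed, the same test can be written as an async def test and await list_dir(...) directly instead of calling asyncio.run.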

Integration tests (require msb server start --dev):

  • test_sandbox.py — actually runs commands in a VM, verifies output
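  • Mark these with @pytest.mark.integration and skip by default:

# conftest.py
import pytest

def pytest_addoption(parser):
    parser.addoption("--integration", action="store_true", help="run integration tests")

def pytest_configure(config):
    # Register the marker so pytest does not warn about an unknown mark
    config.addinivalue_line("markers", "integration: requires a running msb server")

def pytest_collection_modifyitems(config, items):
    if config.getoption("--integration"):
        return  # run everything, including integration tests
    skip = pytest.mark.skip(reason="pass --integration to run")
    for item in items:
        if "integration" in item.keywords:
            item.add_marker(skip)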

Run integration tests: uv run pytest --integration


Web UI — Future Path (No Node Required Yet)

When ready to add a web UI, the approach that avoids Node:

  1. Add FastAPI + uvicorn to dependencies
  2. Create ui/web/app.py — a FastAPI app with a /chat endpoint that calls agent/loop.py
  3. Use Server-Sent Events (SSE) for streaming responses
  4. Serve a minimal HTML/CSS frontend as a static file from FastAPI

The agent layer doesn't change at all. You're just adding a second entry point alongside the TUI.
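
For the SSE step, the wire format itself is small enough to sketch with the standard library. In the example below, format_sse and stream_reply are hypothetical helpers, and the "token" and "done" event names are assumptions rather than part of this spec. A FastAPI endpoint could return the generator's output through a streaming response with media type text/event-stream.

```python
import json
from typing import Iterable, Iterator, Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Encode one Server-Sent Events message."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload needs its own "data:" field
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    # A blank line terminates the message
    return "\n".join(lines) + "\n\n"


def stream_reply(chunks: Iterable[str]) -> Iterator[str]:
    """Wrap text chunks from the agent as SSE messages, then signal completion."""
    for chunk in chunks:
        # JSON-encode so newlines inside a chunk cannot break the framing
        yield format_sse(json.dumps({"text": chunk}), event="token")
    yield format_sse("{}", event="done")


if __name__ == "__main__":
    for message in stream_reply(["Hel", "lo"]):
        print(message, end="")
```

On the browser side, the built-in EventSource API can consume this stream without any JS framework, which fits the no-Node goal of this section.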

When the project is mature enough to warrant a proper frontend, that's the point to introduce a JS framework. Until then, FastAPI + plain HTML gets you remote access without the Node toolchain.


Prerequisites Before Writing Code

  1. Install microsandbox: curl -sSL https://get.microsandbox.dev | sh
  2. Start the server: msb server start --dev
  3. Pull the Python image: msb pull microsandbox/python
  4. Set your ANTHROPIC_API_KEY in .env

Suggested Build Order

  1. agent/config.py — settings first, everything imports this
  2. sandbox/session.py — get a VM running and verify you can execute commands
  3. tools/bash.py + tools/read.py — minimal tool set to prove the loop works
  4. agent/tools.py — schemas and dispatch for those two tools
  5. agent/history.py — simple list wrapper to start
  6. agent/loop.py — wire it all together, test in a plain Python script first
  7. ui/tui/app.py — put a Textual face on the working loop
  8. Remaining tools (write, list_dir, search)
  9. Tests throughout — write them alongside each module, not at the end