Coding Agent — Project Specification
Audience: Junior developer onboarding to this project.
Stack: Python · UV · microsandbox (MCP) · Textual (TUI) · pytest
Goal: A local coding agent with a TUI that can later be served as a web UI for remote access.
What We're Building
A coding agent that:
- Accepts user prompts via a terminal UI
- Uses Claude (via the Anthropic SDK) as the LLM
- Executes all file and shell operations inside a microsandbox microVM
- Exposes those operations via MCP so the tool layer is swappable
- Can later be served over HTTP for remote/web access without rewriting core logic
Project File Structure
coding-agent/
│
├── pyproject.toml # UV project manifest — dependencies, scripts, tool config
├── .python-version # Pins Python version for UV
├── .env.example # Template for required env vars (copy to .env)
├── README.md
│
├── agent/ # Core agent logic — no UI concerns here
│ ├── __init__.py
│ ├── loop.py # The agentic loop: send message → get response → handle tool calls → repeat
│ ├── tools.py # Tool definitions (schemas Claude sees) and dispatch table
│ ├── history.py # Conversation history management
│ └── config.py # Settings loaded from env vars (API keys, model name, safedir path)
│
├── sandbox/ # All microsandbox interaction lives here
│ ├── __init__.py
│ ├── session.py # Creates/destroys the sandbox session, exposes run(), holds lifecycle
│ └── mcp_client.py # Connects to microsandbox's MCP server, wraps tool calls
│
├── tools/ # Individual tool implementations — each calls sandbox/mcp_client.py
│ ├── __init__.py
│ ├── bash.py # run_bash(command) → str
│ ├── read.py # read_file(path) → str
│ ├── write.py # write_file(path, content) → str
│ ├── list_dir.py # list_dir(path) → str
│ └── search.py # search_files(pattern) → str
│
├── ui/
│ ├── __init__.py
│ ├── tui/
│ │ ├── __init__.py
│ │ └── app.py # Textual app — renders chat, captures input, calls agent/loop.py
│ └── web/ # Stubbed out — implemented later
│     └── __init__.py # Placeholder: see Web UI section below
│
├── tests/
│ ├── conftest.py # Shared pytest fixtures (mock sandbox session, sample history, etc.)
│ ├── test_loop.py # Unit tests for agentic loop logic
│ ├── test_tools.py # Unit tests for each tool (mock the sandbox)
│ ├── test_history.py # Tests for conversation history management
│ └── test_sandbox.py # Integration tests for sandbox session (require msb server running)
│
└── scripts/
    └── start_sandbox_server.sh # Convenience: runs `msb server start --dev`
Dependency Overview
Add these to the dependencies list in the [project] table of pyproject.toml (uv add does this for you):
| Package | Purpose | Docs |
|---|---|---|
| anthropic | Anthropic SDK: LLM calls and MCP client support | https://docs.anthropic.com |
| microsandbox | Python SDK for microsandbox VM sessions | https://github.com/zerocore-ai/microsandbox |
| textual | TUI framework for the terminal interface | https://textual.textualize.io |
| python-dotenv | Load .env file into environment | https://pypi.org/project/python-dotenv |
| pydantic | Settings validation and tool schema modeling | https://docs.pydantic.dev |
| pydantic-settings | Provides the BaseSettings class imported in agent/config.py | https://pypi.org/project/pydantic-settings |
Dev dependencies (recent UV versions record uv add --dev packages in the dev group under [dependency-groups], not under [project.optional-dependencies]):
| Package | Purpose |
|---|---|
| pytest | Test runner |
| pytest-asyncio | Async test support (needed because most code is async) |
| pytest-mock | Mocking sandbox calls in unit tests |
UV Quickstart
# Install UV if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create project
uv init coding-agent
cd coding-agent
# Add dependencies
uv add anthropic microsandbox textual python-dotenv pydantic pydantic-settings
uv add --dev pytest pytest-asyncio pytest-mock
# Run the TUI
uv run python -m ui.tui.app
# Run tests
uv run pytest
Architecture: Concerns and Boundaries
The most important rule: each layer only talks to the layer directly below it.
┌─────────────────────────────────────┐
│ UI Layer (ui/) │ Renders output, captures input.
│ Textual TUI | Web (later) │ No LLM calls. No sandbox calls.
└──────────────┬──────────────────────┘
│ calls
┌──────────────▼──────────────────────┐
│ Agent Layer (agent/) │ Owns the loop. Talks to Anthropic API.
│ loop.py · tools.py · history.py │ Decides which tools to call.
└──────────────┬──────────────────────┘
│ calls
┌──────────────▼──────────────────────┐
│ Tools Layer (tools/) │ One file per tool. Pure functions.
│ bash · read · write · list · search │ No LLM knowledge. No UI knowledge.
└──────────────┬──────────────────────┘
│ calls
┌──────────────▼──────────────────────┐
│ Sandbox Layer (sandbox/) │ Owns the VM session and MCP connection.
│ session.py · mcp_client.py │ Everything executes in here.
└─────────────────────────────────────┘
│
microVM (isolated)
safedir mounted in
Why this matters: When you swap the TUI for a web UI, you only touch ui/. When you swap microsandbox for a different execution backend, you only touch sandbox/. The agent loop doesn't change.
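One way to keep the sandbox boundary swappable is to have the tools layer depend on a small interface instead of on microsandbox itself. The sketch below assumes a hypothetical `Sandbox` protocol with a single `run()` method; `Sandbox`, `EchoSandbox`, and this two-argument form of `list_dir` are illustrative names, not part of the spec.

```python
import asyncio
import shlex
from typing import Protocol


class Sandbox(Protocol):
    """Hypothetical interface: the only thing the tools layer needs from sandbox/."""

    async def run(self, command: str) -> str: ...


class EchoSandbox:
    """Stand-in backend for illustration; it echoes the command instead of starting a VM."""

    async def run(self, command: str) -> str:
        return f"ran: {command}"


async def list_dir(sandbox: Sandbox, path: str) -> str:
    # Tools layer: no LLM or UI knowledge, only a call down into the sandbox layer.
    return await sandbox.run(f"ls -la {shlex.quote(path)}")


if __name__ == "__main__":
    print(asyncio.run(list_dir(EchoSandbox(), "src")))
```

Any class with a matching `run()` coroutine satisfies the protocol, so a microsandbox-backed session and a test double can be passed to the same tool function.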
Key Implementation Notes
1. The Agentic Loop (agent/loop.py)
This is the heart of the project. The pattern is:
1. Add user message to history
2. Send full history to Claude
3. If response contains tool calls → execute them → add results to history → go to step 2
4. If response is plain text → return it to the UI
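# Rough shape of loop.py (call_claude and execute_tools are helpers defined elsewhere in agent/)
async def run_turn(user_message: str, history: list, sandbox) -> str:
    history.append({"role": "user", "content": user_message})
    while True:
        response = await call_claude(history)
        # Always record the assistant turn, including the final one
        history.append({"role": "assistant", "content": response.content})
        if response.stop_reason == "tool_use":
            tool_calls = [b for b in response.content if b.type == "tool_use"]
            tool_results = await execute_tools(tool_calls, sandbox)
            history.append({"role": "user", "content": tool_results})
            continue  # send the tool results back to Claude
        # end_turn, max_tokens, etc.: return the text so the loop cannot spin forever
        return "".join(b.text for b in response.content if b.type == "text")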
Reference: https://docs.anthropic.com/en/docs/build-with-claude/tool-use
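The loop relies on an `execute_tools` helper that this spec does not define. A minimal sketch, assuming each tool call is an object with `id`, `name`, and `input` attributes (as `tool_use` content blocks have in the Anthropic SDK). The `dispatch` coroutine is passed in explicitly here only to keep the example self-contained, and `fake_dispatch` is a hypothetical stand-in for the real one.

```python
import asyncio
from types import SimpleNamespace


async def execute_tools(tool_calls, sandbox, dispatch) -> list[dict]:
    """Run each requested tool and build the tool_result blocks for the next user turn."""
    results = []
    for call in tool_calls:
        try:
            output = await dispatch(call.name, call.input, sandbox)
            is_error = False
        except Exception as exc:
            # Report failures to the model instead of crashing the loop.
            output, is_error = f"{type(exc).__name__}: {exc}", True
        results.append({
            "type": "tool_result",
            "tool_use_id": call.id,  # must echo the id of the matching tool_use block
            "content": output,
            "is_error": is_error,
        })
    return results


async def fake_dispatch(name, tool_input, sandbox):
    # Hypothetical stand-in for dispatch() in agent/tools.py.
    if name != "bash":
        raise KeyError(name)
    return f"$ {tool_input['command']}"


if __name__ == "__main__":
    calls = [SimpleNamespace(id="toolu_1", name="bash", input={"command": "ls"})]
    print(asyncio.run(execute_tools(calls, None, fake_dispatch)))
```

Returning errors as `tool_result` blocks with `is_error` set lets Claude see the failure and try a different command, which is usually more useful than ending the turn.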
2. Tool Definitions (agent/tools.py)
Claude needs two things for tools: a JSON schema describing each tool, and a dispatch function that routes tool calls to the right implementation.
# tools.py exports two things:
TOOL_SCHEMAS = [
    {
        "name": "bash",
        "description": "Run a shell command in the sandbox",
        "input_schema": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "The shell command to run"}
            },
            "required": ["command"]
        }
    },
    # ... one entry per tool
]

async def dispatch(tool_name: str, tool_input: dict, sandbox) -> str:
    # routes to tools/bash.py, tools/read.py etc.
    # Placeholder body so the module imports; replace with real routing per tool.
    raise NotImplementedError(f"no handler registered for {tool_name!r}")
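The routing inside `dispatch` can be sketched as a lookup table keyed by tool name. This is one possible shape, not the required one: the handler signatures (sandbox first, then the schema's fields as keyword arguments), the `HANDLERS` dict, and the `RecordingSandbox` test double are all assumptions made for illustration.

```python
import asyncio
import shlex


async def run_bash(sandbox, command: str) -> str:
    return await sandbox.run(command)


async def read_file(sandbox, path: str) -> str:
    # shlex.quote keeps paths with spaces or shell metacharacters intact.
    return await sandbox.run(f"cat {shlex.quote(path)}")


# Keys must match the "name" fields in TOOL_SCHEMAS.
HANDLERS = {"bash": run_bash, "read_file": read_file}


async def dispatch(tool_name: str, tool_input: dict, sandbox) -> str:
    handler = HANDLERS.get(tool_name)
    if handler is None:
        # Return the problem as text so the model can correct itself.
        return f"Unknown tool: {tool_name}"
    return await handler(sandbox, **tool_input)


class RecordingSandbox:
    """Hypothetical test double that records commands instead of running them."""

    def __init__(self):
        self.commands = []

    async def run(self, command: str) -> str:
        self.commands.append(command)
        return "ok"


if __name__ == "__main__":
    sandbox = RecordingSandbox()
    print(asyncio.run(dispatch("read_file", {"path": "notes.txt"}, sandbox)))
    print(sandbox.commands)
```

Adding a tool then means one new schema entry plus one new key in the table, with no change to the loop.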
3. Sandbox Session (sandbox/session.py)
The sandbox session should be created once at agent startup and reused for the entire conversation. This preserves state between tool calls (installed packages, created files, env vars).
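# sandbox/session.py
from microsandbox import PythonSandbox

class SandboxSession:
    async def __aenter__(self):
        # The SDK README uses PythonSandbox.create() as an async context manager,
        # so enter it here instead of awaiting it directly.
        self._ctx = PythonSandbox.create(name="coding-agent")
        self._sb = await self._ctx.__aenter__()
        return self

    async def run(self, command: str) -> str:
        # sb.run() executes Python source; shell commands go through sb.command.run().
        # Verify both calls against the SDK README for your installed version.
        execution = await self._sb.command.run("sh", ["-c", command])
        return await execution.output()

    async def __aexit__(self, *args):
        await self._ctx.__aexit__(*args)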
Reference: https://github.com/zerocore-ai/microsandbox/blob/main/sdk/README.md
4. MCP vs Direct SDK
microsandbox supports two integration patterns:
- Direct SDK (PythonSandbox.create()): simpler, Python-native, recommended to start with
- MCP server: microsandbox exposes an MCP server; the Anthropic SDK can connect to it directly, and tool definitions come from the server automatically
Start with the direct SDK (sandbox/session.py). The sandbox/mcp_client.py file is stubbed for later when you want to switch to the MCP path. The MCP approach reduces boilerplate but adds a moving part.
MCP reference: https://github.com/zerocore-ai/microsandbox/blob/main/MCP.md
Anthropic MCP docs: https://docs.anthropic.com/en/docs/build-with-claude/mcp
5. The TUI (ui/tui/app.py)
Use Textual for the TUI. It's async-native which fits well since the agent loop is async.
A minimal Textual app has:
- A `RichLog` or `Markdown` widget for displaying the conversation
- An `Input` widget for capturing user messages
- An `on_input_submitted` handler that calls agent/loop.py and appends the result
Reference: https://textual.textualize.io/guide/
6. Configuration (agent/config.py)
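Use pydantic-settings (a separate package from pydantic, installed with uv add pydantic-settings) to load from .env:
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # pydantic-settings v2 style; replaces the older nested `class Config`
    model_config = SettingsConfigDict(env_file=".env")

    anthropic_api_key: str
    model: str = "claude-sonnet-4-5-20250929"
    safedir: str = "./workspace"
    max_tokens: int = 8096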
Environment Variables
Copy .env.example to .env and fill in:
ANTHROPIC_API_KEY=sk-ant-...
MODEL=claude-sonnet-4-5-20250929
SAFEDIR=./workspace
Testing Strategy
Unit tests (no sandbox required — mock everything):
- test_loop.py: mock Claude responses, verify tool calls are dispatched correctly
- test_tools.py: mock SandboxSession.run(), verify each tool formats input/output correctly
- test_history.py: verify history trimming, message formatting
Integration tests (require msb server start --dev):
- test_sandbox.py: actually runs commands in a VM, verifies output
- Mark these with `@pytest.mark.integration` and skip by default:
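# conftest.py
import pytest

def pytest_addoption(parser):
    parser.addoption("--integration", action="store_true")

def pytest_configure(config):
    # Register the marker so pytest does not warn about an unknown mark
    config.addinivalue_line("markers", "integration: needs a running msb server")

def pytest_collection_modifyitems(config, items):
    if not config.getoption("--integration"):
        skip = pytest.mark.skip(reason="pass --integration to run")
        for item in items:
            if "integration" in item.keywords:
                item.add_marker(skip)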
Run integration tests: uv run pytest --integration
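For the unit tests described above, a tool test might look like the sketch below. It uses the standard library's `AsyncMock` in place of a real session and `asyncio.run` so it does not depend on pytest-asyncio; the `run_bash(sandbox, command)` signature and the newline trimming are assumptions about tools/bash.py, not behavior this spec defines.

```python
import asyncio
from unittest.mock import AsyncMock


async def run_bash(sandbox, command: str) -> str:
    """Hypothetical tools/bash.py: run a command and trim trailing newlines."""
    output = await sandbox.run(command)
    return output.rstrip("\n")


def test_run_bash_passes_command_and_trims_output():
    sandbox = AsyncMock()  # stands in for SandboxSession; no VM is started
    sandbox.run.return_value = "hello\n"

    result = asyncio.run(run_bash(sandbox, "echo hello"))

    assert result == "hello"
    sandbox.run.assert_awaited_once_with("echo hello")


if __name__ == "__main__":
    test_run_bash_passes_command_and_trims_output()
    print("ok")
```

With pytest-asyncio installed, the same test could be written as an `async def` test and await `run_bash` directly.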
Web UI — Future Path (No Node Required Yet)
When ready to add a web UI, the approach that avoids Node:
- Add FastAPI + uvicorn to dependencies
- Create ui/web/app.py: a FastAPI app with a /chat endpoint that calls agent/loop.py
- Use Server-Sent Events (SSE) for streaming responses
- Serve a minimal HTML/CSS frontend as a static file from FastAPI
The agent layer doesn't change at all. You're just adding a second entry point alongside the TUI.
When the project is mature enough to warrant a proper frontend, that's the point to introduce a JS framework. Until then, FastAPI + plain HTML gets you remote access without the Node toolchain.
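The SSE part of that plan is mostly a text framing convention, which can be sketched without FastAPI. The helper names below (`sse_event`, `stream_reply`, `fake_agent_chunks`) and the `message`/`done` event names are hypothetical; a streaming variant of the agent loop is assumed to exist and is faked here.

```python
import asyncio
import json
from typing import AsyncIterator


def sse_event(data: dict, event: str = "message") -> str:
    """Format one Server-Sent Event frame: an event line, a data line, then a blank line."""
    # json.dumps escapes newlines, so the payload always fits on one data line.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def stream_reply(chunks: AsyncIterator[str]) -> AsyncIterator[str]:
    """Wrap text chunks from the agent in SSE frames and finish with a 'done' event."""
    async for chunk in chunks:
        yield sse_event({"text": chunk})
    yield sse_event({}, event="done")


async def fake_agent_chunks() -> AsyncIterator[str]:
    # Hypothetical stand-in for a streaming version of agent/loop.py.
    for piece in ("Hel", "lo"):
        yield piece


async def collect() -> list[str]:
    return [frame async for frame in stream_reply(fake_agent_chunks())]


if __name__ == "__main__":
    for frame in asyncio.run(collect()):
        print(repr(frame))
```

In a FastAPI endpoint, a generator like `stream_reply` would typically be handed to a `StreamingResponse` with media type `text/event-stream`, and the browser side can read it with the built-in `EventSource` API.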
Prerequisites Before Writing Code
- Install microsandbox: `curl -sSL https://get.microsandbox.dev | sh`
- Start the server: `msb server start --dev`
- Pull the Python image: `msb pull microsandbox/python`
- Set your `ANTHROPIC_API_KEY` in `.env`
Suggested Build Order
1. agent/config.py: settings first, everything imports this
2. sandbox/session.py: get a VM running and verify you can execute commands
3. tools/bash.py + tools/read.py: minimal tool set to prove the loop works
4. agent/tools.py: schemas and dispatch for those two tools
5. agent/history.py: simple list wrapper to start
6. agent/loop.py: wire it all together, test in a plain Python script first
7. ui/tui/app.py: put a Textual face on the working loop
8. Remaining tools (write, list_dir, search)
9. Tests throughout: write them alongside each module, not at the end
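The "simple list wrapper" for agent/history.py could start out roughly like this. The class name, the `max_messages` default, and the trimming rule are assumptions; the one constraint taken from the tool-use flow is that a trimmed history should not begin with a tool result whose matching tool call was dropped.

```python
class History:
    """Hypothetical agent/history.py: a thin wrapper around the message list."""

    def __init__(self, max_messages: int = 40):
        self.max_messages = max_messages
        self.messages: list[dict] = []

    def add(self, role: str, content) -> None:
        self.messages.append({"role": role, "content": content})
        self._trim()

    def _trim(self) -> None:
        excess = len(self.messages) - self.max_messages
        if excess <= 0:
            return
        # Cut only at a plain-text user message so a tool_result never loses its tool_use.
        for i in range(excess, len(self.messages)):
            if self._is_plain_user(self.messages[i]):
                del self.messages[:i]
                return
        # No safe cut point yet (one long tool chain): keep everything for now.

    @staticmethod
    def _is_plain_user(message: dict) -> bool:
        return message["role"] == "user" and isinstance(message["content"], str)


if __name__ == "__main__":
    history = History(max_messages=3)
    for role, text in [("user", "a"), ("assistant", "b"), ("user", "c"), ("assistant", "d")]:
        history.add(role, text)
    print(history.messages)
```

Counting messages is a crude proxy for context size; a later version might trim by token count instead, but the cut-point rule would stay the same.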