initial project setup

2026-02-12 09:06:32 -07:00
parent e1314cfd0e
commit ec90e8fea3
7 changed files with 1013 additions and 0 deletions
@@ -0,0 +1,342 @@
+# Coding Agent — Project Specification
+
+> **Audience**: Junior developer onboarding to this project.
+> **Stack**: Python · UV · microsandbox (MCP) · Textual (TUI) · pytest
+> **Goal**: A local coding agent with a TUI that can later be served as a web UI for remote access.
+
+---
+
+## What We're Building
+
+A coding agent that:
+- Accepts user prompts via a terminal UI
+- Uses Claude (via the Anthropic SDK) as the LLM
+- Executes all file and shell operations inside a microsandbox microVM
+- Exposes those operations via MCP so the tool layer is swappable
+- Can later be served over HTTP for remote/web access without rewriting core logic
+
+---
+
+## Project File Structure
+
+```
+coding-agent/
+│
+├── pyproject.toml              # UV project manifest — dependencies, scripts, tool config
+├── .python-version             # Pins Python version for UV
+├── .env.example                # Template for required env vars (copy to .env)
+├── README.md
+│
+├── agent/                      # Core agent logic — no UI concerns here
+│   ├── __init__.py
+│   ├── loop.py                 # The agentic loop: send message → get response → handle tool calls → repeat
+│   ├── tools.py                # Tool definitions (schemas Claude sees) and dispatch table
+│   ├── history.py              # Conversation history management
+│   └── config.py               # Settings loaded from env vars (API keys, model name, safedir path)
+│
+├── sandbox/                    # All microsandbox interaction lives here
+│   ├── __init__.py
+│   ├── session.py              # Creates/destroys the sandbox session, exposes run(), holds lifecycle
+│   └── mcp_client.py           # Connects to microsandbox's MCP server, wraps tool calls
+│
+├── tools/                      # Individual tool implementations; each calls into sandbox/ (session.py first, MCP later)
+│   ├── __init__.py
+│   ├── bash.py                 # run_bash(command) → str
+│   ├── read.py                 # read_file(path) → str
+│   ├── write.py                # write_file(path, content) → str
+│   ├── list_dir.py             # list_dir(path) → str
+│   └── search.py               # search_files(pattern) → str
+│
+├── ui/
+│   ├── __init__.py
+│   ├── tui/
+│   │   ├── __init__.py
+│   │   └── app.py              # Textual app — renders chat, captures input, calls agent/loop.py
+│   └── web/                    # Stubbed out — implemented later
+│       └── __init__.py         # Placeholder — see Web UI section below
+│
+├── tests/
+│   ├── conftest.py             # Shared pytest fixtures (mock sandbox session, sample history, etc.)
+│   ├── test_loop.py            # Unit tests for agentic loop logic
+│   ├── test_tools.py           # Unit tests for each tool (mock the sandbox)
+│   ├── test_history.py         # Tests for conversation history management
+│   └── test_sandbox.py         # Integration tests for sandbox session (require msb server running)
+│
+└── scripts/
+    └── start_sandbox_server.sh # Convenience: runs `msb server start --dev`
+```
+
+---
+
+## Dependency Overview
+
+Runtime dependencies belong in the `dependencies` array under `[project]` in `pyproject.toml` (`uv add` writes them there for you):
+
+| Package | Purpose | Docs |
+|---|---|---|
+| `anthropic` | Anthropic SDK — LLM calls and MCP client support | https://docs.anthropic.com |
+| `microsandbox` | Python SDK for microsandbox VM sessions | https://github.com/zerocore-ai/microsandbox |
+| `textual` | TUI framework — the terminal interface | https://textual.textualize.io |
+| `python-dotenv` | Load `.env` file into environment | https://pypi.org/project/python-dotenv |
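+| `pydantic` | Settings validation and tool schema modeling | https://docs.pydantic.dev |
+| `pydantic-settings` | `BaseSettings` for loading config from env vars and `.env` (separate package from `pydantic`) | https://pypi.org/project/pydantic-settings |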
+
+Dev dependencies (`uv add --dev` records these in the `dev` group under `[dependency-groups]`):
+
+| Package | Purpose |
+|---|---|
+| `pytest` | Test runner |
+| `pytest-asyncio` | Async test support (needed — most code is async) |
+| `pytest-mock` | Mocking sandbox calls in unit tests |
+
+### UV Quickstart
+
+```bash
+# Install UV if not already installed
+curl -LsSf https://astral.sh/uv/install.sh | sh
+
+# Create project
+uv init coding-agent
+cd coding-agent
+
+# Add dependencies
+uv add anthropic microsandbox textual python-dotenv pydantic pydantic-settings
+uv add --dev pytest pytest-asyncio pytest-mock
+
+# Run the TUI
+uv run python -m ui.tui.app
+
+# Run tests
+uv run pytest
+```
+
+---
+
+## Architecture: Concerns and Boundaries
+
+The most important rule: **each layer only talks to the layer directly below it.**
+
+```
+┌─────────────────────────────────────┐
+│         UI Layer (ui/)              │  Renders output, captures input.
+│     Textual TUI  |  Web (later)     │  No LLM calls. No sandbox calls.
+└──────────────┬──────────────────────┘
+               │ calls
+┌──────────────▼──────────────────────┐
+│       Agent Layer (agent/)          │  Owns the loop. Talks to Anthropic API.
+│  loop.py · tools.py · history.py    │  Decides which tools to call.
+└──────────────┬──────────────────────┘
+               │ calls
+┌──────────────▼──────────────────────┐
+│       Tools Layer (tools/)          │  One file per tool. Thin async functions.
+│ bash · read · write · list · search │  No LLM knowledge. No UI knowledge.
+└──────────────┬──────────────────────┘
+               │ calls
+┌──────────────▼──────────────────────┐
+│      Sandbox Layer (sandbox/)       │  Owns the VM session and MCP connection.
+│   session.py · mcp_client.py        │  Everything executes in here.
+└─────────────────────────────────────┘
+               │
+         microVM (isolated)
+         safedir mounted in
+```
+
+**Why this matters**: When you swap the TUI for a web UI, you only touch `ui/`. When you swap microsandbox for a different execution backend, you only touch `sandbox/`. The agent loop doesn't change.
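+
+In code, the boundary between the tools and sandbox layers can be as small as one method. A minimal sketch, assuming the tools layer only ever needs an async `run(command) -> str` on whatever backend it is given; `SandboxLike` and `EchoSandbox` are hypothetical names used for illustration:
+
+```python
+import asyncio
+from typing import Protocol
+
+
+class SandboxLike(Protocol):
+    # The only thing the tools layer knows about the sandbox layer
+    async def run(self, command: str) -> str: ...
+
+
+async def run_bash(command: str, sandbox: SandboxLike) -> str:
+    # Tools layer: no LLM or UI knowledge, just delegate to the sandbox
+    return await sandbox.run(command)
+
+
+class EchoSandbox:
+    # Hypothetical stand-in backend; a microVM session would go here instead
+    async def run(self, command: str) -> str:
+        return f"ran: {command}"
+
+
+print(asyncio.run(run_bash("ls", EchoSandbox())))  # prints "ran: ls"
+```
+
+Because `run_bash` depends only on the protocol, swapping microsandbox for another backend (or a fake in tests) means providing a different object with the same `run()` method; nothing in `tools/` or `agent/` changes.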
+
+---
+
+## Key Implementation Notes
+
+### 1. The Agentic Loop (`agent/loop.py`)
+
+This is the heart of the project. The pattern is:
+
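+1. Add the user message to history
+2. Send the full history (plus the tool schemas) to Claude
+3. If Claude stops to call tools (`stop_reason == "tool_use"`) → add its response to history → execute the tools → add the results to history → go to 2
+4. Otherwise (normally `stop_reason == "end_turn"`) → add the response to history and return its text to the UI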
+
+```python
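+# Rough shape of loop.py
+from anthropic import AsyncAnthropic
+
+from agent.config import Settings
+from agent.tools import TOOL_SCHEMAS, dispatch
+
+settings = Settings()
+client = AsyncAnthropic(api_key=settings.anthropic_api_key)
+
+async def run_turn(user_message: str, history: list, sandbox) -> str:
+    history.append({"role": "user", "content": user_message})
+
+    while True:
+        response = await client.messages.create(
+            model=settings.model, max_tokens=settings.max_tokens,
+            tools=TOOL_SCHEMAS, messages=history,
+        )
+        # Record every assistant turn, including its tool_use blocks
+        history.append({"role": "assistant", "content": response.content})
+
+        if response.stop_reason != "tool_use":
+            # end_turn (or max_tokens): hand the text blocks back to the UI
+            return "".join(b.text for b in response.content if b.type == "text")
+
+        tool_results = []
+        for block in response.content:
+            if block.type == "tool_use":
+                output = await dispatch(block.name, block.input, sandbox)
+                tool_results.append({"type": "tool_result", "tool_use_id": block.id, "content": output})
+        history.append({"role": "user", "content": tool_results})  # loop continues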
+```
+
+Reference: https://docs.anthropic.com/en/docs/build-with-claude/tool-use
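+
+To see the loop's control flow without an API key or a VM, the sketch below replays scripted responses through the same pattern. `Block`, `Response`, `ScriptedClaude`, and the fake `dispatch` are hypothetical stand-ins that mimic only the fields the loop reads; the real SDK returns richer objects, and the real call is `client.messages.create(...)`:
+
+```python
+import asyncio
+from dataclasses import dataclass, field
+
+
+@dataclass
+class Block:
+    # Hypothetical stand-in for an SDK content block
+    type: str                      # "text" or "tool_use"
+    text: str = ""
+    id: str = ""
+    name: str = ""
+    input: dict = field(default_factory=dict)
+
+
+@dataclass
+class Response:
+    # Hypothetical stand-in for an SDK Message
+    stop_reason: str               # "tool_use" or "end_turn"
+    content: list
+
+
+class ScriptedClaude:
+    """Hypothetical fake client that returns canned responses in order."""
+
+    def __init__(self, responses: list) -> None:
+        self._responses = iter(responses)
+
+    async def create(self, messages: list) -> Response:
+        return next(self._responses)
+
+
+async def dispatch(tool_name: str, tool_input: dict, sandbox) -> str:
+    # Fake tool execution: no sandbox involved
+    return f"{tool_name} ok: {tool_input['command']}"
+
+
+async def run_turn(user_message: str, history: list, sandbox, claude) -> str:
+    history.append({"role": "user", "content": user_message})
+    while True:
+        response = await claude.create(messages=history)
+        history.append({"role": "assistant", "content": response.content})
+        if response.stop_reason != "tool_use":
+            return "".join(b.text for b in response.content if b.type == "text")
+        results = []
+        for block in response.content:
+            if block.type == "tool_use":
+                output = await dispatch(block.name, block.input, sandbox)
+                # tool_use_id ties each result to the call that requested it
+                results.append({"type": "tool_result", "tool_use_id": block.id, "content": output})
+        history.append({"role": "user", "content": results})
+
+
+claude = ScriptedClaude([
+    Response("tool_use", [Block("tool_use", id="t1", name="bash", input={"command": "ls"})]),
+    Response("end_turn", [Block("text", text="Done.")]),
+])
+history: list = []
+print(asyncio.run(run_turn("list the files", history, None, claude)))  # prints "Done."
+print(len(history))  # prints 4
+```
+
+After one tool round trip the history holds four messages: the user prompt, the assistant's `tool_use`, a user message carrying the `tool_result`, and the assistant's final text. A scripted client like this is also a natural starting point for `tests/test_loop.py`.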
+
+### 2. Tool Definitions (`agent/tools.py`)
+
+Tools need two pieces: a JSON schema describing each tool (this is what Claude sees) and a dispatch function that routes each tool call Claude makes to the right implementation (this is what your code runs).
+
+```python
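+# tools.py exports two things: TOOL_SCHEMAS and dispatch()
+from tools.bash import run_bash
+
+TOOL_SCHEMAS = [
+    {
+        "name": "bash",
+        "description": "Run a shell command in the sandbox",
+        "input_schema": {
+            "type": "object",
+            "properties": {
+                "command": {"type": "string", "description": "The shell command to run"}
+            },
+            "required": ["command"]
+        }
+    },
+    # ... one entry per tool
+]
+
+async def dispatch(tool_name: str, tool_input: dict, sandbox) -> str:
+    # Route each tool call to its implementation in tools/,
+    # passing along the sandbox session the tool should run in
+    if tool_name == "bash":
+        return await run_bash(tool_input["command"], sandbox)
+    # ... one branch per tool
+    # Unknown names go back to Claude as the tool result instead of crashing the loop
+    return f"Error: unknown tool {tool_name!r}"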
+```
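+
+Because the schemas and the routing are maintained by hand, they can drift apart as tools are added. One option, sketched here under the assumption that dispatch is driven by a dict from tool name to handler (`HANDLERS`, the `_bash`/`_read_file` handlers, and `FakeSandbox` are hypothetical), is to route through the table, reject calls missing required inputs, and assert at startup or in a test that every schema has a handler:
+
+```python
+import asyncio
+import shlex
+
+TOOL_SCHEMAS = [
+    {"name": "bash", "description": "Run a shell command in the sandbox",
+     "input_schema": {"type": "object",
+                      "properties": {"command": {"type": "string"}},
+                      "required": ["command"]}},
+    {"name": "read_file", "description": "Read a text file in the sandbox",
+     "input_schema": {"type": "object",
+                      "properties": {"path": {"type": "string"}},
+                      "required": ["path"]}},
+]
+SCHEMA_BY_NAME = {schema["name"]: schema for schema in TOOL_SCHEMAS}
+
+
+async def _bash(tool_input: dict, sandbox) -> str:
+    return await sandbox.run(tool_input["command"])
+
+
+async def _read_file(tool_input: dict, sandbox) -> str:
+    # shlex.quote keeps paths with spaces or shell metacharacters intact
+    return await sandbox.run(f"cat {shlex.quote(tool_input['path'])}")
+
+
+# Hypothetical dispatch table: tool name -> async handler(tool_input, sandbox)
+HANDLERS = {"bash": _bash, "read_file": _read_file}
+
+
+async def dispatch(tool_name: str, tool_input: dict, sandbox) -> str:
+    handler = HANDLERS.get(tool_name)
+    if handler is None:
+        return f"Error: unknown tool {tool_name!r}"
+    required = SCHEMA_BY_NAME[tool_name]["input_schema"].get("required", [])
+    missing = [key for key in required if key not in tool_input]
+    if missing:
+        # Returned as the tool result so Claude can retry with the right inputs
+        return f"Error: missing required input(s): {', '.join(missing)}"
+    return await handler(tool_input, sandbox)
+
+
+def check_tools() -> None:
+    # Every schema Claude sees needs a handler, and every handler needs a schema
+    schema_names, handler_names = set(SCHEMA_BY_NAME), set(HANDLERS)
+    assert schema_names == handler_names, schema_names ^ handler_names
+
+
+class FakeSandbox:
+    # Hypothetical test double that echoes the command it would run
+    async def run(self, command: str) -> str:
+        return f"$ {command}"
+
+
+check_tools()
+print(asyncio.run(dispatch("read_file", {"path": "notes/a b.txt"}, FakeSandbox())))
+```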
+
+### 3. Sandbox Session (`sandbox/session.py`)
+
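+The sandbox session should be created once at agent startup and reused for the entire conversation. This preserves state between tool calls, such as installed packages and files written to disk. Shell state like the working directory or exported variables may not carry over between separate commands, so tools shouldn't rely on it.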
+
+```python
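+# sandbox/session.py
+from microsandbox import PythonSandbox
+
+class SandboxSession:
+    """One microVM for the whole conversation, reused across tool calls."""
+
+    async def __aenter__(self):
+        # PythonSandbox.create() is an async context manager (see the SDK README),
+        # so enter it manually and keep it open until __aexit__
+        self._cm = PythonSandbox.create(name="coding-agent")
+        self._sb = await self._cm.__aenter__()
+        return self
+
+    async def run(self, command: str) -> str:
+        # sb.run() executes Python code; shell commands go through sb.command.run()
+        execution = await self._sb.command.run("sh", ["-c", command])
+        return await execution.output()
+
+    async def __aexit__(self, *exc_info):
+        # Leaving the SDK's context manager stops the sandbox
+        await self._cm.__aexit__(*exc_info)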
+```
+
+Reference: https://github.com/zerocore-ai/microsandbox/blob/main/sdk/README.md
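+
+With `run()` in place, the tool modules can stay thin wrappers that build a shell command and hand it to the session. The sketch below shows hypothetical versions of `write_file` and `list_dir`, assuming `run()` executes its argument as a shell command and that `base64` and `ls` exist in the sandbox image (usual for Linux images, but not verified here). Base64-encoding the content avoids quoting problems with quotes, newlines, or `$` in file contents, and `shlex.quote` protects paths. `RecordingSandbox` is a hypothetical test double that records commands instead of running them:
+
+```python
+import asyncio
+import base64
+import shlex
+
+
+async def write_file(path: str, content: str, sandbox) -> str:
+    # Hypothetical tools/write.py; assumes base64 exists in the sandbox image.
+    # Base64 text is shell-safe, so quotes, newlines and $ in content need no escaping
+    encoded = base64.b64encode(content.encode("utf-8")).decode("ascii")
+    await sandbox.run(f"printf %s {encoded} | base64 -d > {shlex.quote(path)}")
+    return f"Wrote {len(content)} characters to {path}"
+
+
+async def list_dir(path: str, sandbox) -> str:
+    # Hypothetical tools/list_dir.py; shlex.quote protects unusual path names
+    return await sandbox.run(f"ls -la {shlex.quote(path)}")
+
+
+class RecordingSandbox:
+    """Hypothetical test double: records commands instead of running them."""
+
+    def __init__(self) -> None:
+        self.commands: list = []
+
+    async def run(self, command: str) -> str:
+        self.commands.append(command)
+        return ""
+
+
+sandbox = RecordingSandbox()
+print(asyncio.run(write_file("src/hello.py", "print('hi')\n", sandbox)))
+asyncio.run(list_dir("src", sandbox))
+print(sandbox.commands)
+```
+
+Very large files may hit argument or request size limits with this approach; at that point a dedicated file-transfer mechanism from the sandbox backend, if it offers one, would be a better fit.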
+
+### 4. MCP vs Direct SDK
+
+microsandbox supports two integration patterns:
+
+- **Direct SDK** (`PythonSandbox.create()`) — simpler, Python-native, recommended to start with
+- **MCP server** — microsandbox exposes an MCP server; the Anthropic SDK can connect to it directly, and tool definitions come from the server automatically
+
+Start with the direct SDK (`sandbox/session.py`). The `sandbox/mcp_client.py` file is stubbed for later when you want to switch to the MCP path. The MCP approach reduces boilerplate but adds a moving part.
+
+- MCP reference: https://github.com/zerocore-ai/microsandbox/blob/main/MCP.md
+- Anthropic MCP docs: https://docs.anthropic.com/en/docs/build-with-claude/mcp
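+
+If you later adopt the MCP path with a general-purpose MCP client, the tool definitions it lists need a small translation before they can go into the `tools` parameter: MCP describes each tool's arguments under `inputSchema`, while the Anthropic Messages API expects `input_schema`. A minimal sketch, assuming the listed tools arrive as plain dicts (MCP client libraries typically return typed objects with the same fields); `sandbox_run` is a hypothetical tool name for illustration, not taken from microsandbox's server:
+
+```python
+def mcp_tool_to_anthropic(tool: dict) -> dict:
+    """Convert one MCP tool definition (as a dict) into an Anthropic tool schema."""
+    return {
+        "name": tool["name"],
+        "description": tool.get("description", ""),
+        # MCP spells it inputSchema; the Messages API expects input_schema
+        "input_schema": tool.get("inputSchema", {"type": "object", "properties": {}}),
+    }
+
+
+mcp_tools = [
+    {
+        "name": "sandbox_run",  # hypothetical tool name, for illustration only
+        "description": "Run code in a sandbox",
+        "inputSchema": {
+            "type": "object",
+            "properties": {"code": {"type": "string"}},
+            "required": ["code"],
+        },
+    },
+]
+TOOL_SCHEMAS = [mcp_tool_to_anthropic(tool) for tool in mcp_tools]
+print(TOOL_SCHEMAS[0]["input_schema"]["required"])
+```
+
+Executing a tool still goes through the MCP client (the MCP `tools/call` request), so on this path `dispatch()` would forward the tool name and input to the server rather than routing to `tools/`.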
+
+### 5. The TUI (`ui/tui/app.py`)
+
+Use **Textual** for the TUI. It's async-native which fits well since the agent loop is async.
+
+A minimal Textual app has:
+- A `RichLog` or `Markdown` widget for displaying conversation
+- An `Input` widget for capturing user messages
+- An `on_input_submitted` handler that calls `agent/loop.py` (ideally from a Textual worker, so the UI stays responsive while Claude and the tools run) and appends the result
+
+Reference: https://textual.textualize.io/guide/
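+
+A minimal sketch of `ui/tui/app.py` along those lines, assuming a recent Textual release. The agent call runs in a worker (`@work`) so the screen keeps redrawing while Claude and the tools run, and the input is disabled for the duration so only one turn touches the history at a time. `echo_turn` is a hypothetical stand-in with the same signature as `run_turn`; the real app would pass `agent.loop.run_turn` and an open `SandboxSession`:
+
+```python
+# ui/tui/app.py: minimal sketch
+from textual import work
+from textual.app import App, ComposeResult
+from textual.widgets import Input, RichLog
+
+
+async def echo_turn(user_message: str, history: list, sandbox) -> str:
+    # Hypothetical stand-in; pass agent.loop.run_turn in the real app
+    history.append(user_message)
+    return f"echo: {user_message}"
+
+
+class AgentApp(App):
+    def __init__(self, run_turn=echo_turn, sandbox=None) -> None:
+        super().__init__()
+        self.run_turn = run_turn
+        self.sandbox = sandbox
+        self.history: list = []
+
+    def compose(self) -> ComposeResult:
+        yield RichLog(wrap=True)
+        yield Input(placeholder="Ask the agent...")
+
+    def on_input_submitted(self, event: Input.Submitted) -> None:
+        prompt = event.value.strip()
+        event.input.clear()
+        if prompt:
+            self.query_one(RichLog).write(f"you> {prompt}")
+            self.ask_agent(prompt)
+
+    @work
+    async def ask_agent(self, prompt: str) -> None:
+        # Runs as a worker so the UI stays responsive during the turn
+        box = self.query_one(Input)
+        box.disabled = True  # one turn at a time keeps the history consistent
+        try:
+            reply = await self.run_turn(prompt, self.history, self.sandbox)
+        except Exception as exc:  # show failures instead of crashing the app
+            reply = f"[error] {exc}"
+        finally:
+            box.disabled = False
+        self.query_one(RichLog).write(f"agent> {reply}")
+        box.focus()
+
+
+if __name__ == "__main__":
+    AgentApp().run()
+```
+
+The `__main__` guard is what lets `uv run python -m ui.tui.app` start the app.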
+
+### 6. Configuration (`agent/config.py`)
+
+Use `pydantic-settings` (a separate package from `pydantic`) to load from `.env`:
+
+```python
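+from pydantic_settings import BaseSettings, SettingsConfigDict
+
+class Settings(BaseSettings):
+    # Env vars match field names case-insensitively (ANTHROPIC_API_KEY, MODEL, SAFEDIR, ...)
+    model_config = SettingsConfigDict(env_file=".env")
+
+    anthropic_api_key: str
+    model: str = "claude-sonnet-4-5-20250929"
+    safedir: str = "./workspace"
+    max_tokens: int = 8192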
+```
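+
+To check how values are resolved (real environment variables override the `.env` file, which overrides the class defaults), here is a self-contained sketch assuming pydantic-settings v2. It repeats the class so it runs on its own and passes `_env_file=None` so no local `.env` file is read; the key is a placeholder:
+
+```python
+import os
+
+from pydantic import ValidationError
+from pydantic_settings import BaseSettings, SettingsConfigDict
+
+
+class Settings(BaseSettings):
+    model_config = SettingsConfigDict(env_file=".env")
+
+    anthropic_api_key: str
+    model: str = "claude-sonnet-4-5-20250929"
+    safedir: str = "./workspace"
+    max_tokens: int = 8192
+
+
+# A missing required key fails fast at startup instead of on the first API call
+os.environ.pop("ANTHROPIC_API_KEY", None)
+try:
+    Settings(_env_file=None)
+except ValidationError as exc:
+    print("config error:", exc.errors()[0]["loc"])
+
+# Upper-case env vars populate the lower-case fields
+os.environ["ANTHROPIC_API_KEY"] = "sk-ant-placeholder"  # placeholder, not a real key
+os.environ["SAFEDIR"] = "/tmp/agent-workspace"
+settings = Settings(_env_file=None)
+print(settings.safedir, settings.model)
+```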
+
+---
+
+## Environment Variables
+
+Copy `.env.example` to `.env` and fill in:
+
+```
+ANTHROPIC_API_KEY=sk-ant-...
+MODEL=claude-sonnet-4-5-20250929
+SAFEDIR=./workspace
+```
+
+---
+
+## Testing Strategy
+
+**Unit tests** (no sandbox required — mock everything):
+- `test_loop.py` — mock Claude responses, verify tool calls are dispatched correctly
+- `test_tools.py` — mock `SandboxSession.run()`, verify each tool formats input/output correctly
+- `test_history.py` — verify history trimming, message formatting
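+
+For the tool tests, the sandbox can be replaced with `unittest.mock.AsyncMock` (pytest-mock's `mocker.AsyncMock` is the same class). A minimal sketch with a hypothetical `run_bash` inlined so it runs standalone; in the project it would be imported from `tools/bash.py`. It calls `asyncio.run` to stay self-contained; with pytest-asyncio you would instead mark the tests `@pytest.mark.asyncio`, make them `async def`, and `await` directly:
+
+```python
+import asyncio
+from unittest.mock import AsyncMock, Mock
+
+
+async def run_bash(command: str, sandbox) -> str:
+    # Hypothetical tools/bash.py: forward to the sandbox, label empty output
+    output = await sandbox.run(command)
+    return output or "(no output)"
+
+
+def test_run_bash_forwards_command():
+    sandbox = Mock()
+    sandbox.run = AsyncMock(return_value="hello\n")
+    assert asyncio.run(run_bash("echo hello", sandbox)) == "hello\n"
+    # The tool must pass the command through unchanged
+    sandbox.run.assert_awaited_once_with("echo hello")
+
+
+def test_run_bash_reports_empty_output():
+    sandbox = Mock()
+    sandbox.run = AsyncMock(return_value="")
+    assert asyncio.run(run_bash("true", sandbox)) == "(no output)"
+```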
+
+**Integration tests** (require `msb server start --dev`):
+- `test_sandbox.py` — actually runs commands in a VM, verifies output
+- Mark these with `@pytest.mark.integration` and skip by default:
+
+```python
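+# conftest.py
+import pytest
+
+def pytest_addoption(parser):
+    parser.addoption("--integration", action="store_true", help="run integration tests")
+
+def pytest_configure(config):
+    # Register the marker so pytest doesn't warn about an unknown mark
+    config.addinivalue_line("markers", "integration: needs a running msb server")
+
+def pytest_collection_modifyitems(config, items):
+    if not config.getoption("--integration"):
+        skip = pytest.mark.skip(reason="pass --integration to run")
+        for item in items:
+            if "integration" in item.keywords:
+                item.add_marker(skip)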
+```
+
+Run integration tests: `uv run pytest --integration`
+
+---
+
+## Web UI — Future Path (No Node Required Yet)
+
+When you're ready to add a web UI, here is an approach that avoids Node:
+
+1. Add **FastAPI** + **uvicorn** to dependencies
+2. Create `ui/web/app.py` — a FastAPI app with a `/chat` endpoint that calls `agent/loop.py`
+3. Use **Server-Sent Events (SSE)** for streaming responses
+4. Serve a minimal HTML/CSS frontend as a static file from FastAPI
+
+The agent layer doesn't change at all. You're just adding a second entry point alongside the TUI.
+
+When the project is mature enough to warrant a proper frontend, that's the point to introduce a JS framework. Until then, FastAPI + plain HTML gets you remote access without the Node toolchain.
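+
+SSE itself is a simple text format, which is part of why it needs no extra frontend tooling: each event is one or more `data:` lines (optionally preceded by an `event:` line) followed by a blank line, and a multi-line payload gets one `data:` line per line. A minimal encoder sketch; `sse_event`, `stream_reply`, and `collect` are hypothetical names. In FastAPI, a generator like `stream_reply` would be wrapped in `StreamingResponse(..., media_type="text/event-stream")`, and until the loop streams, the whole reply can go out as a single event:
+
+```python
+import asyncio
+from typing import AsyncIterator, Optional
+
+
+def sse_event(data: str, event: Optional[str] = None) -> str:
+    """Encode one Server-Sent Event: optional event name, data lines, blank line."""
+    lines = [f"event: {event}"] if event else []
+    # A multi-line payload becomes one "data:" field per line
+    lines += [f"data: {line}" for line in data.split("\n")]
+    return "\n".join(lines) + "\n\n"
+
+
+async def stream_reply(chunks: list) -> AsyncIterator[str]:
+    # Hypothetical generator; in the web UI the chunks would come from the agent
+    for chunk in chunks:
+        yield sse_event(chunk)
+    yield sse_event("", event="done")  # lets the browser know the reply is complete
+
+
+async def collect(chunks: list) -> str:
+    return "".join([event async for event in stream_reply(chunks)])
+
+
+print(asyncio.run(collect(["Hello", "line 1\nline 2"])), end="")
+```
+
+On the browser side, the built-in `EventSource` API parses this format. `EventSource` only issues GET requests, so a POST-based `/chat` endpoint would instead read the stream with `fetch`.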
+
+---
+
+## Prerequisites Before Writing Code
+
+1. Install microsandbox: `curl -sSL https://get.microsandbox.dev | sh`
+2. Start the server: `msb server start --dev`
+3. Pull the Python image: `msb pull microsandbox/python`
+4. Set your `ANTHROPIC_API_KEY` in `.env`
+
+---
+
+## Suggested Build Order
+
+1. `agent/config.py` — settings first, everything imports this
+2. `sandbox/session.py` — get a VM running and verify you can execute commands
+3. `tools/bash.py` + `tools/read.py` — minimal tool set to prove the loop works
+4. `agent/tools.py` — schemas and dispatch for those two tools
+5. `agent/history.py` — simple list wrapper to start
+6. `agent/loop.py` — wire it all together, test in a plain Python script first
+7. `ui/tui/app.py` — put a Textual face on the working loop
+8. Remaining tools (`write`, `list_dir`, `search`)
+9. Tests throughout — write them alongside each module, not at the end