# Coding Agent — Project Specification

> **Audience**: Junior developer onboarding to this project.
> **Stack**: Python · UV · microsandbox (MCP) · Textual (TUI) · pytest
> **Goal**: A local coding agent with a TUI that can later be served as a web UI for remote access.

---

## What We're Building

A coding agent that:

- Accepts user prompts via a terminal UI
- Uses Claude (via the Anthropic SDK) as the LLM
- Executes all file and shell operations inside a microsandbox microVM
- Exposes those operations via MCP so the tool layer is swappable
- Can later be served over HTTP for remote/web access without rewriting core logic

---

## Project File Structure

```
coding-agent/
│
├── pyproject.toml              # UV project manifest — dependencies, scripts, tool config
├── .python-version             # Pins Python version for UV
├── .env.example                # Template for required env vars (copy to .env)
├── README.md
│
├── agent/                      # Core agent logic — no UI concerns here
│   ├── __init__.py
│   ├── loop.py                 # The agentic loop: send message → get response → handle tool calls → repeat
│   ├── tools.py                # Tool definitions (schemas Claude sees) and dispatch table
│   ├── history.py              # Conversation history management
│   └── config.py               # Settings loaded from env vars (API keys, model name, safedir path)
│
├── sandbox/                    # All microsandbox interaction lives here
│   ├── __init__.py
│   ├── session.py              # Creates/destroys the sandbox session, exposes run(), holds lifecycle
│   └── mcp_client.py           # Connects to microsandbox's MCP server, wraps tool calls (stub for now)
│
├── tools/                      # Individual tool implementations — each calls into sandbox/ (session.py first, mcp_client.py later)
│   ├── __init__.py
│   ├── bash.py                 # run_bash(command) → str
│   ├── read.py                 # read_file(path) → str
│   ├── write.py                # write_file(path, content) → str
│   ├── list_dir.py             # list_dir(path) → str
│   └── search.py               # search_files(pattern) → str
│
├── ui/
│   ├── __init__.py
│   ├── tui/
│   │   ├── __init__.py
│   │   └── app.py              # Textual app — renders chat, captures input, calls agent/loop.py
│   └── web/                    # Stubbed out — implemented later
│       └── __init__.py         # Placeholder — see Web UI section below
│
├── tests/
│   ├── conftest.py             # Shared pytest fixtures (mock sandbox session, sample history, etc.)
│   ├── test_loop.py            # Unit tests for agentic loop logic
│   ├── test_tools.py           # Unit tests for each tool (mock the sandbox)
│   ├── test_history.py         # Tests for conversation history management
│   └── test_sandbox.py         # Integration tests for sandbox session (require msb server running)
│
└── scripts/
    └── start_sandbox_server.sh # Convenience: runs `msb server start --dev`
```

---

## Dependency Overview

Add these to the `dependencies` array in the `[project]` table of `pyproject.toml`:

| Package | Purpose | Docs |
|---|---|---|
| `anthropic` | Anthropic SDK — LLM calls and MCP client support | https://docs.anthropic.com |
| `microsandbox` | Python SDK for microsandbox VM sessions | https://github.com/zerocore-ai/microsandbox |
| `textual` | TUI framework — the terminal interface | https://textual.textualize.io |
| `python-dotenv` | Load `.env` file into environment | https://pypi.org/project/python-dotenv |
| `pydantic` | Settings validation and tool schema modeling | https://docs.pydantic.dev |
| `pydantic-settings` | Provides `BaseSettings`, imported by `agent/config.py` | https://docs.pydantic.dev/latest/concepts/pydantic_settings/ |

Dev dependencies (recent UV versions record `uv add --dev` packages in the `dev` group under `[dependency-groups]`):

| Package | Purpose |
|---|---|
| `pytest` | Test runner |
| `pytest-asyncio` | Async test support (needed — most code is async) |
| `pytest-mock` | Mocking sandbox calls in unit tests |

### UV Quickstart

```bash
# Install UV if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create project
uv init coding-agent
cd coding-agent

# Add dependencies (pydantic-settings is needed by agent/config.py)
uv add anthropic microsandbox textual python-dotenv pydantic pydantic-settings
uv add --dev pytest pytest-asyncio pytest-mock

# Run the TUI
uv run python -m ui.tui.app

# Run tests
uv run pytest
```

---

## Architecture: Concerns and Boundaries

The most important rule: **each layer only talks to the layer directly below it.**

```
┌─────────────────────────────────────┐
│           UI Layer (ui/)            │  Renders output, captures input.
│  Textual TUI | Web (later)          │  No LLM calls. No sandbox calls.
└──────────────┬──────────────────────┘
               │ calls
┌──────────────▼──────────────────────┐
│        Agent Layer (agent/)         │  Owns the loop. Talks to Anthropic API.
│  loop.py · tools.py · history.py    │  Decides which tools to call.
└──────────────┬──────────────────────┘
               │ calls
┌──────────────▼──────────────────────┐
│        Tools Layer (tools/)         │  One file per tool. Stateless functions.
│ bash · read · write · list · search │  No LLM knowledge. No UI knowledge.
└──────────────┬──────────────────────┘
               │ calls
┌──────────────▼──────────────────────┐
│      Sandbox Layer (sandbox/)       │  Owns the VM session and MCP connection.
│  session.py · mcp_client.py         │  Everything executes in here.
└─────────────────────────────────────┘
               │
      microVM (isolated)
      safedir mounted in
```

**Why this matters**: When you swap the TUI for a web UI, you only touch `ui/`. When you swap microsandbox for a different execution backend, you only touch `sandbox/`. The agent loop doesn't change.
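
To make the boundary concrete, here is a minimal sketch of how the tools layer could depend on an interface rather than on microsandbox itself. `SandboxBackend`, `FakeSandbox`, and the `run_bash(command, sandbox)` signature are illustrative assumptions, not names fixed by this spec.

```python
import asyncio
from typing import Protocol


class SandboxBackend(Protocol):
    """Hypothetical interface: anything with an async run() can back the tools."""

    async def run(self, command: str) -> str: ...


class FakeSandbox:
    """Stand-in backend for illustration; records commands instead of running them."""

    def __init__(self) -> None:
        self.commands: list[str] = []

    async def run(self, command: str) -> str:
        self.commands.append(command)
        return f"ran: {command}"


async def run_bash(command: str, sandbox: SandboxBackend) -> str:
    # Tools layer: knows the sandbox interface, nothing about LLMs or UIs.
    return await sandbox.run(command)


fake = FakeSandbox()
result = asyncio.run(run_bash("echo hi", fake))
print(result)
```

Because `run_bash` only relies on `run()`, the same function would work with the real session class or with a different backend, which is the property the layering rule is meant to protect.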

---

## Key Implementation Notes

### 1. The Agentic Loop (`agent/loop.py`)

This is the heart of the project. The pattern is:

1. Add user message to history
2. Send full history to Claude
3. If response contains tool calls → execute them → add results to history → go to 2
4. If response is plain text → add it to history and return it to the UI

```python
# Rough shape of loop.py
from anthropic import AsyncAnthropic

from agent.config import Settings
from agent.tools import TOOL_SCHEMAS, dispatch

settings = Settings()
client = AsyncAnthropic(api_key=settings.anthropic_api_key)


async def run_turn(user_message: str, history: list, sandbox) -> str:
    history.append({"role": "user", "content": user_message})
    while True:
        response = await client.messages.create(
            model=settings.model, max_tokens=settings.max_tokens,
            tools=TOOL_SCHEMAS, messages=history,
        )
        # Record every assistant turn, including the final one.
        history.append({"role": "assistant", "content": response.content})

        if response.stop_reason != "tool_use":
            # "end_turn", "max_tokens", ...: return the text and stop looping.
            return "".join(b.text for b in response.content if b.type == "text")

        # Answer each tool_use block with a tool_result carrying the same id.
        tool_results = [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": await dispatch(b.name, b.input, sandbox)}
            for b in response.content if b.type == "tool_use"
        ]
        history.append({"role": "user", "content": tool_results})
```

Reference: https://docs.anthropic.com/en/docs/build-with-claude/tool-use
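
The message bookkeeping is the part that is easiest to get wrong, so here is a self-contained sketch of it running against a scripted stand-in for Claude. `scripted_model` and `fake_dispatch` are hypothetical test doubles, and the responses are plain dicts rather than SDK objects. Note that the final assistant message is recorded too, so roles keep alternating across turns.

```python
import asyncio

# Canned "model" replies: first ask for a tool, then finish with text.
SCRIPT = [
    {"stop_reason": "tool_use",
     "content": [{"type": "tool_use", "id": "t1", "name": "bash",
                  "input": {"command": "ls"}}]},
    {"stop_reason": "end_turn",
     "content": [{"type": "text", "text": "Found 2 files."}]},
]


async def scripted_model(history: list, script: list) -> dict:
    # Hypothetical stand-in for the API call: pops the next canned reply.
    return script.pop(0)


async def fake_dispatch(name: str, tool_input: dict) -> str:
    # Hypothetical tool runner: echoes what it was asked to do.
    return f"{name}: {tool_input['command']}"


async def run_turn(user_message: str, history: list, script: list) -> str:
    history.append({"role": "user", "content": user_message})
    while True:
        response = await scripted_model(history, script)
        history.append({"role": "assistant", "content": response["content"]})
        if response["stop_reason"] != "tool_use":
            return "".join(b["text"] for b in response["content"] if b["type"] == "text")
        results = [
            {"type": "tool_result", "tool_use_id": b["id"],
             "content": await fake_dispatch(b["name"], b["input"])}
            for b in response["content"] if b["type"] == "tool_use"
        ]
        history.append({"role": "user", "content": results})


history: list = []
answer = asyncio.run(run_turn("What is here?", history, list(SCRIPT)))
print(answer)
print([m["role"] for m in history])
```

A `test_loop.py` case could follow the same shape: script the replies, run one turn, then assert on the returned text and on the roles and `tool_use_id` values left in `history`.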

### 2. Tool Definitions (`agent/tools.py`)

Tool use needs two things: a JSON schema describing each tool (this is what Claude sees), and a dispatch function that routes Claude's tool calls to the right implementation.

```python
# tools.py exports two things: TOOL_SCHEMAS and dispatch()
from tools.bash import run_bash

TOOL_SCHEMAS = [
    {
        "name": "bash",
        "description": "Run a shell command in the sandbox",
        "input_schema": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "The shell command to run"}
            },
            "required": ["command"],
        },
    },
    # ... one entry per tool
]


async def dispatch(tool_name: str, tool_input: dict, sandbox) -> str:
    # Routes to tools/bash.py, tools/read.py etc.
    # Assumption: each tool function takes the sandbox as an extra argument.
    if tool_name == "bash":
        return await run_bash(tool_input["command"], sandbox)
    # Return the error as text so Claude can see it and recover.
    return f"Unknown tool: {tool_name}"
```
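
One way to keep `dispatch` from growing into a long `if` chain is a name-to-handler table. The sketch below is an assumption about how that could look: the handler signatures, `FakeSandbox`, the `read_file` tool name, and the `cat` command it builds are all illustrative.

```python
import asyncio
import shlex


class FakeSandbox:
    """Illustrative stand-in for the sandbox session; echoes the command back."""

    async def run(self, command: str) -> str:
        return f"$ {command}"


async def run_bash(command: str, sandbox) -> str:
    return await sandbox.run(command)


async def read_file(path: str, sandbox) -> str:
    # Quote the path so spaces or shell metacharacters stay a single argument.
    return await sandbox.run(f"cat {shlex.quote(path)}")


# Tool name (as it appears in the schema) -> coroutine that implements it.
HANDLERS = {"bash": run_bash, "read_file": read_file}


async def dispatch(tool_name: str, tool_input: dict, sandbox) -> str:
    handler = HANDLERS.get(tool_name)
    if handler is None:
        # Report problems as text so the model can read them and adjust.
        return f"Error: unknown tool {tool_name!r}"
    try:
        return await handler(sandbox=sandbox, **tool_input)
    except TypeError as exc:
        # Typically raised when the model omits or misnames an argument.
        return f"Error: bad arguments for {tool_name}: {exc}"


sandbox = FakeSandbox()
print(asyncio.run(dispatch("bash", {"command": "ls -la"}, sandbox)))
print(asyncio.run(dispatch("read_file", {"path": "my notes.txt"}, sandbox)))
print(asyncio.run(dispatch("rm_rf", {}, sandbox)))
```

Adding a tool then means adding one schema entry and one `HANDLERS` entry. Returning errors as strings, rather than raising, keeps the agent loop running and gives the model a chance to correct its own call.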

### 3. Sandbox Session (`sandbox/session.py`)

The sandbox session should be created once at agent startup and reused for the entire conversation. This preserves state between tool calls (installed packages, created files, env vars).

```python
# sandbox/session.py
from contextlib import AsyncExitStack

from microsandbox import PythonSandbox


class SandboxSession:
    async def __aenter__(self):
        # The SDK README enters PythonSandbox.create() with `async with`, so
        # keep that context open on an exit stack for the session's lifetime.
        self._stack = AsyncExitStack()
        self._sb = await self._stack.enter_async_context(
            PythonSandbox.create(name="coding-agent")
        )
        return self

    async def run(self, command: str) -> str:
        # Check the SDK for what run() executes; the README passes it Python source.
        execution = await self._sb.run(command)  # not `exec`: that shadows a builtin
        return await execution.output()

    async def __aexit__(self, *exc_info):
        # Closing the stack exits create()'s context and releases the sandbox.
        await self._stack.aclose()
```

Reference: https://github.com/zerocore-ai/microsandbox/blob/main/sdk/README.md
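
Usage follows from the `__aenter__`/`__aexit__` pair: open the session once, run every tool call inside it, and let the `async with` block tear it down. The sketch below uses a hypothetical `FakeSession` in place of the real VM so it runs anywhere; only the lifecycle shape is the point.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for SandboxSession; tracks lifecycle and call count."""

    def __init__(self) -> None:
        self.started = False
        self.stopped = False
        self.calls = 0

    async def __aenter__(self):
        self.started = True  # the real class would boot the microVM here
        return self

    async def run(self, command: str) -> str:
        if not self.started or self.stopped:
            raise RuntimeError("session is not running")
        self.calls += 1
        return f"[{self.calls}] {command}"

    async def __aexit__(self, *exc_info):
        self.stopped = True  # the real class would stop the microVM here


async def main() -> tuple[list[str], bool]:
    session = FakeSession()
    # One session for the whole conversation, not one per tool call.
    async with session as sb:
        outputs = [await sb.run("pip install requests"), await sb.run("python app.py")]
    return outputs, session.stopped


outputs, stopped = asyncio.run(main())
print(outputs, stopped)
```

The incrementing counter stands in for state that survives between calls. A fake of this kind is also a reasonable starting point for the mock sandbox fixture in `tests/conftest.py`.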

### 4. MCP vs Direct SDK

microsandbox supports two integration patterns:

- **Direct SDK** (`PythonSandbox.create()`) — simpler, Python-native, recommended to start with
- **MCP server** — microsandbox exposes an MCP server; the Anthropic SDK can connect to it directly, and tool definitions come from the server automatically

Start with the direct SDK (`sandbox/session.py`). The `sandbox/mcp_client.py` file is stubbed for later when you want to switch to the MCP path. The MCP approach reduces boilerplate but adds a moving part.

MCP reference: https://github.com/zerocore-ai/microsandbox/blob/main/MCP.md
Anthropic MCP docs: https://docs.anthropic.com/en/docs/build-with-claude/mcp

### 5. The TUI (`ui/tui/app.py`)

Use **Textual** for the TUI. It's async-native, which fits well since the agent loop is async.

A minimal Textual app has:

- A `RichLog` or `Markdown` widget for displaying the conversation
- An `Input` widget for capturing user messages
- An `on_input_submitted` handler that calls `run_turn()` in `agent/loop.py` and appends the result

Reference: https://textual.textualize.io/guide/

### 6. Configuration (`agent/config.py`)

Use `pydantic-settings` (a separate package from `pydantic`) to load from `.env`:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # pydantic v2 style; replaces the deprecated inner `class Config`.
    model_config = SettingsConfigDict(env_file=".env")

    anthropic_api_key: str
    model: str = "claude-sonnet-4-5-20250929"
    safedir: str = "./workspace"
    max_tokens: int = 8096
```

---

## Environment Variables

Copy `.env.example` to `.env` and fill in:

```
ANTHROPIC_API_KEY=sk-ant-...
MODEL=claude-sonnet-4-5-20250929
SAFEDIR=./workspace
```

---

## Testing Strategy

**Unit tests** (no sandbox required — mock everything):

- `test_loop.py` — mock Claude responses, verify tool calls are dispatched correctly
- `test_tools.py` — mock `SandboxSession.run()`, verify each tool formats input/output correctly
- `test_history.py` — verify history trimming, message formatting

**Integration tests** (require `msb server start --dev`):

- `test_sandbox.py` — actually runs commands in a VM, verifies output
- Mark these with `@pytest.mark.integration` and skip by default:

```python
# conftest.py
import pytest


def pytest_addoption(parser):
    parser.addoption("--integration", action="store_true", help="run integration tests")


def pytest_configure(config):
    # Register the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line("markers", "integration: requires a running msb server")


def pytest_collection_modifyitems(config, items):
    if not config.getoption("--integration"):
        skip = pytest.mark.skip(reason="pass --integration to run")
        for item in items:
            if "integration" in item.keywords:
                item.add_marker(skip)
```

Run integration tests: `uv run pytest --integration`
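
The `test_history.py` bullet mentions history trimming, which this spec does not define further. As one possible reading, the sketch below keeps the most recent messages and then drops leading ones until the history starts on a plain user prompt, so that no `tool_result` is left without its `tool_use`. The names `trim_history` and `is_tool_result` are hypothetical, and the sketch assumes history entries are plain dicts.

```python
def is_tool_result(message: dict) -> bool:
    """True for a user message that carries tool_result blocks."""
    content = message["content"]
    return isinstance(content, list) and any(
        block.get("type") == "tool_result" for block in content
    )


def trim_history(history: list[dict], max_messages: int) -> list[dict]:
    """Keep at most max_messages recent messages, starting on a real user turn."""
    trimmed = history[-max_messages:] if max_messages > 0 else []
    # Drop leading messages until the first one is a plain user prompt, so no
    # tool_result is left without the tool_use that preceded it.
    while trimmed and (trimmed[0]["role"] != "user" or is_tool_result(trimmed[0])):
        trimmed = trimmed[1:]
    return trimmed


history = [
    {"role": "user", "content": "list files"},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "t1", "name": "bash", "input": {"command": "ls"}}]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "t1", "content": "a.py"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "One file: a.py"}]},
    {"role": "user", "content": "thanks"},
    {"role": "assistant", "content": [{"type": "text", "text": "Anytime."}]},
]

trimmed = trim_history(history, max_messages=4)
print([m["role"] for m in trimmed])
```

Here the window of four messages would begin on a `tool_result`, so the function discards it and the assistant reply after it, leaving only the last user/assistant pair. Counting messages is the simplest budget; a token-based budget would need a different measure but could keep the same boundary rule.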

---

## Web UI — Future Path (No Node Required Yet)

When you're ready to add a web UI, this approach avoids Node:

1. Add **FastAPI** + **uvicorn** to dependencies
2. Create `ui/web/app.py` — a FastAPI app with a `/chat` endpoint that calls `agent/loop.py`
3. Use **Server-Sent Events (SSE)** for streaming responses
4. Serve a minimal HTML/CSS frontend as a static file from FastAPI

The agent layer doesn't change at all. You're just adding a second entry point alongside the TUI.

When the project is mature enough to warrant a proper frontend, that's the point to introduce a JS framework. Until then, FastAPI + plain HTML gets you remote access without the Node toolchain.
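
The SSE wire format from step 3 is simple enough to sketch without any framework. The helper names below (`sse_event`, `sse_stream`, `fake_agent_chunks`) and the closing `done` event are illustrative assumptions; in a FastAPI app, a generator like `sse_stream` could be returned through `StreamingResponse` with the `text/event-stream` media type.

```python
import asyncio
from typing import AsyncIterator, Optional


def sse_event(data: str, event: Optional[str] = None) -> str:
    """Format one Server-Sent Event; multi-line data becomes repeated data: lines."""
    lines = [f"event: {event}"] if event else []
    lines += [f"data: {line}" for line in data.split("\n")]
    return "\n".join(lines) + "\n\n"  # a blank line terminates the event


async def fake_agent_chunks() -> AsyncIterator[str]:
    # Hypothetical stand-in for text arriving from the agent loop.
    for chunk in ["Hello", "line one\nline two"]:
        yield chunk


async def sse_stream(chunks: AsyncIterator[str]) -> AsyncIterator[str]:
    async for chunk in chunks:
        yield sse_event(chunk)
    # Named event so the page can tell that the turn is over.
    yield sse_event("end", event="done")


async def collect() -> list[str]:
    return [frame async for frame in sse_stream(fake_agent_chunks())]


frames = asyncio.run(collect())
for frame in frames:
    print(repr(frame))
```

On the browser side, the built-in `EventSource` API reads this format directly, which is what keeps the frontend free of a build step.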

---

## Prerequisites Before Writing Code

1. Install microsandbox: `curl -sSL https://get.microsandbox.dev | sh`
2. Start the server: `msb server start --dev`
3. Pull the Python image: `msb pull microsandbox/python`
4. Set your `ANTHROPIC_API_KEY` in `.env`

---

## Suggested Build Order

1. `agent/config.py` — settings first, everything imports this
2. `sandbox/session.py` — get a VM running and verify you can execute commands
3. `tools/bash.py` + `tools/read.py` — minimal tool set to prove the loop works
4. `agent/tools.py` — schemas and dispatch for those two tools
5. `agent/history.py` — simple list wrapper to start
6. `agent/loop.py` — wire it all together, test in a plain Python script first
7. `ui/tui/app.py` — put a Textual face on the working loop
8. Remaining tools (`write`, `list_dir`, `search`)
9. Tests throughout — write them alongside each module, not at the end