# Coding Agent - Project Status **Last Updated:** 2026-02-21 ## 🎯 Current State: MVP Complete The agent is functional and can write, read, and execute code in an isolated sandbox. --- ## βœ… What Works ### Core Infrastructure - **Sandbox**: Podman + libkrun (microVM isolation) - Network disabled - Workspace mounted at `/workspace` - Bind mount to `./workspace` on host - 512MB memory limit - **Agent Loop**: Streaming responses with tool visibility - Shows tool calls as they happen (`πŸ”§ Running: bash(...)`) - Streams text token-by-token - Handles tool β†’ response β†’ tool chains - **Persistent History**: JSON files in `./history/` - Format: `YYYYMMDD-HHMMSS.json` - Includes timestamps - Auto-saves after each message ### Tools Available 1. **`bash`** - Execute shell commands in sandbox 2. **`read_file`** - Read file contents from workspace 3. **`write_file`** - Write/create files in workspace (creates parent dirs) ### Testing - Unit tests: 13 passing - Integration tests: 5 passing - All critical paths covered --- ## ⚠️ Known Issues ### 1. Output Corruption Bug **Symptom:** `ls -la` output shows `οΏ½total 40` with leading spaces/corrupt bytes **Status:** Investigated, not yet fixed **Workaround:** Output is still readable despite corruption **Notes:** - `demux=False` is set but not working as expected - May be Podman SDK version issue or multiplex header stripping problem - Planned to let agent debug itself once fully operational ### 2. Multi-line Paste **Symptom:** Pasting multi-line text causes each line to be processed as separate input **Status:** Known limitation of CLI `input()` **Workaround:** Don't paste multi-line prompts until TUI is built **Solution:** Build Textual TUI (planned) --- ## πŸš€ Quick Start ### Prerequisites ```bash # Podman with krun runtime installed # Python 3.14 # uv package manager ``` ### Run the Agent ```bash # Create workspace mkdir -p workspace # Set up .env cat > .env << EOF ANTHROPIC_API_KEY=sk-ant-... 
MODEL=claude-sonnet-4-5-20250929 MAX_TOKENS=8096 SAFEDIR=./workspace USE_SANDBOX=true EOF # Start agent uv run python main.py ``` ### Commands - `/quit` - Exit session - `/clear` - Clear conversation history - `/help` - Show available commands --- ## πŸ“‚ Project Structure ``` coding-agent/ β”œβ”€β”€ main.py # Entry point β”œβ”€β”€ agent/ β”‚ β”œβ”€β”€ config.py # Settings (Pydantic) β”‚ β”œβ”€β”€ loop.py # run_turn, run_session β”‚ β”œβ”€β”€ history.py # ConversationHistory β”‚ └── tools.py # TOOL_SCHEMAS, dispatch_tool β”œβ”€β”€ sandbox/ β”‚ └── session.py # PodmanSandbox β”œβ”€β”€ tools/ β”‚ β”œβ”€β”€ bash.py # bash tool β”‚ └── files.py # read_file, write_file β”œβ”€β”€ tests/ β”‚ β”œβ”€β”€ conftest.py # Shared fixtures β”‚ β”œβ”€β”€ test_config.py β”‚ β”œβ”€β”€ test_loop.py β”‚ β”œβ”€β”€ test_sandbox.py β”‚ └── test_files.py β”œβ”€β”€ workspace/ # Agent's workspace (gitignored) └── history/ # Session history (gitignored) ``` --- ## πŸ”§ Development ### Run Tests ```bash # All tests uv run pytest # Unit tests only (fast, no sandbox) uv run pytest -m unit # Integration tests (requires sandbox) uv run pytest -m integration # Specific file uv run pytest tests/test_files.py -v ``` ### Add a New Tool 1. **Create tool implementation:** ```python # tools/my_tool.py import asyncio async def my_tool(param: str, sandbox=None) -> str: """Tool description.""" if sandbox is None: return "Error: No sandbox available" try: result = await asyncio.to_thread(sandbox.run, f"some command {param}") return result except Exception as e: return f"Error: {e}" # Tool schema MY_TOOL_SCHEMA = { "name": "my_tool", "description": "What this tool does", "input_schema": { "type": "object", "properties": { "param": { "type": "string", "description": "Parameter description" } }, "required": ["param"] } } ``` 2. 
**Export from `tools/__init__.py`:** ```python from tools.my_tool import my_tool, MY_TOOL_SCHEMA TOOL_SCHEMAS = [ BASH_SCHEMA, READ_FILE_SCHEMA, WRITE_FILE_SCHEMA, MY_TOOL_SCHEMA, # Add here ] ``` 3. **Add to dispatcher:** ```python # agent/tools.py from tools import my_tool async def dispatch_tool(tool_name: str, tool_input: dict, sandbox=None): # ... elif tool_name == "my_tool": return await my_tool(tool_input["param"], sandbox=sandbox) ``` 4. **Write tests:** ```python # tests/test_my_tool.py @pytest.mark.unit async def test_my_tool_no_sandbox(): result = await my_tool("test", sandbox=None) assert "error" in result.lower() @pytest.mark.integration async def test_my_tool_works(): async with PodmanSandbox() as sb: result = await my_tool("test", sb) assert "expected output" in result ``` --- ## 🎯 Next Steps (Priority Order) ### Immediate (Make Agent More Useful) 1. **Fix output corruption bug** - Let agent debug itself with current tools - Or investigate Podman SDK version/settings 2. **Add more file tools** (optional enhancements) - `list_files(directory)` - better than `bash("ls")` - `search_files(pattern)` - grep with nice output - `edit_file(filepath, old, new)` - targeted edits ### Short Term (Better UX) 3. **Session resume** - Add `/load ` command - ~10 minutes of work 4. **Build Textual TUI** - Multi-line input support - Better history viewing - Collapsible tool output - ~3-4 hours ### Medium Term (Collaboration Features) 5. **Git integration (host-side tools)** - `git_clone(repo)` - uses your SSH keys - `git_push(branch)` - uses your credentials - `create_pr(title, body)` - uses GitHub/Gitea API - Agent works on feature branches, you review PRs - ~2-3 hours 6. **Improve error messages** - Better tool error reporting - Exit codes visible to agent - ~1 hour ### Long Term (Advanced Features) 7. **Web API interface** - FastAPI + SSE for streaming - Multi-user support (separate sandboxes) - ~4-6 hours 8. 
**Custom base image** - Pre-install common packages - Faster startup - ~1-2 hours 9. **Tool call optimization** - Batch related operations - Cache frequent commands - ~2-3 hours --- ## πŸ§ͺ Testing the Agent ### Simple Task ``` You: Create a Python script that prints "Hello, World!" ``` Expected: Agent writes file, shows content, runs it, shows output. ### Medium Task ``` You: Create a Flask API with /health endpoint that returns {"status": "ok"} Include requirements.txt ``` Expected: Agent writes app.py, requirements.txt, installs flask, tests the endpoint. ### Complex Task ``` You: Create a data processing script that: 1. Reads a CSV file 2. Filters rows where value > 100 3. Saves to new CSV Include sample data and tests ``` Expected: Agent writes script, creates sample data, writes tests, runs everything. --- ## πŸ“ Notes ### Why Podman + krun? - **VM-level isolation** (not just containers) - **Daemonless** (no background service) - **Rootless** by default - **Docker-compatible** API - Fast startup (~125ms) ### Why Not Docker? - Container isolation only (not VM) - Requires daemon - Podman is drop-in replacement with better security ### Why Not microsandbox? 
- Promising but immature (SDK version mismatches)
- Podman + krun uses same underlying tech (libkrun)
- More stable ecosystem
- Can revisit microsandbox in 6-12 months

### Sandbox Security Model
- **Network disabled** - agent can't exfiltrate data
- **Workspace mount** - only way to persist files
- **Ephemeral VM** - destroyed after session
- **Host git** - credentials never in sandbox
  - Agent works on feature branches, you review PRs

### Design Decisions
- **Streaming vs batched** - Streaming for better UX
- **One tool per file** - Clear organization, easy to find
- **Schemas with tools** - Keep related code together
- **Keyword args for sandbox** - More maintainable
- **JSON history** - Human-readable, git-friendly
- **Async throughout** - Future-proof for web API

---

## 🤝 Contributing (Future)

When ready to open-source:
1. Add proper README
2. Add LICENSE (MIT recommended)
3. Add CONTRIBUTING.md
4. Set up CI/CD (GitHub Actions)
5. Add pre-commit hooks
6. Document MCP integration path

---

## 📚 Key Learnings

### What Worked Well
- **Layered architecture** - Easy to add features on top
- **Testing from the start** - Caught issues early
- **Simple tools first** - bash/read/write covers 90% of needs
- **Integration tests** - More valuable than complex unit tests

### What Was Hard
- **Async/sync boundaries** - `asyncio.to_thread` for podman SDK
- **Streaming API** - Required rewriting entire request flow
- **Mock complexity** - Some unit tests not worth the effort
- **Version mismatches** - microsandbox SDK vs server

### Surprises
- **Podman multiplex headers** - Unexpected output corruption
- **Multi-line paste** - CLI `input()` limitation
- **Test refactoring** - Changing streaming broke all tests
- **Path validation** - More edge cases than expected

---

## 🔗 Useful Links

- **Anthropic API Docs**: https://docs.anthropic.com
- **Podman Python SDK**: https://podman-py.readthedocs.io
- **Textual TUI**: https://textual.textualize.io
- **Pydantic**: https://docs.pydantic.dev

---

*This is a working MVP. The agent can write, read, and execute code safely. Everything else is enhancement.*
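
As a possible starting point for the output corruption issue listed under Known Issues, the sketch below assumes the stray leading bytes are Docker-style stream-multiplexing frame headers that were not stripped: 8 bytes per frame, made of a stream id, three zero bytes, and a big-endian payload length. `strip_multiplex_headers` is a hypothetical helper, not part of the current codebase, and the assumption has not been verified against the Podman SDK version in use.

```python
import struct

# Assumed frame layout: 1 byte stream id (0=stdin, 1=stdout, 2=stderr),
# 3 zero padding bytes, 4-byte big-endian payload length.
HEADER_SIZE = 8


def strip_multiplex_headers(raw: bytes) -> bytes:
    """Return the concatenated payloads of a multiplexed stream.

    Input that does not look like framed output is returned unchanged.
    """
    out = bytearray()
    pos = 0
    while pos + HEADER_SIZE <= len(raw):
        stream_id, padding, length = struct.unpack_from(">B3sI", raw, pos)
        if stream_id not in (0, 1, 2) or padding != b"\x00\x00\x00":
            # Not a frame header: keep the remaining bytes as plain output
            break
        pos += HEADER_SIZE
        out += raw[pos:pos + length]
        pos += length
    out += raw[pos:]  # trailing bytes that were not part of a full frame
    return bytes(out)
```

If the assumption holds, applying such a helper to the raw bytes before decoding them in the `bash` tool would remove the corrupt prefix; if it does not, the input passes through unchanged, so the check is cheap to try.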