Coding Agent - Project Status
Last Updated: 2026-02-21
🎯 Current State: MVP Complete
The agent is functional and can write, read, and execute code in an isolated sandbox.
✅ What Works
Core Infrastructure
- Sandbox: Podman + libkrun (microVM isolation)
  - Network disabled
  - Workspace mounted at /workspace
  - Bind mount to ./workspace on host
  - 512MB memory limit
- Agent Loop: Streaming responses with tool visibility
  - Shows tool calls as they happen (🔧 Running: bash(...))
  - Streams text token-by-token
  - Handles tool → response → tool chains
- Persistent History: JSON files in ./history/
  - Format: YYYYMMDD-HHMMSS.json
  - Includes timestamps
  - Auto-saves after each message
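The history scheme above (one JSON file per session, named by start time, rewritten after every message) can be sketched roughly as follows. The function names and the message record shape (`role`, `content`, `timestamp`) are assumptions made for illustration; they are not taken from `agent/history.py`.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def new_session_path(history_dir: Path, started: datetime) -> Path:
    """Build a session file path in the YYYYMMDD-HHMMSS.json format."""
    return history_dir / f"{started.strftime('%Y%m%d-%H%M%S')}.json"


def append_message(path: Path, role: str, content: str) -> list[dict]:
    """Append one timestamped message and rewrite the whole file (auto-save)."""
    messages = json.loads(path.read_text()) if path.exists() else []
    messages.append(
        {
            "role": role,
            "content": content,
            # Hypothetical field name; the real schema may differ.
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
    )
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(messages, indent=2))
    return messages
```

Rewriting the full file on each message is simple and keeps the file valid JSON at all times, at the cost of some redundant I/O for long sessions.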
Tools Available
- bash - Execute shell commands in sandbox
- read_file - Read file contents from workspace
- write_file - Write/create files in workspace (creates parent dirs)
Testing
- Unit tests: 13 passing
- Integration tests: 5 passing
- All critical paths covered
⚠️ Known Issues
1. Output Corruption Bug
Symptom: ls -la output shows �total 40 with leading spaces/corrupt bytes
Status: Investigated, not yet fixed
Workaround: Output is still readable despite corruption
Notes:
- demux=False is set but not working as expected
- May be Podman SDK version issue or multiplex header stripping problem
- Planned to let agent debug itself once fully operational
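One possible fix, if the stray bytes turn out to be un-stripped stream frame headers, is to remove them by hand. The sketch below assumes the output uses the Docker-compatible multiplexed stream format: each frame starts with an 8-byte header (one stream-type byte, three zero bytes, a big-endian 32-bit payload length). Whether that is the actual cause here has not been confirmed, and `strip_multiplex_headers` is a hypothetical helper, not part of the Podman SDK.

```python
import struct

HEADER_SIZE = 8
VALID_STREAMS = (0, 1, 2)  # stdin, stdout, stderr


def strip_multiplex_headers(raw: bytes) -> bytes:
    """Drop 8-byte frame headers and return the concatenated payloads.

    Bytes that do not look like a valid frame are passed through unchanged,
    so output that was never multiplexed is returned as-is.
    """
    out = bytearray()
    pos = 0
    while pos + HEADER_SIZE <= len(raw):
        header = raw[pos : pos + HEADER_SIZE]
        (size,) = struct.unpack(">I", header[4:8])
        looks_like_frame = (
            header[0] in VALID_STREAMS
            and header[1:4] == b"\x00\x00\x00"
            and pos + HEADER_SIZE + size <= len(raw)
        )
        if not looks_like_frame:
            break
        out += raw[pos + HEADER_SIZE : pos + HEADER_SIZE + size]
        pos += HEADER_SIZE + size
    out += raw[pos:]  # trailing or non-framed bytes
    return bytes(out)
```

This merges stdout and stderr into one byte string; keeping them apart would mean collecting payloads per stream type instead.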
2. Multi-line Paste
Symptom: Pasting multi-line text causes each line to be processed as separate input
Status: Known limitation of CLI input()
Workaround: Don't paste multi-line prompts until TUI is built
Solution: Build Textual TUI (planned)
🚀 Quick Start
Prerequisites
# Podman with krun runtime installed
# Python 3.14
# uv package manager
Run the Agent
# Create workspace
mkdir -p workspace
# Set up .env
cat > .env << EOF
ANTHROPIC_API_KEY=sk-ant-...
MODEL=claude-sonnet-4-5-20250929
MAX_TOKENS=8096
SAFEDIR=./workspace
USE_SANDBOX=true
EOF
# Start agent
uv run python main.py
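For orientation, the `.env` keys above map onto a typed settings object roughly like this. The project's `agent/config.py` uses Pydantic; the sketch below swaps in a plain stdlib dataclass so it runs without dependencies, and the field names and the `parse_env`/`load_settings` helpers are assumptions for illustration only.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    anthropic_api_key: str
    model: str
    max_tokens: int
    safedir: Path
    use_sandbox: bool


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_settings(text: str) -> Settings:
    """Convert raw .env text into typed settings; raises KeyError on missing keys."""
    env = parse_env(text)
    return Settings(
        anthropic_api_key=env["ANTHROPIC_API_KEY"],
        model=env["MODEL"],
        max_tokens=int(env["MAX_TOKENS"]),
        safedir=Path(env["SAFEDIR"]),
        use_sandbox=env["USE_SANDBOX"].lower() in {"1", "true", "yes"},
    )
```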
Commands
- /quit - Exit session
- /clear - Clear conversation history
- /help - Show available commands
📂 Project Structure
coding-agent/
├── main.py # Entry point
├── agent/
│ ├── config.py # Settings (Pydantic)
│ ├── loop.py # run_turn, run_session
│ ├── history.py # ConversationHistory
│ └── tools.py # TOOL_SCHEMAS, dispatch_tool
├── sandbox/
│ └── session.py # PodmanSandbox
├── tools/
│ ├── bash.py # bash tool
│ └── files.py # read_file, write_file
├── tests/
│ ├── conftest.py # Shared fixtures
│ ├── test_config.py
│ ├── test_loop.py
│ ├── test_sandbox.py
│ └── test_files.py
├── workspace/ # Agent's workspace (gitignored)
└── history/ # Session history (gitignored)
🔧 Development
Run Tests
# All tests
uv run pytest
# Unit tests only (fast, no sandbox)
uv run pytest -m unit
# Integration tests (requires sandbox)
uv run pytest -m integration
# Specific file
uv run pytest tests/test_files.py -v
Add a New Tool
- Create tool implementation:
# tools/my_tool.py
import asyncio
import shlex


async def my_tool(param: str, sandbox=None) -> str:
    """Tool description."""
    if sandbox is None:
        return "Error: No sandbox available"
    try:
        # sandbox.run is blocking, so run it in a worker thread.
        # Quote the parameter so it reaches the shell as a single argument.
        command = f"some command {shlex.quote(param)}"
        return await asyncio.to_thread(sandbox.run, command)
    except Exception as e:
        return f"Error: {e}"


# Tool schema
MY_TOOL_SCHEMA = {
    "name": "my_tool",
    "description": "What this tool does",
    "input_schema": {
        "type": "object",
        "properties": {
            "param": {
                "type": "string",
                "description": "Parameter description",
            }
        },
        "required": ["param"],
    },
}
- Export from tools/__init__.py:
from tools.my_tool import my_tool, MY_TOOL_SCHEMA

TOOL_SCHEMAS = [
    BASH_SCHEMA,
    READ_FILE_SCHEMA,
    WRITE_FILE_SCHEMA,
    MY_TOOL_SCHEMA,  # Add here
]
- Add to dispatcher:
# agent/tools.py
from tools import my_tool


async def dispatch_tool(tool_name: str, tool_input: dict, sandbox=None):
    # ...
    elif tool_name == "my_tool":
        return await my_tool(tool_input["param"], sandbox=sandbox)
- Write tests:
# tests/test_my_tool.py
import pytest

from sandbox.session import PodmanSandbox
from tools.my_tool import my_tool


@pytest.mark.unit
async def test_my_tool_no_sandbox():
    result = await my_tool("test", sandbox=None)
    assert "error" in result.lower()


@pytest.mark.integration
async def test_my_tool_works():
    async with PodmanSandbox() as sb:
        # Pass the sandbox by keyword, matching the dispatcher
        result = await my_tool("test", sandbox=sb)
        assert "expected output" in result
🎯 Next Steps (Priority Order)
Immediate (Make Agent More Useful)
- Fix output corruption bug
  - Let agent debug itself with current tools
  - Or investigate Podman SDK version/settings
- Add more file tools (optional enhancements)
  - list_files(directory) - better than bash("ls")
  - search_files(pattern) - grep with nice output
  - edit_file(filepath, old, new) - targeted edits
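As a starting point for the planned `edit_file` tool, a targeted edit could require the old text to match exactly once, so an ambiguous replacement never silently changes the wrong place. This is a minimal synchronous sketch operating on a host path; the real tool would presumably be async, take a `sandbox` argument, and validate the path like the existing file tools. The error message wording is made up.

```python
from pathlib import Path


def edit_file(filepath: Path, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` with `new` in the file."""
    if not old:
        return "Error: old text must not be empty"
    text = filepath.read_text()
    count = text.count(old)
    if count == 0:
        return "Error: old text not found"
    if count > 1:
        return f"Error: old text matches {count} times; add more context"
    filepath.write_text(text.replace(old, new, 1))
    return f"Edited {filepath}"
```

Returning error strings instead of raising mirrors the `"Error: ..."` convention used by the tool template in this document.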
Short Term (Better UX)
- Session resume
  - Add /load <session-id> command
  - ~10 minutes of work
- Build Textual TUI
  - Multi-line input support
  - Better history viewing
  - Collapsible tool output
  - ~3-4 hours
Medium Term (Collaboration Features)
- Git integration (host-side tools)
  - git_clone(repo) - uses your SSH keys
  - git_push(branch) - uses your credentials
  - create_pr(title, body) - uses GitHub/Gitea API
  - Agent works on feature branches, you review PRs
  - ~2-3 hours
- Improve error messages
  - Better tool error reporting
  - Exit codes visible to agent
  - ~1 hour
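Making exit codes visible to the agent could be as simple as putting them into the text returned from each tool call. The sketch below assumes the sandbox can report an exit code and separate stdout/stderr, which the current `sandbox.run` may not do yet; `format_result` and its output layout are hypothetical.

```python
def format_result(exit_code: int, stdout: str, stderr: str) -> str:
    """Render a command result so the model sees the exit code first."""
    parts = [f"exit_code: {exit_code}"]
    if stdout:
        parts.append(f"stdout:\n{stdout.rstrip()}")
    if stderr:
        parts.append(f"stderr:\n{stderr.rstrip()}")
    return "\n".join(parts)
```

Leading with the exit code lets the model tell a failed command from one that merely printed nothing.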
Long Term (Advanced Features)
- Web API interface
  - FastAPI + SSE for streaming
  - Multi-user support (separate sandboxes)
  - ~4-6 hours
- Custom base image
  - Pre-install common packages
  - Faster startup
  - ~1-2 hours
- Tool call optimization
  - Batch related operations
  - Cache frequent commands
  - ~2-3 hours
🧪 Testing the Agent
Simple Task
You: Create a Python script that prints "Hello, World!"
Expected: Agent writes file, shows content, runs it, shows output.
Medium Task
You: Create a Flask API with /health endpoint that returns {"status": "ok"}
Include requirements.txt
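Expected: Agent writes app.py, requirements.txt, installs flask, tests the endpoint. Because the sandbox network is disabled, the install step only succeeds if Flask is already available in the sandbox image.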
Complex Task
You: Create a data processing script that:
1. Reads a CSV file
2. Filters rows where value > 100
3. Saves to new CSV
Include sample data and tests
Expected: Agent writes script, creates sample data, writes tests, runs everything.
📝 Notes
Why Podman + krun?
- VM-level isolation (not just containers)
- Daemonless (no background service)
- Rootless by default
- Docker-compatible API
- Fast startup (~125ms)
Why Not Docker?
- Container isolation only (not VM)
- Requires daemon
- Podman is drop-in replacement with better security
Why Not microsandbox?
- Promising but immature (SDK version mismatches)
- Podman + krun uses same underlying tech (libkrun)
- More stable ecosystem
- Can revisit microsandbox in 6-12 months
Sandbox Security Model
- Network disabled - agent can't exfiltrate data
- Workspace mount - only way to persist files
- Ephemeral VM - destroyed after session
- Host git - credentials never in sandbox
- Agent works on feature branches, you review PRs
Design Decisions
- Streaming vs batched - Streaming for better UX
- One tool per file - Clear organization, easy to find
- Schemas with tools - Keep related code together
- Keyword args for sandbox - More maintainable
- JSON history - Human-readable, git-friendly
- Async throughout - Future-proof for web API
🤝 Contributing (Future)
When ready to open-source:
- Add proper README
- Add LICENSE (MIT recommended)
- Add CONTRIBUTING.md
- Set up CI/CD (GitHub Actions)
- Add pre-commit hooks
- Document MCP integration path
📚 Key Learnings
What Worked Well
- Layered architecture - Easy to add features on top
- Testing from the start - Caught issues early
- Simple tools first - bash/read/write covers 90% of needs
- Integration tests - More valuable than complex unit tests
What Was Hard
- Async/sync boundaries - asyncio.to_thread for podman SDK
- Streaming API - Required rewriting entire request flow
- Mock complexity - Some unit tests not worth the effort
- Version mismatches - microsandbox SDK vs server
Surprises
- Podman multiplex headers - Unexpected output corruption
- Multi-line paste - CLI input() limitation
- Test refactoring - Changing streaming broke all tests
- Path validation - More edge cases than expected
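The path validation edge cases mentioned above mostly come from `..` segments, absolute paths, and symlinks. A common way to cover all three is to resolve the candidate path and check that it still lies under the resolved workspace root. This is a sketch of that general technique, not the project's actual check in `tools/files.py`; `resolve_in_workspace` is a hypothetical name.

```python
from pathlib import Path


def resolve_in_workspace(workspace: Path, user_path: str) -> Path:
    """Return the resolved path, or raise ValueError if it leaves the workspace."""
    root = workspace.resolve()
    # Joining with an absolute user_path discards root, so absolute paths
    # are caught by the containment check below as well.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise ValueError(f"Path escapes workspace: {user_path}")
    return candidate
```

Resolving before comparing matters: a plain string prefix check would accept `workspace/../secrets` and would be fooled by a symlink that points outside the workspace.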
🔗 Useful Links
- Anthropic API Docs: https://docs.anthropic.com
- Podman Python SDK: https://podman-py.readthedocs.io
- Textual TUI: https://textual.textualize.io
- Pydantic: https://docs.pydantic.dev
This is a working MVP. The agent can write, read, and execute code safely. Everything else is enhancement.