# Coding Agent - Project Status

**Last Updated:** 2026-02-21

## 🎯 Current State: MVP Complete

The agent is functional and can write, read, and execute code in an isolated sandbox.

---

## ✅ What Works

### Core Infrastructure
- **Sandbox**: Podman + libkrun (microVM isolation)
  - Network disabled
  - Workspace mounted at `/workspace`
  - Bind mount to `./workspace` on host
  - 512MB memory limit

- **Agent Loop**: Streaming responses with tool visibility
  - Shows tool calls as they happen (`🔧 Running: bash(...)`)
  - Streams text token-by-token
  - Handles tool → response → tool chains

- **Persistent History**: JSON files in `./history/`
  - Format: `YYYYMMDD-HHMMSS.json`
  - Includes timestamps
  - Auto-saves after each message

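
The persistent-history behaviour described above (timestamped filename, save after every message) could look roughly like the sketch below. The names `session_filename` and `save_history` and the payload layout are assumptions for illustration; the real `ConversationHistory` class in `agent/history.py` may be organised differently.

```python
import json
from datetime import datetime
from pathlib import Path


def session_filename(started_at: datetime) -> str:
    """Return a history filename in the YYYYMMDD-HHMMSS.json format."""
    return started_at.strftime("%Y%m%d-%H%M%S") + ".json"


def save_history(messages: list[dict], history_dir: Path, started_at: datetime) -> Path:
    """Rewrite the whole session file; meant to be called after each message."""
    history_dir.mkdir(parents=True, exist_ok=True)
    path = history_dir / session_filename(started_at)
    # Hypothetical payload layout: session start time plus the message list.
    payload = {"started_at": started_at.isoformat(), "messages": messages}
    # Write to a temp file first so a crash mid-write cannot truncate the history.
    tmp = path.with_suffix(".json.tmp")
    tmp.write_text(json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8")
    tmp.replace(path)
    return path
```

Rewriting the full file on every message keeps the format trivially human-readable, at the cost of some redundant I/O for long sessions.
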
### Tools Available
1. **`bash`** - Execute shell commands in sandbox
2. **`read_file`** - Read file contents from workspace
3. **`write_file`** - Write/create files in workspace (creates parent dirs)

### Testing
- Unit tests: 13 passing
- Integration tests: 5 passing
- All critical paths covered

---

## ⚠️ Known Issues

### 1. Output Corruption Bug
**Symptom:** `ls -la` output shows `�total 40` with leading spaces/corrupt bytes

**Status:** Investigated, not yet fixed

**Workaround:** Output is still readable despite corruption

**Notes:**
- `demux=False` is set but not working as expected
- May be a Podman SDK version issue or a multiplex header stripping problem
- Plan: let the agent debug itself once fully operational

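
If the stray bytes turn out to be the 8-byte frame headers of the Docker-compatible multiplexed stream format (1 byte stream id, 3 zero bytes, 4-byte big-endian payload length), they could be stripped on the client side. That cause is still an unconfirmed assumption, and `strip_multiplex_headers` is a hypothetical helper, not part of the Podman SDK. A minimal sketch:

```python
import struct

HEADER_SIZE = 8  # 1 byte stream id, 3 zero bytes, 4-byte big-endian length


def strip_multiplex_headers(raw: bytes) -> bytes:
    """Remove multiplex frame headers; return raw unchanged if it is not framed."""
    out = bytearray()
    pos = 0
    while pos < len(raw):
        header = raw[pos:pos + HEADER_SIZE]
        looks_framed = (
            len(header) == HEADER_SIZE
            and header[0] in (0, 1, 2)          # stdin, stdout, stderr
            and header[1:4] == b"\x00\x00\x00"
        )
        if not looks_framed:
            return raw  # not framed the way we assumed; leave output alone
        (length,) = struct.unpack(">I", header[4:8])
        start = pos + HEADER_SIZE
        if start + length > len(raw):
            return raw  # truncated frame; safer to return the original bytes
        out += raw[start:start + length]
        pos = start + length
    return bytes(out)
```

The function falls back to the untouched input whenever the data does not match the framing, so it should be safe to apply even if the SDK sometimes returns already-clean output.
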
### 2. Multi-line Paste
**Symptom:** Pasting multi-line text causes each line to be processed as a separate input

**Status:** Known limitation of CLI `input()`

**Workaround:** Don't paste multi-line prompts until the TUI is built

**Solution:** Build Textual TUI (planned)

---

## 🚀 Quick Start

### Prerequisites
```bash
# Podman with krun runtime installed
# Python 3.14
# uv package manager
```

### Run the Agent
```bash
# Create workspace
mkdir -p workspace

# Set up .env
cat > .env << EOF
ANTHROPIC_API_KEY=sk-ant-...
MODEL=claude-sonnet-4-5-20250929
MAX_TOKENS=8096
SAFEDIR=./workspace
USE_SANDBOX=true
EOF

# Start agent
uv run python main.py
```

### Commands
- `/quit` - Exit session
- `/clear` - Clear conversation history
- `/help` - Show available commands

---

## 📂 Project Structure
```
coding-agent/
├── main.py              # Entry point
├── agent/
│   ├── config.py        # Settings (Pydantic)
│   ├── loop.py          # run_turn, run_session
│   ├── history.py       # ConversationHistory
│   └── tools.py         # TOOL_SCHEMAS, dispatch_tool
├── sandbox/
│   └── session.py       # PodmanSandbox
├── tools/
│   ├── bash.py          # bash tool
│   └── files.py         # read_file, write_file
├── tests/
│   ├── conftest.py      # Shared fixtures
│   ├── test_config.py
│   ├── test_loop.py
│   ├── test_sandbox.py
│   └── test_files.py
├── workspace/           # Agent's workspace (gitignored)
└── history/             # Session history (gitignored)
```

---

## 🔧 Development

### Run Tests
```bash
# All tests
uv run pytest

# Unit tests only (fast, no sandbox)
uv run pytest -m unit

# Integration tests (requires sandbox)
uv run pytest -m integration

# Specific file
uv run pytest tests/test_files.py -v
```

### Add a New Tool

1. **Create tool implementation:**
   ```python
   # tools/my_tool.py
   import asyncio
   import shlex

   async def my_tool(param: str, sandbox=None) -> str:
       """Tool description."""
       if sandbox is None:
           return "Error: No sandbox available"

       try:
           # Quote the parameter so it reaches the shell as a single argument.
           command = f"some command {shlex.quote(param)}"
           # sandbox.run is synchronous, so run it in a worker thread.
           result = await asyncio.to_thread(sandbox.run, command)
           return result
       except Exception as e:
           return f"Error: {e}"

   # Tool schema
   MY_TOOL_SCHEMA = {
       "name": "my_tool",
       "description": "What this tool does",
       "input_schema": {
           "type": "object",
           "properties": {
               "param": {
                   "type": "string",
                   "description": "Parameter description"
               }
           },
           "required": ["param"]
       }
   }
   ```

2. **Export from `tools/__init__.py`:**
   ```python
   from tools.my_tool import my_tool, MY_TOOL_SCHEMA

   TOOL_SCHEMAS = [
       BASH_SCHEMA,
       READ_FILE_SCHEMA,
       WRITE_FILE_SCHEMA,
       MY_TOOL_SCHEMA,  # Add here
   ]
   ```

3. **Add to dispatcher:**
   ```python
   # agent/tools.py
   from tools import my_tool

   async def dispatch_tool(tool_name: str, tool_input: dict, sandbox=None):
       # ... existing if/elif branches ...
       elif tool_name == "my_tool":
           return await my_tool(tool_input["param"], sandbox=sandbox)
   ```

4. **Write tests:**
   ```python
   # tests/test_my_tool.py
   import pytest

   from sandbox.session import PodmanSandbox
   from tools.my_tool import my_tool

   @pytest.mark.unit
   async def test_my_tool_no_sandbox():
       result = await my_tool("test", sandbox=None)
       assert "error" in result.lower()

   @pytest.mark.integration
   async def test_my_tool_works():
       async with PodmanSandbox() as sb:
           result = await my_tool("test", sandbox=sb)
           assert "expected output" in result
   ```

---

## 🎯 Next Steps (Priority Order)

### Immediate (Make Agent More Useful)
1. **Fix output corruption bug**
   - Let agent debug itself with current tools
   - Or investigate Podman SDK version/settings

2. **Add more file tools** (optional enhancements)
   - `list_files(directory)` - better than `bash("ls")`
   - `search_files(pattern)` - grep with nice output
   - `edit_file(filepath, old, new)` - targeted edits

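
As an illustration of what a targeted `edit_file(filepath, old, new)` might do, here is a minimal sketch that works on the local filesystem. The "exactly one match" rule and the helper `apply_edit` are assumptions chosen for this example; the planned tool is not yet implemented and may behave differently (for instance, by going through the sandbox).

```python
from pathlib import Path


def apply_edit(text: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old`; refuse missing or ambiguous matches."""
    count = text.count(old) if old else 0
    if count != 1:
        raise ValueError(f"expected exactly 1 match for the old text, found {count}")
    return text.replace(old, new, 1)


def edit_file(filepath: Path, old: str, new: str) -> str:
    """Apply a targeted edit to a file and return a short status message."""
    try:
        original = filepath.read_text(encoding="utf-8")
        filepath.write_text(apply_edit(original, old, new), encoding="utf-8")
    except (OSError, ValueError) as e:
        # Mirror the "Error: ..." string convention used by the other tools.
        return f"Error: {e}"
    return f"Edited {filepath.name}"
```

Requiring a unique match means the agent has to include enough surrounding context in `old`, which avoids silently editing the wrong occurrence.
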
### Short Term (Better UX)
3. **Session resume**
   - Add `/load <session-id>` command
   - ~10 minutes of work

4. **Build Textual TUI**
   - Multi-line input support
   - Better history viewing
   - Collapsible tool output
   - ~3-4 hours

### Medium Term (Collaboration Features)
5. **Git integration (host-side tools)**
   - `git_clone(repo)` - uses your SSH keys
   - `git_push(branch)` - uses your credentials
   - `create_pr(title, body)` - uses GitHub/Gitea API
   - Agent works on feature branches, you review PRs
   - ~2-3 hours

6. **Improve error messages**
   - Better tool error reporting
   - Exit codes visible to agent
   - ~1 hour

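
One possible shape for "exit codes visible to agent" is to prefix every tool result with the exit code and label stderr separately. The function name `format_tool_result` and the exact text layout are hypothetical, shown only to make the idea concrete:

```python
def format_tool_result(exit_code: int, stdout: str, stderr: str = "") -> str:
    """Build the text returned to the model for one command execution."""
    parts = [f"[exit code: {exit_code}]"]
    if stdout.strip():
        parts.append(stdout.rstrip())
    if stderr.strip():
        # Keep stderr separate so the model can tell errors from normal output.
        parts.append("[stderr]\n" + stderr.rstrip())
    return "\n".join(parts)
```

With this layout a failed command still produces a non-empty result, so the agent can react to the failure instead of seeing a blank response.
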
### Long Term (Advanced Features)
7. **Web API interface**
   - FastAPI + SSE for streaming
   - Multi-user support (separate sandboxes)
   - ~4-6 hours

8. **Custom base image**
   - Pre-install common packages
   - Faster startup
   - ~1-2 hours

9. **Tool call optimization**
   - Batch related operations
   - Cache frequent commands
   - ~2-3 hours

---

## 🧪 Testing the Agent

### Simple Task
```
You: Create a Python script that prints "Hello, World!"
```

Expected: Agent writes file, shows content, runs it, shows output.

### Medium Task
```
You: Create a Flask API with /health endpoint that returns {"status": "ok"}
Include requirements.txt
```

Expected: Agent writes app.py, requirements.txt, installs flask, tests the endpoint.

### Complex Task
```
You: Create a data processing script that:
1. Reads a CSV file
2. Filters rows where value > 100
3. Saves to new CSV

Include sample data and tests
```

Expected: Agent writes script, creates sample data, writes tests, runs everything.

---

## 📝 Notes

### Why Podman + krun?
- **VM-level isolation** (not just containers)
- **Daemonless** (no background service)
- **Rootless** by default
- **Docker-compatible** API
- Fast startup (~125ms)

### Why Not Docker?
- Container isolation only (not VM)
- Requires daemon
- Podman is a drop-in replacement with better security

### Why Not microsandbox?
- Promising but immature (SDK version mismatches)
- Podman + krun uses the same underlying tech (libkrun)
- More stable ecosystem
- Can revisit microsandbox in 6-12 months

### Sandbox Security Model
- **Network disabled** - agent can't exfiltrate data
- **Workspace mount** - only way to persist files
- **Ephemeral VM** - destroyed after session
- **Host git** - credentials never in sandbox
  - Agent works on feature branches, you review PRs

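
For reference, the sandbox settings listed in this document (krun runtime, no network, 512MB memory limit, workspace bind mount, ephemeral container) map onto Podman CLI flags roughly as sketched below. The project itself drives Podman through the Python SDK, so this is only a CLI-equivalent illustration; `build_podman_command` and the image name used in any call are hypothetical.

```python
from pathlib import Path


def build_podman_command(workspace: Path, image: str, command: list[str]) -> list[str]:
    """Assemble a `podman run` argument list matching the sandbox settings."""
    return [
        "podman", "--runtime", "krun",       # microVM isolation via libkrun
        "run", "--rm",                        # ephemeral: removed when it exits
        "--network", "none",                  # no network access
        "--memory", "512m",                   # memory limit
        "--volume", f"{workspace.resolve()}:/workspace",  # only persistent path
        "--workdir", "/workspace",
        image, *command,
    ]
```

The resulting list can be passed to `subprocess.run` without a shell, which avoids quoting issues in the command itself.
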
### Design Decisions
- **Streaming vs batched** - Streaming for better UX
- **One tool per file** - Clear organization, easy to find
- **Schemas with tools** - Keep related code together
- **Keyword args for sandbox** - More maintainable
- **JSON history** - Human-readable, git-friendly
- **Async throughout** - Future-proof for web API

---

## 🤝 Contributing (Future)

When ready to open-source:
1. Add proper README
2. Add LICENSE (MIT recommended)
3. Add CONTRIBUTING.md
4. Set up CI/CD (GitHub Actions)
5. Add pre-commit hooks
6. Document MCP integration path

---

## 📚 Key Learnings

### What Worked Well
- **Layered architecture** - Easy to add features on top
- **Testing from the start** - Caught issues early
- **Simple tools first** - bash/read/write covers 90% of needs
- **Integration tests** - More valuable than complex unit tests

### What Was Hard
- **Async/sync boundaries** - `asyncio.to_thread` for podman SDK
- **Streaming API** - Required rewriting entire request flow
- **Mock complexity** - Some unit tests not worth the effort
- **Version mismatches** - microsandbox SDK vs server

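
The async/sync boundary mentioned above can be shown with a small self-contained sketch. Here `blocking_run` is a stand-in for a synchronous SDK call (it is not the real Podman API), and `asyncio.to_thread` moves it onto a worker thread so the event loop stays free to stream output:

```python
import asyncio
import time


def blocking_run(command: str) -> str:
    """Stand-in for a synchronous SDK call such as a sandbox exec."""
    time.sleep(0.05)
    return f"ran: {command}"


async def run_in_sandbox(command: str) -> str:
    """Run the blocking call in a worker thread so the event loop stays responsive."""
    return await asyncio.to_thread(blocking_run, command)


async def main() -> list[str]:
    # The two calls overlap in separate threads instead of running back to back.
    return await asyncio.gather(run_in_sandbox("ls"), run_in_sandbox("pwd"))
```

Calling `asyncio.run(main())` returns the two results in the order the coroutines were passed to `gather`.
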
### Surprises
- **Podman multiplex headers** - Unexpected output corruption
- **Multi-line paste** - CLI input() limitation
- **Test refactoring** - Changing streaming broke all tests
- **Path validation** - More edge cases than expected

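
To make the path-validation edge cases concrete, the sketch below resolves a user-supplied path against the workspace root and rejects anything that ends up outside it, which covers `..` segments, absolute paths, and symlinks that point elsewhere. The function name `resolve_in_workspace` is an assumption for this example; the actual checks in `tools/files.py` may differ.

```python
from pathlib import Path


def resolve_in_workspace(user_path: str, safedir: Path) -> Path:
    """Resolve user_path against safedir and reject anything that escapes it."""
    root = safedir.resolve()
    # Joining with an absolute path discards root; resolve() collapses ".."
    # segments and follows symlinks, so the check below sees the real target.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes workspace: {user_path}")
    return candidate
```
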
---

## 🔗 Useful Links

- **Anthropic API Docs**: https://docs.anthropic.com
- **Podman Python SDK**: https://podman-py.readthedocs.io
- **Textual TUI**: https://textual.textualize.io
- **Pydantic**: https://docs.pydantic.dev

---

*This is a working MVP. The agent can write, read, and execute code safely. Everything else is enhancement.*