Coding Agent - Project Status

Last Updated: 2026-02-21

🎯 Current State: MVP Complete

The agent is functional and can write, read, and execute code in an isolated sandbox.


What Works

Core Infrastructure

  • Sandbox: Podman + libkrun (microVM isolation)

    • Network disabled
    • Workspace mounted at /workspace
    • Bind mount to ./workspace on host
    • 512MB memory limit
  • Agent Loop: Streaming responses with tool visibility

    • Shows tool calls as they happen (🔧 Running: bash(...))
    • Streams text token-by-token
    • Handles tool → response → tool chains
  • Persistent History: JSON files in ./history/

    • Format: YYYYMMDD-HHMMSS.json
    • Includes timestamps
    • Auto-saves after each message
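
A minimal sketch of the file naming and auto-save behaviour listed above. The `save_history` helper and the message layout are illustrative assumptions, not the project's actual `ConversationHistory` API.

```python
import json
from datetime import datetime
from pathlib import Path


def history_filename(started_at: datetime) -> str:
    """Build a session file name in the YYYYMMDD-HHMMSS.json format."""
    return started_at.strftime("%Y%m%d-%H%M%S") + ".json"


def save_history(history_dir: Path, started_at: datetime, messages: list) -> Path:
    """Rewrite the whole session file; meant to be called after each message."""
    history_dir.mkdir(parents=True, exist_ok=True)
    path = history_dir / history_filename(started_at)
    path.write_text(json.dumps(messages, indent=2), encoding="utf-8")
    return path
```

Rewriting the full file on every message is simple and keeps the JSON valid at all times, at the cost of extra writes on long sessions.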

Tools Available

  1. bash - Execute shell commands in sandbox
  2. read_file - Read file contents from workspace
  3. write_file - Write/create files in workspace (creates parent dirs)

Testing

  • Unit tests: 13 passing
  • Integration tests: 5 passing
  • All critical paths covered

⚠️ Known Issues

1. Output Corruption Bug

Symptom: ls -la output shows total 40 with leading spaces/corrupt bytes

Status: Investigated, not yet fixed

Workaround: None needed; the output remains readable despite the corruption

Notes:

  • demux=False is set but not working as expected
  • May be Podman SDK version issue or multiplex header stripping problem
  • Plan: let the agent debug this itself once it is fully operational
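
If the multiplex-header theory holds, the stray bytes would be frame headers: Docker-compatible attach and exec streams prefix each chunk with 8 bytes (one byte for the stream type, three zero bytes, then a 4-byte big-endian payload length). A minimal sketch of stripping them follows. `strip_multiplex_headers` is a hypothetical helper, not part of the Podman SDK, and whether this is the actual cause of the bug is unconfirmed.

```python
import struct

HEADER_SIZE = 8  # 1 byte stream type, 3 padding bytes, 4-byte big-endian length


def strip_multiplex_headers(raw: bytes) -> bytes:
    """Concatenate frame payloads, dropping the 8-byte header of each frame."""
    payloads = []
    offset = 0
    while offset + HEADER_SIZE <= len(raw):
        stream_type, length = struct.unpack_from(">BxxxI", raw, offset)
        if stream_type not in (0, 1, 2):
            # Not a frame header: assume the data was never multiplexed.
            return raw
        offset += HEADER_SIZE
        payloads.append(raw[offset:offset + length])
        offset += length
    if offset < len(raw):
        # Keep any trailing bytes that are too short to be a header.
        payloads.append(raw[offset:])
    return b"".join(payloads)
```

Comparing the raw bytes before and after this function on a known command such as `ls -la` would be a quick way to test the theory.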

2. Multi-line Paste

Symptom: Pasting multi-line text causes each line to be processed as separate input

Status: Known limitation of CLI input()

Workaround: Don't paste multi-line prompts until TUI is built

Solution: Build Textual TUI (planned)
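
Until the TUI exists, one interim option is to read lines until a terminator line rather than treating each `input()` call as a full prompt. This is an assumption about a possible stopgap, not something the project currently does; `read_multiline` and the `.` terminator are hypothetical.

```python
from typing import Callable


def read_multiline(read_line: Callable[[], str] = input, terminator: str = ".") -> str:
    """Collect lines until one equals the terminator, then return them joined."""
    lines = []
    while True:
        try:
            line = read_line()
        except EOFError:
            # End of input (for example Ctrl-D) also finishes the prompt.
            break
        if line.strip() == terminator:
            break
        lines.append(line)
    return "\n".join(lines)
```

A pasted block followed by a single `.` line would then arrive as one prompt.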


🚀 Quick Start

Prerequisites

# Podman with krun runtime installed
# Python 3.14
# uv package manager

Run the Agent

# Create workspace
mkdir -p workspace

# Set up .env
cat > .env << EOF
ANTHROPIC_API_KEY=sk-ant-...
MODEL=claude-sonnet-4-5-20250929
MAX_TOKENS=8096
SAFEDIR=./workspace
USE_SANDBOX=true
EOF

# Start agent
uv run python main.py
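
The `.env` keys above map onto a settings object. The project uses Pydantic for this (`agent/config.py`); the sketch below is a standard-library stand-in so it runs without dependencies, and the field names and the `parse_env` helper are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    anthropic_api_key: str
    model: str
    max_tokens: int
    safedir: str
    use_sandbox: bool


def parse_env(text: str) -> Settings:
    """Parse KEY=VALUE lines, skipping blank lines and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return Settings(
        anthropic_api_key=values["ANTHROPIC_API_KEY"],
        model=values["MODEL"],
        max_tokens=int(values["MAX_TOKENS"]),       # stored as text in .env
        safedir=values["SAFEDIR"],
        use_sandbox=values["USE_SANDBOX"].lower() == "true",
    )
```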

Commands

  • /quit - Exit session
  • /clear - Clear conversation history
  • /help - Show available commands
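
A minimal sketch of how these slash commands might be separated from normal prompts. `handle_command` and its return values are hypothetical; the project's session loop may handle this differently.

```python
from typing import Optional


def handle_command(line: str, history: list) -> Optional[str]:
    """Return an action name for a slash command, or None for a normal prompt."""
    command = line.strip().lower()
    if not command.startswith("/"):
        return None
    if command == "/quit":
        return "quit"
    if command == "/clear":
        history.clear()  # drop the conversation so far
        return "cleared"
    if command == "/help":
        return "help"
    return "unknown"
```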

📂 Project Structure

coding-agent/
├── main.py                  # Entry point
├── agent/
│   ├── config.py           # Settings (Pydantic)
│   ├── loop.py             # run_turn, run_session
│   ├── history.py          # ConversationHistory
│   └── tools.py            # TOOL_SCHEMAS, dispatch_tool
├── sandbox/
│   └── session.py          # PodmanSandbox
├── tools/
│   ├── bash.py             # bash tool
│   └── files.py            # read_file, write_file
├── tests/
│   ├── conftest.py         # Shared fixtures
│   ├── test_config.py
│   ├── test_loop.py
│   ├── test_sandbox.py
│   └── test_files.py
├── workspace/              # Agent's workspace (gitignored)
└── history/                # Session history (gitignored)

🔧 Development

Run Tests

# All tests
uv run pytest

# Unit tests only (fast, no sandbox)
uv run pytest -m unit

# Integration tests (requires sandbox)
uv run pytest -m integration

# Specific file
uv run pytest tests/test_files.py -v
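
The `-m unit` and `-m integration` selections rely on custom markers. A minimal sketch of registering them in `tests/conftest.py` so pytest does not warn about unknown marks; whether the project registers them here or in its pytest configuration file is an assumption.

```python
# tests/conftest.py (sketch)

def pytest_configure(config):
    """Register the custom markers used to split fast and sandbox-backed tests."""
    config.addinivalue_line("markers", "unit: fast tests that do not need the sandbox")
    config.addinivalue_line("markers", "integration: tests that start a Podman sandbox")
```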

Add a New Tool

  1. Create tool implementation:
# tools/my_tool.py
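import asyncio
import shlex

async def my_tool(param: str, sandbox=None) -> str:
    """Tool description."""
    if sandbox is None:
        return "Error: No sandbox available"

    try:
        # Quote the parameter so spaces or shell metacharacters stay one argument
        command = f"some command {shlex.quote(param)}"
        # sandbox.run is blocking, so run it in a worker thread
        result = await asyncio.to_thread(sandbox.run, command)
        return result
    except Exception as e:
        return f"Error: {e}"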

# Tool schema
MY_TOOL_SCHEMA = {
    "name": "my_tool",
    "description": "What this tool does",
    "input_schema": {
        "type": "object",
        "properties": {
            "param": {
                "type": "string",
                "description": "Parameter description"
            }
        },
        "required": ["param"]
    }
}
  2. Export from tools/__init__.py:
from tools.my_tool import my_tool, MY_TOOL_SCHEMA

TOOL_SCHEMAS = [
    BASH_SCHEMA,
    READ_FILE_SCHEMA,
    WRITE_FILE_SCHEMA,
    MY_TOOL_SCHEMA,  # Add here
]
  3. Add to dispatcher:
# agent/tools.py
from tools import my_tool

async def dispatch_tool(tool_name: str, tool_input: dict, sandbox=None):
    # ...
    elif tool_name == "my_tool":
        return await my_tool(tool_input["param"], sandbox=sandbox)
  4. Write tests:
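# tests/test_my_tool.py
# Assumes the project's pytest setup already runs async test functions.
import pytest

from sandbox.session import PodmanSandbox
from tools.my_tool import my_tool

@pytest.mark.unit
async def test_my_tool_no_sandbox():
    result = await my_tool("test", sandbox=None)
    assert "error" in result.lower()

@pytest.mark.integration
async def test_my_tool_works():
    async with PodmanSandbox() as sb:
        # Pass the sandbox by keyword, matching the dispatcher
        result = await my_tool("test", sandbox=sb)
        assert "expected output" in result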

🎯 Next Steps (Priority Order)

Immediate (Make Agent More Useful)

  1. Fix output corruption bug

    • Let agent debug itself with current tools
    • Or investigate Podman SDK version/settings
  2. Add more file tools (optional enhancements)

    • list_files(directory) - better than bash("ls")
    • search_files(pattern) - grep with nice output
    • edit_file(filepath, old, new) - targeted edits
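
A minimal sketch of the proposed `edit_file(filepath, old, new)` as a plain host-side function. Requiring exactly one match is a design assumption (it avoids silently editing the wrong spot), and the real tool would presumably run through the sandbox like the existing tools.

```python
from pathlib import Path


def edit_file(filepath: str, old: str, new: str) -> str:
    """Replace a single occurrence of `old` with `new`, refusing ambiguous edits."""
    path = Path(filepath)
    if not path.is_file():
        return f"Error: {filepath} not found"
    text = path.read_text(encoding="utf-8")
    count = text.count(old)
    if count != 1:
        return f"Error: expected exactly 1 match, found {count}"
    path.write_text(text.replace(old, new, 1), encoding="utf-8")
    return f"Edited {filepath}"
```

Returning error strings rather than raising follows the style of the tool template in this document, where failures are reported back to the agent as text.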

Short Term (Better UX)

  1. Session resume

    • Add /load <session-id> command
    • ~10 minutes of work
  2. Build Textual TUI

    • Multi-line input support
    • Better history viewing
    • Collapsible tool output
    • ~3-4 hours
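
For the session resume item, a minimal sketch of what `/load <session-id>` could do. It assumes the session ID is the history file name without `.json` and that the file holds a JSON list of messages; `load_session` is a hypothetical name.

```python
import json
import re
from pathlib import Path

SESSION_ID = re.compile(r"\d{8}-\d{6}")  # matches the YYYYMMDD-HHMMSS format


def load_session(history_dir: Path, session_id: str) -> list:
    """Load the saved messages for a session ID such as '20260221-090507'."""
    if not SESSION_ID.fullmatch(session_id):
        # Also blocks path tricks such as '../something'
        raise ValueError(f"Invalid session ID: {session_id}")
    path = history_dir / f"{session_id}.json"
    if not path.is_file():
        raise FileNotFoundError(f"No saved session named {session_id}")
    return json.loads(path.read_text(encoding="utf-8"))
```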

Medium Term (Collaboration Features)

  1. Git integration (host-side tools)

    • git_clone(repo) - uses your SSH keys
    • git_push(branch) - uses your credentials
    • create_pr(title, body) - uses GitHub/Gitea API
    • Agent works on feature branches, you review PRs
    • ~2-3 hours
  2. Improve error messages

    • Better tool error reporting
    • Exit codes visible to agent
    • ~1 hour
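
For the exit-code item, a minimal sketch of a tool result format that makes failures explicit. `CommandResult` and `format_for_agent` are hypothetical names, and the bracketed status line is only one possible layout.

```python
from dataclasses import dataclass


@dataclass
class CommandResult:
    exit_code: int
    output: str


def format_for_agent(result: CommandResult) -> str:
    """Render a command result so the exit status leads the tool output."""
    status = "ok" if result.exit_code == 0 else "failed"
    return f"[exit code {result.exit_code}: {status}]\n{result.output}"
```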

Long Term (Advanced Features)

  1. Web API interface

    • FastAPI + SSE for streaming
    • Multi-user support (separate sandboxes)
    • ~4-6 hours
  2. Custom base image

    • Pre-install common packages
    • Faster startup
    • ~1-2 hours
  3. Tool call optimization

    • Batch related operations
    • Cache frequent commands
    • ~2-3 hours
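
For the web API item, streaming over SSE comes down to writing text events in a fixed wire format. A minimal, framework-independent sketch follows; the event names and payload shape are assumptions, and a FastAPI endpoint would yield these strings from a streaming response.

```python
import json


def sse_event(event: str, data: dict) -> str:
    """Format one Server-Sent Events message: event name, JSON data, blank line."""
    # json.dumps escapes newlines, so the payload always fits on one data line.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"
```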

🧪 Testing the Agent

Simple Task

You: Create a Python script that prints "Hello, World!"

Expected: Agent writes file, shows content, runs it, shows output.

Medium Task

You: Create a Flask API with /health endpoint that returns {"status": "ok"}
     Include requirements.txt

Expected: Agent writes app.py, requirements.txt, installs flask, tests the endpoint.

Complex Task

You: Create a data processing script that:
     1. Reads a CSV file
     2. Filters rows where value > 100
     3. Saves to new CSV
     
     Include sample data and tests

Expected: Agent writes script, creates sample data, writes tests, runs everything.


📝 Notes

Why Podman + krun?

  • VM-level isolation (not just containers)
  • Daemonless (no background service)
  • Rootless by default
  • Docker-compatible API
  • Fast startup (~125ms)

Why Not Docker?

  • Container isolation only (not VM)
  • Requires daemon
  • Podman is a drop-in replacement with better security

Why Not microsandbox?

  • Promising but immature (SDK version mismatches)
  • Podman + krun uses same underlying tech (libkrun)
  • More stable ecosystem
  • Can revisit microsandbox in 6-12 months

Sandbox Security Model

  • Network disabled - agent can't exfiltrate data
  • Workspace mount - only way to persist files
  • Ephemeral VM - destroyed after session
  • Host git - credentials never in sandbox
  • Agent works on feature branches, you review PRs
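
The restrictions above can be expressed as a `podman` invocation. This is a sketch using the CLI for illustration only: the project drives Podman through its Python SDK, the runtime name `krun` depends on local configuration, and `build_podman_command` and the image argument are hypothetical.

```python
from pathlib import Path


def build_podman_command(workspace: Path, image: str) -> list:
    """Assemble a podman command line with the sandbox restrictions applied."""
    return [
        "podman",
        "--runtime", "krun",      # assumes a krun (libkrun) runtime is configured
        "run", "--rm",            # container is removed when it exits
        "--interactive",
        "--network", "none",      # no network inside the sandbox
        "--memory", "512m",       # memory limit
        "--volume", f"{workspace.resolve()}:/workspace",  # only persistent path
        "--workdir", "/workspace",
        image,
    ]
```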

Design Decisions

  • Streaming vs batched - Streaming for better UX
  • One tool per file - Clear organization, easy to find
  • Schemas with tools - Keep related code together
  • Keyword args for sandbox - More maintainable
  • JSON history - Human-readable, git-friendly
  • Async throughout - Future-proof for web API

🤝 Contributing (Future)

When ready to open-source:

  1. Add proper README
  2. Add LICENSE (MIT recommended)
  3. Add CONTRIBUTING.md
  4. Set up CI/CD (GitHub Actions)
  5. Add pre-commit hooks
  6. Document MCP integration path

📚 Key Learnings

What Worked Well

  • Layered architecture - Easy to add features on top
  • Testing from the start - Caught issues early
  • Simple tools first - bash/read/write covers 90% of needs
  • Integration tests - More valuable than complex unit tests

What Was Hard

  • Async/sync boundaries - asyncio.to_thread for podman SDK
  • Streaming API - Required rewriting entire request flow
  • Mock complexity - Some unit tests not worth the effort
  • Version mismatches - microsandbox SDK vs server
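
The async/sync boundary mentioned above can be illustrated with `asyncio.to_thread`, which runs a blocking function in a worker thread and awaits its result. In this sketch `blocking_run` is a stand-in for a synchronous SDK call, not the real Podman API.

```python
import asyncio
import time


def blocking_run(command: str) -> str:
    """Stand-in for a synchronous SDK call such as the sandbox's run method."""
    time.sleep(0.05)
    return f"ran: {command}"


async def run_async(command: str) -> str:
    """Run the blocking call in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(blocking_run, command)


async def main() -> list:
    # Both calls overlap instead of running back to back.
    return await asyncio.gather(run_async("ls"), run_async("pwd"))
```

Running `asyncio.run(main())` returns the two results in call order.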

Surprises

  • Podman multiplex headers - Unexpected output corruption
  • Multi-line paste - CLI input() limitation
  • Test refactoring - Changing streaming broke all tests
  • Path validation - More edge cases than expected
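
On path validation, a minimal sketch of one common approach: resolve the requested path against the workspace root and reject anything that ends up outside it. `resolve_in_workspace` is a hypothetical helper and may not match the project's own checks.

```python
from pathlib import Path


def resolve_in_workspace(workspace: Path, user_path: str) -> Path:
    """Resolve user_path under workspace, rejecting anything that escapes it."""
    root = workspace.resolve()
    # resolve() collapses '..' segments and follows symlinks before the check.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"Path escapes workspace: {user_path}")
    return candidate
```

Absolute paths, `..` segments, and symlinks pointing outside the workspace are among the edge cases this single check is meant to cover.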


This is a working MVP. The agent can write, read, and execute code safely. Everything else is enhancement.