MCP server

Firebox ships an MCP stdio server that exposes the daemon's surface as 60+ tools any MCP-aware client can call. Drop one config block into Claude Desktop / Cursor / Claude Code / ChatGPT Agent / any other MCP host and your model gets native sandbox + browser + OS-input + shell + ports + stream tools.

Install

pip install 'firebox[mcp] @ git+ssh://git@github.com/LovroK23/firebox.git'

The [mcp] extra pulls in the mcp Python SDK. Without it the server entry point is missing.
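A quick post-install check can confirm that both packages are importable before you configure a host. This is a minimal sketch: the helper names `missing_modules` and `report` are hypothetical, and it assumes the top-level import names are `firebox` (this project) and `mcp` (the MCP Python SDK).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed top-level import names: "firebox" (this project) and "mcp" (the SDK).
REQUIRED = ["firebox", "mcp"]


def report(names=REQUIRED):
    """Summarize the check as a one-line message."""
    missing = missing_modules(names)
    if missing:
        return "missing: " + ", ".join(missing) + " (reinstall with the [mcp] extra)"
    return "ok"
```

Run `print(report())` with the same `python3` that your MCP host will spawn; a different interpreter may have a different set of installed packages.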

Configure your MCP host

Host-specific setup, in order: Claude Desktop / Code, Cursor, Codex / ChatGPT.

Claude Desktop: edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or the platform equivalent:

{
  "mcpServers": {
    "firebox": {
      "command": "python3",
      "args": ["-m", "firebox.mcp.server"],
      "env": {
        "FIREBOX_URL": "https://firebox.example.com",
        "FIREBOX_TOKEN": "<paste-secret-here>"
      }
    }
  }
}

Restart Claude Desktop. The firebox tools appear in the model's toolbox automatically — no per-conversation setup.
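If the config file already lists other servers, the firebox entry has to be merged in rather than pasted over them. The sketch below shows one way to do that on the parsed JSON; `firebox_entry` and `add_firebox` are hypothetical helper names, and the entry simply mirrors the block above.

```python
import json


def firebox_entry(url, token):
    """Build the server entry shown in the config block above."""
    return {
        "command": "python3",
        "args": ["-m", "firebox.mcp.server"],
        "env": {"FIREBOX_URL": url, "FIREBOX_TOKEN": token},
    }


def add_firebox(config, url, token):
    """Return a copy of `config` with a "firebox" server added.

    Existing servers and unrelated top-level keys are preserved,
    and the input dict is not mutated.
    """
    merged = dict(config)
    servers = dict(config.get("mcpServers", {}))
    servers["firebox"] = firebox_entry(url, token)
    merged["mcpServers"] = servers
    return merged


def render(config):
    """Serialize with the two-space indentation used in the example above."""
    return json.dumps(config, indent=2)
```

A typical use is to `json.load` the existing file, pass the result through `add_firebox`, and write `render(...)` back. Keep a backup of the original file, since a malformed config can stop the host from loading any servers.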

Cursor: open Settings → MCP → New Server and enter:

{
  "name": "firebox",
  "command": "python3 -m firebox.mcp.server",
  "env": {
    "FIREBOX_URL": "https://firebox.example.com",
    "FIREBOX_TOKEN": "<secret>"
  }
}

Codex / ChatGPT: the same pattern applies. Set command, args, and env so the spawned process inherits FIREBOX_URL and FIREBOX_TOKEN. The MCP server reads both on startup and uses them for every tool call.

Tool catalog

Group	Tools
Lifecycle	`sandbox_open`, `sandbox_list`, `sandbox_close`, `sandbox_run`
Files	`file_read`, `file_write`
Browser — perception	`browser_view` (markdown + indexed elements + screenshot, with diff support), `browser_extract_markdown`, `browser_find_elements`, `browser_search_page`, `browser_clickables`, `browser_screenshot`, `browser_screenshot_annotated`, `browser_text`, `browser_text_all`, `browser_html`
Browser — action	`browser_click_idx`, `browser_input_idx`, `browser_select_option_idx`, `browser_move_mouse`, `browser_scroll`, `browser_scroll_in_element`, `browser_send_keys`, `browser_dropdown_options`, `browser_save_pdf`, `browser_upload_file`
Browser — selector / coords / JS	`browser_click`, `browser_click_at`, `browser_fill`, `browser_press`, `browser_wait_for`, `browser_evaluate`
Browser — lifecycle	`browser_start`, `browser_navigate`, `browser_restart`, `browser_close`
Browser — tabs	`browser_tabs_list`, `browser_tabs_new`, `browser_tabs_switch`, `browser_tabs_close`
Browser — observability	`browser_console_view`, `browser_network_log`, `browser_network_clear`
OS-level input (xdotool)	`os_click`, `os_double_click`, `os_right_click`, `os_move_mouse`, `os_drag`, `os_scroll`, `os_type`, `os_key`, `os_screenshot`, `os_state`
Shells (multi-session)	`shell_exec`, `shell_view`, `shell_write_to_process`, `shell_wait`, `shell_kill_process`, `shell_list`
Stream / live preview	`stream_start`, `stream_stop`, `stream_get_url`
Public ports	`sandbox_expose_port`, `sandbox_list_ports`, `sandbox_unexpose_port`
Search	`search`
Captcha	`captcha_recaptcha_open_image`, `captcha_recaptcha_click_cells`, `captcha_recaptcha_verify`, `captcha_solve_recaptcha_audio`, `captcha_solve_hcaptcha_audio`, `captcha_handoff_to_vnc`

The headline tool is browser_view — call it every loop iteration to perceive the page (markdown + indexed elements + annotated PNG), then act via browser_click_idx / browser_input_idx / browser_select_option_idx. Pass the prior view_token as since_view_token to get diff-only payloads (5–10× token savings).
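The perceive-then-act loop described above can be sketched as follows. This is an illustration under stated assumptions, not the server's actual contract: `call_tool` stands for whatever function your MCP client uses to invoke a tool, the result field `view_token` and the `index` argument are assumed names, and `choose_action` is a hypothetical stand-in for the model's decision.

```python
def perceive(call_tool, sandbox_id, last_token=None):
    """Call browser_view, asking for a diff when a prior token is known."""
    args = {"sandbox_id": sandbox_id}
    if last_token is not None:
        # Documented parameter: request a diff-only payload.
        args["since_view_token"] = last_token
    view = call_tool("browser_view", args)
    # Assumed result field; fall back to the old token if it is absent.
    return view, view.get("view_token", last_token)


def run_loop(call_tool, sandbox_id, choose_action, max_steps=10):
    """Alternate perception and action until choose_action returns None."""
    token = None
    for _ in range(max_steps):
        view, token = perceive(call_tool, sandbox_id, token)
        action = choose_action(view)
        if action is None:
            break
        name, arguments = action  # e.g. ("browser_click_idx", {"index": 3})
        call_tool(name, {"sandbox_id": sandbox_id, **arguments})
    return token
```

Only the first iteration requests a full view; each later one sends the previous token, which is where the diff-only savings come from.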

How a model uses it

A typical Claude conversation flows like this:

sequenceDiagram
    participant U as User
    participant M as Claude Desktop
    participant S as MCP firebox server
    participant D as Daemon

    U->>M: "Open hbs.si and tell me the headline."
    M->>S: tools/call sandbox_open {template: "browser-use"}
    S->>D: POST /sandboxes
    D-->>S: { id, ip }
    S-->>M: { sandbox_id }
    M->>S: tools/call browser_start {sandbox_id}
    S->>D: ... → in-VM agent
    M->>S: tools/call browser_navigate {url: "https://hbs.si"}
    M->>S: tools/call browser_text {selector: "h1"}
    S-->>M: { text: "Hermes: Digitalna transformacija" }
    M-->>U: "Hermes: Digitalna transformacija"
    M->>S: tools/call sandbox_close {sandbox_id}

The model decides which tools to call based on the user's instruction; the MCP server forwards each call to the daemon over HTTP. No model runs inside the sandbox.
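On the wire, each `tools/call` arrow in the diagram is a JSON-RPC 2.0 request whose `params` carry the tool `name` and its `arguments`. The sketch below builds the requests that follow `sandbox_open`. The diagram abbreviates arguments, so passing `sandbox_id` to every browser tool is an assumption, and `tools_call` and `diagram_requests` are hypothetical helper names.

```python
import json


def tools_call(request_id, name, arguments):
    """Build one MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def diagram_requests(sandbox_id, first_id=2):
    """Requests sent after sandbox_open has returned `sandbox_id`."""
    steps = [
        ("browser_start", {"sandbox_id": sandbox_id}),
        # Assumption: sandbox_id accompanies every browser call.
        ("browser_navigate", {"sandbox_id": sandbox_id, "url": "https://hbs.si"}),
        ("browser_text", {"sandbox_id": sandbox_id, "selector": "h1"}),
        ("sandbox_close", {"sandbox_id": sandbox_id}),
    ]
    return [
        tools_call(first_id + offset, name, arguments)
        for offset, (name, arguments) in enumerate(steps)
    ]


def to_line(message):
    """Serialize for the stdio transport: one JSON object per line."""
    return json.dumps(message) + "\n"
```

The opening request would be `tools_call(1, "sandbox_open", {"template": "browser-use"})`; the later calls reuse the `sandbox_id` from its result.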

Verifying it works

From a terminal, drive the MCP server manually:

# MCP's stdio transport is newline-delimited, so the request must fit on one line.
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke","version":"0"}}}' | \
    python3 -m firebox.mcp.server

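The server replies with its protocol version, capabilities, and server info. To see the tool list, follow up with a notifications/initialized notification and a tools/list request. Claude Desktop performs the same handshake on its side.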
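The manual check can also be scripted so that it covers the full handshake and the tool listing. This is a minimal sketch that sends every message up front and then closes stdin, which assumes the server answers pending requests before it exits; `smoke_test` and `DEFAULT_COMMAND` are hypothetical names. FIREBOX_URL and FIREBOX_TOKEN are inherited from the calling environment.

```python
import json
import subprocess
import sys

# Assumed entry point, matching the config blocks earlier on this page.
DEFAULT_COMMAND = [sys.executable, "-m", "firebox.mcp.server"]

HANDSHAKE = [
    {"jsonrpc": "2.0", "id": 1, "method": "initialize",
     "params": {"protocolVersion": "2024-11-05", "capabilities": {},
                "clientInfo": {"name": "smoke", "version": "0"}}},
    {"jsonrpc": "2.0", "method": "notifications/initialized"},
    {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
]


def frame(message):
    """Serialize one message as a single newline-terminated line."""
    return json.dumps(message) + "\n"


def smoke_test(command=DEFAULT_COMMAND, timeout=30.0):
    """Send the handshake to an MCP stdio server and return its JSON replies."""
    payload = "".join(frame(m) for m in HANDSHAKE)
    done = subprocess.run(command, input=payload, capture_output=True,
                          text=True, timeout=timeout)
    replies = []
    for line in done.stdout.splitlines():
        line = line.strip()
        if line.startswith("{"):  # skip any non-JSON output
            replies.append(json.loads(line))
    return replies
```

A healthy server should produce a reply with `id` 1 (the initialize result) and one with `id` 2 whose result contains the tool list. If the second reply is missing, the server may have exited on end-of-input before answering, in which case an interactive client is a better test.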