MCP server

Firebox ships an MCP stdio server that exposes the daemon's surface as 60+ tools any MCP-aware client can call. Drop one config block into Claude Desktop / Cursor / Claude Code / ChatGPT Agent / any other MCP host and your model gets native sandbox + browser + OS-input + shell + ports + stream tools.

Install

pip install 'firebox[mcp] @ git+ssh://git@github.com/LovroK23/firebox.git'

The [mcp] extra pulls in the mcp Python SDK. Without it the server entry point is missing.

Configure your MCP host

Claude Desktop: edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or the platform equivalent:

{
  "mcpServers": {
    "firebox": {
      "command": "python3",
      "args": ["-m", "firebox.mcp.server"],
      "env": {
        "FIREBOX_URL": "https://firebox.example.com",
        "FIREBOX_TOKEN": "<paste-secret-here>"
      }
    }
  }
}

Restart Claude Desktop. The firebox tools appear in the model's toolbox automatically — no per-conversation setup.
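If you would rather script this step than hand-edit the file, the sketch below merges the same firebox entry into an existing config without disturbing other servers. The helper names (firebox_entry, add_firebox_server) are hypothetical and not part of Firebox; the command, args, and env keys are copied from the block above.

```python
import json
from pathlib import Path


def firebox_entry(url, token):
    """Build the server entry shown in the config block above."""
    return {
        "command": "python3",
        "args": ["-m", "firebox.mcp.server"],
        "env": {"FIREBOX_URL": url, "FIREBOX_TOKEN": token},
    }


def add_firebox_server(config_path, url, token):
    """Merge the firebox entry into an MCP host config, keeping other servers.

    Hypothetical helper: creates the file if it is missing or empty.
    """
    path = Path(config_path)
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    # Only the "firebox" key is replaced; sibling entries are left untouched.
    config.setdefault("mcpServers", {})["firebox"] = firebox_entry(url, token)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Call it with the path from above, for example add_firebox_server(config_path, "https://firebox.example.com", token), then restart the host so it picks up the change.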

Settings → MCP → New Server:

{
  "name": "firebox",
  "command": "python3 -m firebox.mcp.server",
  "env": {
    "FIREBOX_URL": "https://firebox.example.com",
    "FIREBOX_TOKEN": "<secret>"
  }
}

Any MCP host follows the same pattern: set command, args, and env so the spawned process inherits FIREBOX_URL and FIREBOX_TOKEN. The MCP server reads both on startup and uses them for every tool call.

Tool catalog

| Group | Tools |
| --- | --- |
| Lifecycle | sandbox_open, sandbox_list, sandbox_close, sandbox_run |
| Files | file_read, file_write |
| Browser: perception | browser_view (markdown + indexed elements + screenshot, with diff support), browser_extract_markdown, browser_find_elements, browser_search_page, browser_clickables, browser_screenshot, browser_screenshot_annotated, browser_text, browser_text_all, browser_html |
| Browser: action | browser_click_idx, browser_input_idx, browser_select_option_idx, browser_move_mouse, browser_scroll, browser_scroll_in_element, browser_send_keys, browser_dropdown_options, browser_save_pdf, browser_upload_file |
| Browser: selector / coords / JS | browser_click, browser_click_at, browser_fill, browser_press, browser_wait_for, browser_evaluate |
| Browser: lifecycle | browser_start, browser_navigate, browser_restart, browser_close |
| Browser: tabs | browser_tabs_list, browser_tabs_new, browser_tabs_switch, browser_tabs_close |
| Browser: observability | browser_console_view, browser_network_log, browser_network_clear |
| OS-level input (xdotool) | os_click, os_double_click, os_right_click, os_move_mouse, os_drag, os_scroll, os_type, os_key, os_screenshot, os_state |
| Shells (multi-session) | shell_exec, shell_view, shell_write_to_process, shell_wait, shell_kill_process, shell_list |
| Stream / live preview | stream_start, stream_stop, stream_get_url |
| Public ports | sandbox_expose_port, sandbox_list_ports, sandbox_unexpose_port |
| Search | search |
| Captcha | captcha_recaptcha_open_image, captcha_recaptcha_click_cells, captcha_recaptcha_verify, captcha_solve_recaptcha_audio, captcha_solve_hcaptcha_audio, captcha_handoff_to_vnc |

The headline tool is browser_view: call it on every loop iteration to perceive the page (markdown + indexed elements + annotated PNG), then act via browser_click_idx / browser_input_idx / browser_select_option_idx. Pass the prior view_token as since_view_token to receive only the diff since that view, which cuts the payload by 5–10× in tokens.
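The perceive-then-act loop can be sketched as below. This is a minimal illustration, not Firebox's client code: call_tool is a hypothetical callable (name, arguments) -> dict that you would back with your MCP client, the view_token field in the result is assumed from the parameter name, and the index argument passed to browser_click_idx is an assumption as well.

```python
def perceive(call_tool, sandbox_id, prev_token=None):
    """Call browser_view, asking for a diff when a prior token is known."""
    args = {"sandbox_id": sandbox_id}
    if prev_token is not None:
        # Diff-only payload relative to the previous view.
        args["since_view_token"] = prev_token
    view = call_tool("browser_view", args)
    # Assumption: the result carries the token to pass on the next call.
    return view, view.get("view_token")


def run_loop(call_tool, sandbox_id, choose_action, max_steps=10):
    """Alternate browser_view with one action per step.

    choose_action(view) returns (tool_name, arguments), or None to stop.
    Returns the number of actions performed.
    """
    token = None
    performed = 0
    for _ in range(max_steps):
        view, token = perceive(call_tool, sandbox_id, token)
        action = choose_action(view)
        if action is None:
            break
        name, arguments = action
        call_tool(name, {"sandbox_id": sandbox_id, **arguments})
        performed += 1
    return performed
```

The first iteration sends no since_view_token and therefore gets the full view; every later iteration sends the token from the previous view.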

How a model uses it

A typical Claude conversation flows like this:

sequenceDiagram
    participant U as User
    participant M as Claude Desktop
    participant S as MCP firebox server
    participant D as Daemon

    U->>M: "Open hbs.si and tell me the headline."
    M->>S: tools/call sandbox_open {template: "browser-use"}
    S->>D: POST /sandboxes
    D-->>S: { id, ip }
    S-->>M: { sandbox_id }
    M->>S: tools/call browser_start {sandbox_id}
    S->>D: ... → in-VM agent
    M->>S: tools/call browser_navigate {url: "https://hbs.si"}
    M->>S: tools/call browser_text {selector: "h1"}
    S-->>M: { text: "Hermes: Digitalna transformacija" }
    M-->>U: "Hermes: Digitalna transformacija"
    M->>S: tools/call sandbox_close {sandbox_id}

The model decides which tools to call based on the user's instruction; the MCP server forwards each call to the daemon over HTTP. No model runs inside the sandbox.
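The same sequence, written out as a script, might look like the sketch below. It assumes a hypothetical call_tool(name, arguments) -> dict callable, result shapes matching the diagram ({sandbox_id}, {text}), and that browser_navigate and browser_text also take sandbox_id, which the diagram does not show. The try/finally is a design choice of this sketch so the sandbox is closed even when a browser call fails.

```python
def read_headline(call_tool, url, selector="h1"):
    """Open a sandbox, read one element's text, and always close the sandbox."""
    opened = call_tool("sandbox_open", {"template": "browser-use"})
    sandbox_id = opened["sandbox_id"]
    try:
        call_tool("browser_start", {"sandbox_id": sandbox_id})
        # Assumption: sandbox_id accompanies every browser_* call.
        call_tool("browser_navigate", {"sandbox_id": sandbox_id, "url": url})
        result = call_tool(
            "browser_text", {"sandbox_id": sandbox_id, "selector": selector}
        )
        return result["text"]
    finally:
        # Runs on success and on error, so no sandbox is left behind.
        call_tool("sandbox_close", {"sandbox_id": sandbox_id})
```

With the mcp Python SDK, call_tool could wrap ClientSession.call_tool; how the returned content maps to a dict depends on the server's result format, so that adapter is left out here.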

Verifying it works

From a terminal, drive the MCP server manually:

# The stdio transport is newline-delimited: the whole message must sit on one line.
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke","version":"0"}}}' | \
    python3 -m firebox.mcp.server

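The server replies with its server info and capabilities. The tool list comes from a follow-up tools/list request, sent after the client has acknowledged with notifications/initialized. Claude Desktop performs the same handshake on its side.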