MCP server
Firebox ships an MCP stdio server that exposes the daemon's API as 60+ tools any MCP-aware client can call. Add one config block to Claude Desktop, Cursor, Claude Code, ChatGPT Agent, or any other MCP host, and your model gets native tools for sandboxes, the browser, OS-level input, shells, public ports, and live streams.
Install
Install Firebox with the [mcp] extra, which pulls in the mcp Python SDK. Without the extra, the server entry point (firebox.mcp.server) is missing.
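To confirm the extra landed in the same interpreter your MCP host will launch, a stdlib check like the one below can help. It is a minimal sketch: the helper name `mcp_extra_available` is hypothetical, and it only tests that the modules are importable, not which versions are installed. Run it with the same `python3` your host config points at.

```python
import importlib.util
from typing import Dict


def mcp_extra_available() -> Dict[str, bool]:
    """Report whether the modules the Firebox MCP server needs are importable."""
    status: Dict[str, bool] = {}
    # "mcp" is the import name of the official MCP Python SDK.
    status["mcp"] = importlib.util.find_spec("mcp") is not None
    try:
        # find_spec on a dotted name imports the parent packages first, so a
        # missing firebox (or a firebox.mcp that fails to import) raises here.
        status["firebox.mcp.server"] = (
            importlib.util.find_spec("firebox.mcp.server") is not None
        )
    except ImportError:
        status["firebox.mcp.server"] = False
    return status


if __name__ == "__main__":
    for name, ok in mcp_extra_available().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```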
Configure your MCP host
For Claude Desktop, add this to ~/Library/Application Support/Claude/claude_desktop_config.json
(macOS) or the platform equivalent:
{
  "mcpServers": {
    "firebox": {
      "command": "python3",
      "args": ["-m", "firebox.mcp.server"],
      "env": {
        "FIREBOX_URL": "https://firebox.example.com",
        "FIREBOX_TOKEN": "<paste-secret-here>"
      }
    }
  }
}
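If you would rather not hand-edit the file, the entry can be merged in programmatically. This is a sketch, not a Firebox tool: the helper names are hypothetical, the Windows location (%APPDATA%\Claude\claude_desktop_config.json) is the commonly documented one rather than something this page specifies, and the merge keeps any other servers already listed under mcpServers.

```python
import json
import os
import sys
from pathlib import Path
from typing import Optional


def claude_desktop_config_path() -> Path:
    """Default claude_desktop_config.json location (macOS and Windows only)."""
    if sys.platform == "darwin":
        return (Path.home() / "Library" / "Application Support" / "Claude"
                / "claude_desktop_config.json")
    if sys.platform == "win32":
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    raise RuntimeError("no known Claude Desktop config path on this platform")


def add_firebox_server(config: dict, url: str, token: str) -> dict:
    """Return a copy of config with mcpServers.firebox set; other servers are kept."""
    servers = dict(config.get("mcpServers", {}))
    servers["firebox"] = {
        "command": "python3",
        "args": ["-m", "firebox.mcp.server"],
        "env": {"FIREBOX_URL": url, "FIREBOX_TOKEN": token},
    }
    return {**config, "mcpServers": servers}


def install(url: str, token: str, path: Optional[Path] = None) -> Path:
    """Merge the firebox entry into the config file, creating it if needed."""
    path = path or claude_desktop_config_path()
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_firebox_server(config, url, token), indent=2) + "\n")
    return path
```

You still need to restart Claude Desktop after writing the file, and the token ends up in plain text just as it does when editing by hand.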
Restart Claude Desktop. The firebox tools then appear in the model's toolbox automatically, with no per-conversation setup.
For hosts configured through a settings UI (Settings → MCP → New Server),
follow the same pattern: set command, args, and env so the spawned process
inherits FIREBOX_URL and FIREBOX_TOKEN. The MCP server reads these
variables on startup and uses them for every tool call.
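To make the env hand-off concrete, here is a rough model of both sides. It assumes the host layers the entry's env over a base environment (real hosts vary; some pass only a minimal default set), and `read_firebox_settings` is a hypothetical stand-in for the server's startup check, not Firebox's actual code.

```python
import os
from typing import Dict, List, Optional, Tuple


def build_launch(entry: dict,
                 base_env: Optional[Dict[str, str]] = None) -> Tuple[List[str], Dict[str, str]]:
    """Turn an mcpServers entry into the argv and environment a host spawns."""
    argv = [entry["command"], *entry.get("args", [])]
    # Assumption: the entry's env is layered over the host's own environment.
    env = dict(os.environ if base_env is None else base_env)
    env.update(entry.get("env", {}))
    return argv, env


def read_firebox_settings(env: Dict[str, str]) -> Tuple[str, str]:
    """Hypothetical startup check: both variables must be set before serving tools."""
    missing = [key for key in ("FIREBOX_URL", "FIREBOX_TOKEN") if not env.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    # Normalize the base URL once so every tool call can append a route to it.
    return env["FIREBOX_URL"].rstrip("/"), env["FIREBOX_TOKEN"]
```

Failing fast at startup means a misconfigured host shows one clear error instead of every tool call failing later.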
Tool catalog
| Group | Tools |
|---|---|
| Lifecycle | sandbox_open, sandbox_list, sandbox_close, sandbox_run |
| Files | file_read, file_write |
| Browser — perception | browser_view (markdown + indexed elements + screenshot, with diff support), browser_extract_markdown, browser_find_elements, browser_search_page, browser_clickables, browser_screenshot, browser_screenshot_annotated, browser_text, browser_text_all, browser_html |
| Browser — action | browser_click_idx, browser_input_idx, browser_select_option_idx, browser_move_mouse, browser_scroll, browser_scroll_in_element, browser_send_keys, browser_dropdown_options, browser_save_pdf, browser_upload_file |
| Browser — selector / coords / JS | browser_click, browser_click_at, browser_fill, browser_press, browser_wait_for, browser_evaluate |
| Browser — lifecycle | browser_start, browser_navigate, browser_restart, browser_close |
| Browser — tabs | browser_tabs_list, browser_tabs_new, browser_tabs_switch, browser_tabs_close |
| Browser — observability | browser_console_view, browser_network_log, browser_network_clear |
| OS-level input (xdotool) | os_click, os_double_click, os_right_click, os_move_mouse, os_drag, os_scroll, os_type, os_key, os_screenshot, os_state |
| Shells (multi-session) | shell_exec, shell_view, shell_write_to_process, shell_wait, shell_kill_process, shell_list |
| Stream / live preview | stream_start, stream_stop, stream_get_url |
| Public ports | sandbox_expose_port, sandbox_list_ports, sandbox_unexpose_port |
| Search | search |
| Captcha | captcha_recaptcha_open_image, captcha_recaptcha_click_cells, captcha_recaptcha_verify, captcha_solve_recaptcha_audio, captcha_solve_hcaptcha_audio, captcha_handoff_to_vnc |
The headline tool is browser_view: call it on every loop iteration
to perceive the page (markdown, indexed elements, and an annotated PNG),
then act via browser_click_idx, browser_input_idx, or
browser_select_option_idx. Pass the previous response's view_token as
since_view_token to get diff-only payloads (5–10× token savings).
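The perceive/act loop with diffing might look like the sketch below. `call_tool` is a placeholder for whatever your MCP client uses to issue tools/call, `choose_action` stands in for the model's decision, and passing `sandbox_id` to browser_view plus the `index` argument in the example are assumptions; only browser_view, the *_idx action tools, view_token, and since_view_token come from this page.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Placeholder for your MCP client's tools/call: (tool name, arguments) -> result.
ToolCall = Callable[[str, Dict[str, Any]], Dict[str, Any]]
# The model's decision: None when done, else (tool name, extra arguments).
Chooser = Callable[[Dict[str, Any]], Optional[Tuple[str, Dict[str, Any]]]]


def perceive(call_tool: ToolCall, sandbox_id: str,
             prev_token: Optional[str]) -> Dict[str, Any]:
    """Call browser_view, requesting a diff whenever a previous view_token exists."""
    args: Dict[str, Any] = {"sandbox_id": sandbox_id}
    if prev_token is not None:
        args["since_view_token"] = prev_token
    return call_tool("browser_view", args)


def run_loop(call_tool: ToolCall, sandbox_id: str, choose_action: Chooser,
             max_steps: int = 10) -> Optional[Dict[str, Any]]:
    """Perceive, act, repeat; returns the last view once the chooser says stop."""
    token: Optional[str] = None
    for _ in range(max_steps):
        view = perceive(call_tool, sandbox_id, token)
        # Keep the newest token; if absent, the next call asks for a full view.
        token = view.get("view_token")
        action = choose_action(view)
        if action is None:
            return view
        name, extra = action  # e.g. ("browser_click_idx", {"index": 3})
        call_tool(name, {"sandbox_id": sandbox_id, **extra})
    return None  # step budget exhausted
```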
How a model uses it
A typical Claude conversation flows like this:
sequenceDiagram
    participant U as User
    participant M as Claude Desktop
    participant S as MCP firebox server
    participant D as Daemon
    U->>M: "Open hbs.si and tell me the headline."
    M->>S: tools/call sandbox_open {template: "browser-use"}
    S->>D: POST /sandboxes
    D-->>S: { id, ip }
    S-->>M: { sandbox_id }
    M->>S: tools/call browser_start {sandbox_id}
    S->>D: ... → in-VM agent
    M->>S: tools/call browser_navigate {url: "https://hbs.si"}
    M->>S: tools/call browser_text {selector: "h1"}
    S-->>M: { text: "Hermes: Digitalna transformacija" }
    M-->>U: "Hermes: Digitalna transformacija"
    M->>S: tools/call sandbox_close {sandbox_id}
The model decides which tools to call based on the user's instruction; the MCP server forwards each call to the daemon over HTTP. No model runs inside the sandbox.
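On the server side, each tools/call becomes an HTTP request to the daemon. The sketch below models the sandbox_open step from the diagram using only the standard library. The POST /sandboxes route and the {id, ip} → {sandbox_id} mapping follow the diagram; the JSON body shape and sending FIREBOX_TOKEN as a Bearer token are assumptions, and the function names are hypothetical.

```python
import json
import urllib.request


def open_sandbox_request(base_url: str, token: str, template: str) -> urllib.request.Request:
    """Build the POST /sandboxes request behind the sandbox_open tool."""
    body = json.dumps({"template": template}).encode()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/sandboxes",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the daemon accepts FIREBOX_TOKEN as a bearer token.
            "Authorization": f"Bearer {token}",
        },
    )


def to_tool_result(daemon_body: bytes) -> dict:
    """Map the daemon's {id, ip} reply onto the {sandbox_id} the model sees."""
    data = json.loads(daemon_body)
    return {"sandbox_id": data["id"]}


def sandbox_open(base_url: str, token: str, template: str = "browser-use") -> dict:
    """Forward one sandbox_open call to the daemon and shape the result."""
    with urllib.request.urlopen(open_sandbox_request(base_url, token, template)) as resp:
        return to_tool_result(resp.read())
```

Keeping request construction and response mapping as separate pure functions makes them easy to test without a running daemon.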
Verifying it works
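From a terminal with FIREBOX_URL and FIREBOX_TOKEN exported, drive the MCP server manually. The stdio transport is newline-delimited, so each JSON-RPC message must sit on a single line; the trailing sleep keeps stdin open long enough for the server to answer before it sees EOF:
{
  printf '%s\n' \
    '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke","version":"0"}}}' \
    '{"jsonrpc":"2.0","method":"notifications/initialized"}' \
    '{"jsonrpc":"2.0","id":2,"method":"tools/list"}'
  sleep 2
} | python3 -m firebox.mcp.server
The server answers initialize with its capabilities and tools/list with the tool catalog. Claude Desktop performs the same handshake when it spawns the server.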