Tools showcase

Every capability the agent (yours, or an MCP host like Claude Desktop) can use, with a 5-line example. Each section shows the Python SDK call first, then the matching MCP tool or CLI command.

| Section | Use it for |
| --- | --- |
| Run a command | Synchronous shell exec |
| Stream output | Long-running jobs, live tail |
| Background processes | Servers that outlive a single call |
| Read & write files | Inject code/data, fetch results |
| Upload & download | Move files between local and VM |
| Open the browser | Stealth Chromium |
| Click, fill, type | Selector-based interaction |
| Visual click | Coordinate-based, screenshot-driven |
| Read page content | Text / DOM / attributes |
| Screenshots | Plain or annotated |
| Run JavaScript | Escape hatch |
| Persist sessions | Cookies / localStorage save/load |
| Real-Chrome HTTP | Bypass TLS-fingerprint blocks |
| Aggregated search | SearxNG fan-out |
| Transcribe audio | Whisper inside the VM |
| Solve captcha | 2captcha API or local Whisper |
| Expose a public URL | DNAT host port to a sandbox port |

Run a command

Synchronous shell exec inside the sandbox. Returns stdout, stderr, exit code, duration.

r = sb.run("uname -a; df -h /")
print(r.stdout)            # captured
print(r.exit_code)         # 0
sandbox_run { sandbox_id, cmd, timeout?, cwd? }
→ { stdout, stderr, exit_code }

Stream output

Same as run but yields chunks as they arrive — perfect for progress bars, tail -f, long compiles.

for c in sb.stream("for i in 1 2 3; do echo $i; sleep 1; done"):
    if c.stream == "stdout": print(c.data, end="")
    elif c.stream == "final": print(f"\nexit {c.exit_code}")
firebox sandbox stream <id> "long-running-cmd"

NDJSON over the wire; the SDK turns each line into a StreamChunk.
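
To make the NDJSON-to-chunk step concrete, here is a minimal sketch of a parser. It is not the SDK's code: the StreamChunk below is a stand-in dataclass, and the wire field names (stream, data, exit_code) are assumed from the attributes used in the example above.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class StreamChunk:
    """Stand-in for the SDK type; field names are assumptions."""
    stream: str                      # "stdout", "stderr", or "final"
    data: str = ""
    exit_code: Optional[int] = None  # only set on the "final" chunk


def parse_ndjson(lines: Iterable[str]) -> Iterator[StreamChunk]:
    """Turn NDJSON lines into chunks, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        yield StreamChunk(
            stream=obj["stream"],
            data=obj.get("data", ""),
            exit_code=obj.get("exit_code"),
        )


# Hypothetical wire payload, for illustration only.
wire = [
    '{"stream": "stdout", "data": "1\\n"}',
    '{"stream": "stdout", "data": "2\\n"}',
    '',
    '{"stream": "final", "exit_code": 0}',
]
chunks = list(parse_ndjson(wire))
for c in chunks:
    print(c)
```

One JSON object per line means the reader can act on each chunk as soon as its newline arrives, without waiting for the whole response.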


Background processes

Start a server that outlives the call that started it; query it from later calls in the same sandbox.

p = sb.process.start("python3 -m http.server 8000 --bind 0.0.0.0")
sb.run("curl -s http://127.0.0.1:8000/")    # talks to the bg server
p.kill()

Process registry endpoints: start / list / kill / wait / logs. Every process gets its stdout / stderr captured to a tempfile so logs() is re-readable.
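
Why a tempfile makes logs() re-readable can be shown with a local sketch. This is not the daemon's implementation: LoggedProcess is a hypothetical stand-in, it merges stderr into stdout for brevity, and it assumes a POSIX host where an open temp file can be reopened by name.

```python
import subprocess
import sys
import tempfile


class LoggedProcess:
    """Hypothetical stand-in: a child process whose output goes to a tempfile."""

    def __init__(self, argv):
        # Output lands on disk, not in a pipe, so it can be read repeatedly.
        self._log = tempfile.NamedTemporaryFile(mode="w+b", suffix=".log")
        self._proc = subprocess.Popen(
            argv, stdout=self._log, stderr=subprocess.STDOUT
        )

    def wait(self, timeout=None):
        return self._proc.wait(timeout=timeout)

    def logs(self):
        # Open a fresh handle each time so every call reads from the start.
        with open(self._log.name, "rb") as fh:
            return fh.read().decode(errors="replace")


proc = LoggedProcess([sys.executable, "-c", "print('server ready')"])
exit_code = proc.wait(timeout=30)
first, second = proc.logs(), proc.logs()
print(first, end="")
```

A pipe would be drained by the first read; a file keeps the full history for as long as the process entry exists.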


Read & write files

Binary-safe (base64 on the wire). Use to inject Python scripts, config, datasets; fetch results back.

sb.files.write("/work/main.py", "print('hi')")
sb.files.write("/data/blob.bin", b"\x00\x01\x02", mode=0o600)

text = sb.files.read_text("/etc/firebox-template-name")
blob = sb.files.read("/work/output.parquet")    # bytes
listing = sb.files.list("/work")                # [FileEntry, ...]
file_write  { sandbox_id, path, content }   → { bytes_written }
file_read   { sandbox_id, path }            → { content }
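
The base64 step is what keeps arbitrary bytes intact inside a JSON body. A minimal sketch of the round trip, assuming the payload fields shown above; the helper names and the sandbox id are made up for illustration and are not part of the SDK.

```python
import base64
import json


def encode_file_write(sandbox_id: str, path: str, data: bytes) -> str:
    """Build a JSON body whose content field is base64 text."""
    return json.dumps({
        "sandbox_id": sandbox_id,
        "path": path,
        "content": base64.b64encode(data).decode("ascii"),
    })


def decode_content(body: str) -> bytes:
    """Recover the original bytes from a JSON body."""
    return base64.b64decode(json.loads(body)["content"])


raw = b"\x00\x01\x02\xff"          # bytes that are not valid UTF-8 text
body = encode_file_write("sb-123", "/data/blob.bin", raw)
restored = decode_content(body)
print(body)
```

Base64 grows the payload by about a third, which is worth keeping in mind for large datasets.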

Upload & download

Convenience wrappers around read / write for whole files.

sb.files.upload("local.csv",  "/data/in.csv")
sb.files.download("/work/out.png", "local-out.png")

Mode (executable bit, etc.) carried across.
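
What "mode carried across" means can be sketched locally: read the permission bits from the source and apply them to the copy. This is an illustration on the local filesystem only (POSIX assumed); copy_with_mode is a hypothetical helper, not how upload / download are implemented.

```python
import os
import shutil
import stat
import tempfile


def copy_with_mode(src: str, dst: str) -> int:
    """Copy file bytes, then apply the source's permission bits to dst."""
    mode = stat.S_IMODE(os.stat(src).st_mode)
    shutil.copyfile(src, dst)   # copies content only, not permissions
    os.chmod(dst, mode)
    return mode


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "run.sh")
    dst = os.path.join(tmp, "run-copy.sh")
    with open(src, "w") as fh:
        fh.write("#!/bin/sh\necho hi\n")
    os.chmod(src, 0o755)
    copied_mode = copy_with_mode(src, dst)
    dst_mode = stat.S_IMODE(os.stat(dst).st_mode)

print(oct(copied_mode), oct(dst_mode))
```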


Open the browser

Headless stealth Chromium driven from outside. One call to start, state persists.

sb.browser.start()                                  # stealth on by default
sb.browser.navigate("https://example.com")
print(sb.browser.text("h1"))                        # → "Example Domain"
browser_start    { sandbox_id, headless?, stealth?, user_agent?, ... }
browser_navigate { sandbox_id, url, wait_until?, timeout? }
browser_close    { sandbox_id }

Template browser-use has Chromium + Playwright + patchright + curl_cffi + Whisper baked in.


Click, fill, type

Selector-based interaction.

sb.browser.click("button.submit")
sb.browser.fill("input[name='email']", "alice@example.com")
sb.browser.press("Enter")
sb.browser.type("manual typing", humanlike=True)    # 60-100 WPM jitter
sb.browser.wait_for("div.results", state="visible")
browser_click       { sandbox_id, selector }
browser_fill        { sandbox_id, selector, text }
browser_press       { sandbox_id, key }
browser_wait_for    { sandbox_id, selector, state? }

Visual click

Click by viewport coordinates instead of selector. Pair with screenshot_annotated + clickables so the agent sees numbered boxes.

items = sb.browser.clickables()                  # idx, x, y, text, href
img   = sb.browser.screenshot_annotated()        # PNG with yellow numbers
# ... LLM looks at img + items, picks idx 5 ...
sb.browser.click_at(items[5]["x"], items[5]["y"], humanlike=True)
browser_clickables           → list of { idx, x, y, text, href, ... }
browser_screenshot_annotated → ImageContent (PNG with overlays)
browser_click_at             { sandbox_id, x, y }

humanlike=True (default) curves the cursor along a Bezier path with per-step jitter; False does an instant teleport.
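
For intuition, a curved cursor path with per-step jitter might look like the sketch below. It uses a quadratic Bezier curve with one random control point; the curve order, step count, and jitter size are assumptions, not what firebox actually does.

```python
import random


def bezier_path(start, end, steps=20, jitter=1.5, seed=None):
    """Return cursor points along a quadratic Bezier curve from start to end."""
    rng = random.Random(seed)
    (x0, y0), (x2, y2) = start, end
    # Control point: the midpoint pushed sideways so the path bends.
    mx, my = (x0 + x2) / 2, (y0 + y2) / 2
    bend = rng.uniform(-0.3, 0.3)
    cx = mx - (y2 - y0) * bend
    cy = my + (x2 - x0) * bend

    points = []
    for i in range(steps + 1):
        t = i / steps
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x2
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y2
        if 0 < i < steps:  # keep the endpoints exact
            x += rng.uniform(-jitter, jitter)
            y += rng.uniform(-jitter, jitter)
        points.append((x, y))
    return points


path = bezier_path((10, 10), (400, 250), steps=20, seed=7)
print(len(path), path[0], path[-1])
```

The endpoints stay exact so the click still lands on the requested coordinates; only the intermediate points wander.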


Read page content

title    = sb.browser.text("h1")                       # one element
titles   = sb.browser.text_all(".titleline > a")       # list[str]
href     = sb.browser.attr("a.cta", "href")            # attribute
full_html = sb.browser.html()                          # whole page
browser_text     { sandbox_id, selector? }   → { text }
browser_text_all { sandbox_id, selector }    → { items: [...] }
browser_html     { sandbox_id, selector? }   → { html }

Screenshots

sb.browser.screenshot(save_path="page.png")            # plain
sb.browser.screenshot(selector="header", save_path="h.png")
sb.browser.screenshot(full_page=True, save_path="full.png")

# With numbered overlays for LLM-friendly visual click:
sb.browser.screenshot_annotated(save_path="annotated.png")
browser_screenshot          → ImageContent (PNG)
browser_screenshot_annotated→ ImageContent (PNG, yellow numbers)

ImageContent is what Claude / GPT-4o see directly in the conversation.


Run JavaScript

Escape hatch for anything Playwright can't express cleanly.

titles = sb.browser.evaluate("""
    () => [...document.querySelectorAll('.titleline > a')]
              .map(a => a.innerText).slice(0, 5)
""")
browser_evaluate { sandbox_id, script }   → { result }

The script must return JSON-serializable data.


Persist sessions

Save a logged-in browser context and restore in a future sandbox.

# First sandbox: log in once.
sb.browser.start()
sb.browser.navigate("https://app.example.com/login")
sb.browser.fill("#email", "alice@example.com")
sb.browser.fill("#password", "...")
sb.browser.click("button[type=submit]")
sb.browser.save_profile("alice-app")                 # /var/firebox-profiles/alice-app.json

# Days later, fresh sandbox: come back already logged in.
with Sandbox.create(template="browser-use") as sb2:
    sb2.browser.start(profile="alice-app")
    sb2.browser.navigate("https://app.example.com/dashboard")

Profiles include cookies + localStorage + sessionStorage. They're template-scoped: a profile saved from browser-use loads only in browser-use.


Real-Chrome HTTP

When Cloudflare or Akamai block the Chromium browser at the TLS layer, send raw HTTP with curl_cffi's real Chrome 120 JA3/JA4 fingerprint instead.

r = sb.http.get("https://api.cloudflared-site.com/v1/data",
                headers={"Accept": "application/json"})
print(r.status, r.json())
r = sb.http.get("https://tls.peet.ws/api/all").json()
print(r["tls"]["ja3_hash"])    # matches real Chrome

Methods: get / post / put / delete / request. Cookies from the browser carry over via headers.


Aggregated search

Self-hosted SearxNG fans out to 5–15 engines per query. No API key, no rate limit.

sb.search.web("rust web framework")              # general
sb.search.news("AI agents", time_range="day")    # last day
sb.search.papers("microvm performance")          # arxiv + Scholar
sb.search.code("playwright stealth github")      # GitHub + SO + Arch
sb.search.images("hacker news logo")
sb.search.videos("rust async tutorial")
sb.search.wiki("Firecracker (software)")
sb.search.maps("Ljubljana")
search { sandbox_id, q, categories?, engines?, language?,
         time_range?, pageno?, safesearch?, limit? }
firebox search "calorie tracker" -c news -t week -n 5
firebox search "..." -u | xargs -P5 curl -sI -o /dev/null

Cached in-sandbox for 5 minutes; a repeated query returns in microseconds.
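
The caching behaviour can be pictured as a small time-to-live cache keyed by the query. The sketch below is an assumption about the shape, not the in-sandbox implementation; TTLCache, the key layout, and the fake search function are all made up so the example runs offline.

```python
import time


class TTLCache:
    """Tiny time-based cache; entries expire ttl seconds after being stored."""

    def __init__(self, ttl=300.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]                    # fresh: skip the expensive call
        value = compute()
        self._store[key] = (now, value)
        return value


# Fake clock and fake search function stand in for real time and SearxNG.
fake_now = [0.0]
calls = []


def fake_search():
    calls.append(1)
    return ["result-a", "result-b"]


cache = TTLCache(ttl=300.0, clock=lambda: fake_now[0])
key = ("rust web framework", "general")
cache.get_or_compute(key, fake_search)   # miss: runs the search
cache.get_or_compute(key, fake_search)   # hit: served from memory
fake_now[0] = 301.0
cache.get_or_compute(key, fake_search)   # expired: runs the search again
print(len(calls))
```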


Transcribe audio

Local Whisper (tiny.en, 39 MB) baked into the browser-use template. ~5× realtime on the sandbox's CPU.

audio = sb.http.get("https://example.com/podcast.mp3").content
result = sb.audio.transcribe(audio, format="mp3")
print(result.text)            # "..."
print(result.language)        # "en"
print(result.segments)        # [{start, end, text}, ...]

Language can be auto-detected or pinned with language="en".
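
The segment list is convenient for building subtitles. A minimal sketch, assuming each segment is a dict with start and end in seconds plus a text field; the sample segments are invented and the helpers are not part of the SDK.

```python
def fmt_ts(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (the SubRip timestamp layout)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render segments as numbered SubRip blocks separated by blank lines."""
    blocks = []
    for n, seg in enumerate(segments, start=1):
        blocks.append(
            f"{n}\n{fmt_ts(seg['start'])} --> {fmt_ts(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


# Made-up segments standing in for result.segments.
segments = [
    {"start": 0.0, "end": 2.5, "text": " Welcome to the show."},
    {"start": 2.5, "end": 63.04, "text": " Today we talk about sandboxes."},
]
srt = to_srt(segments)
print(srt)
```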


Solve captcha

Four paths; pick one by cost and target site.

Audio challenge solved with local Whisper: free, ~70-80% success per attempt, works on sites that allow audio mode. Same shape for reCAPTCHA v2 and hCaptcha.

if sb.browser.detect_captcha():
    # Call the solver that matches the captcha on the page.
    out = sb.captcha.solve_recaptcha_audio(retries=3)    # reCAPTCHA v2
    # out = sb.captcha.solve_hcaptcha_audio(retries=3)   # hCaptcha
    # → { verified, attempts, text, language }

For sites that force image challenges. Firebox surfaces the puzzle as plain data (instructions + screenshot + cell bboxes). Your agent's vision LLM decides which cells; firebox runs no model.

challenge = sb.captcha.recaptcha_open_image_challenge()
# challenge.screenshot_b64   PNG of the grid
# challenge.instructions     "Click verify once there are no more"
# challenge.target           "crosswalks"
# challenge.cells            [{idx, x, y, width, height}, ...]

# ... your LLM looks at the screenshot + instructions ...
# ... decides indices = [0, 4, 7] ...
sb.captcha.recaptcha_click_cells([0, 4, 7])
result = sb.captcha.recaptcha_verify_image()
# If reCAPTCHA wants more clicks, result has the next puzzle baked in:
while result.get("more_to_click"):
    # hand result back to the LLM, repeat
    ...
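
When debugging what the vision LLM picked, it can help to map chosen indices back to pixel positions. A small sketch, assuming x and y in each cell are the top-left corner of its bounding box (the source does not say); the grid below is invented and cell_centers is a hypothetical helper.

```python
def cell_centers(cells, chosen):
    """Map chosen cell indices to the (x, y) centre of each bounding box."""
    by_idx = {c["idx"]: c for c in cells}
    return [
        (by_idx[i]["x"] + by_idx[i]["width"] / 2,
         by_idx[i]["y"] + by_idx[i]["height"] / 2)
        for i in chosen
    ]


# Made-up 3x3 grid of 100-pixel cells whose top-left corner is (50, 200).
cells = [
    {"idx": r * 3 + c, "x": 50 + c * 100, "y": 200 + r * 100,
     "width": 100, "height": 100}
    for r in range(3) for c in range(3)
]
centers = cell_centers(cells, [0, 4, 7])
print(centers)
```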

The 2captcha API path works for reCAPTCHA v2 and v3, hCaptcha, Cloudflare Turnstile, and Funcaptcha. The caller never sees the puzzle.

info = sb.browser.detect_captcha()
if info:
    sb.browser.solve_captcha_on_page(api_key="<2captcha-key>")

Last resort: ping the human, expose VNC, let them solve manually.

info = sb.captcha.handoff_to_vnc(poll_until_solved=True,
                                  poll_timeout=300)
# info = {vnc_url, vnc_password, sandbox_ip,
#         solved: True, waited: 23.4}

The caller is responsible for routing port 5900 from the sandbox to the human (DNAT, Tailscale, whatever your topology requires). poll_until_solved=True blocks until detect_captcha() returns None.
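
The blocking behaviour amounts to a poll loop with a deadline. A minimal sketch under assumptions: the interval, the return shape, and the fake detector are illustrative, and poll_until is a hypothetical helper, not the SDK's internals.

```python
import time


def poll_until(check, timeout=300.0, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns None; report whether it cleared in time."""
    start = clock()
    while True:
        if check() is None:
            return {"solved": True, "waited": clock() - start}
        if clock() - start >= timeout:
            return {"solved": False, "waited": clock() - start}
        sleep(interval)


# Fake detector: reports a captcha three times, then nothing.
answers = iter([{"type": "recaptcha"}] * 3 + [None])
fake_time = [0.0]


def fake_sleep(seconds):
    fake_time[0] += seconds   # advance the fake clock instead of waiting


outcome = poll_until(lambda: next(answers), timeout=300, interval=2.0,
                     clock=lambda: fake_time[0], sleep=fake_sleep)
print(outcome)
```

Injecting the clock and sleep functions keeps the loop testable without real waiting; in real use the defaults apply.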

detect_captcha returns {type, sitekey, iframe_url, callback} for reCAPTCHA v2/v3/enterprise, hCaptcha, Cloudflare Turnstile, Funcaptcha.


Expose a public URL

DNAT a host port to a port the sandbox is listening on. Return a URL the public can hit.

firebox run --template browser-use --expose 8000 \
    "python3 -m http.server 8000 --bind 0.0.0.0"
# → prints http://your-host:RANDOM_PORT
# The VM keeps running until Ctrl+C.

The SDK doesn't have a one-liner — use firebox.vmm.run_exposed or set up the iptables rules yourself before launching the sandbox. See examples/browser-use/run_agent_vnc.py for a worked example (DNAT 5900 → in-VM x11vnc).

Cleanup happens automatically when the sandbox closes: an exit trap removes the iptables rules.
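
If you set up the rules yourself, a typical DNAT port forward is one nat-table rule plus one FORWARD rule, and cleanup is the same rules with -D. The sketch below only builds the command lines and does not run them; the IP address and ports are made up, and the exact rules firebox installs may differ.

```python
import shlex


def dnat_rules(host_port: int, vm_ip: str, vm_port: int):
    """Return (add, delete) iptables command lists for one TCP port forward."""
    rules = [
        # Rewrite the destination of packets arriving on the host port.
        (["-t", "nat"], "PREROUTING",
         ["-p", "tcp", "--dport", str(host_port),
          "-j", "DNAT", "--to-destination", f"{vm_ip}:{vm_port}"]),
        # Let the rewritten packets through the FORWARD chain.
        ([], "FORWARD",
         ["-p", "tcp", "-d", vm_ip, "--dport", str(vm_port), "-j", "ACCEPT"]),
    ]
    add = [["iptables", *table, "-A", chain, *spec]
           for table, chain, spec in rules]
    delete = [["iptables", *table, "-D", chain, *spec]
              for table, chain, spec in rules]
    return add, delete


add, delete = dnat_rules(18000, "172.16.0.2", 8000)
commands = [shlex.join(cmd) for cmd in add]
for line in commands:
    print(line)
```

Generating the delete list from the same specs as the add list keeps setup and teardown from drifting apart.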


What's not yet here

These would be obvious additions; nothing's blocking them, just not written yet:

  • PTY / interactive shell: firebox sandbox attach <id> for a real terminal. The SDK and daemon would need TTY allocation + a bidirectional WebSocket.
  • GPU passthrough: pass a Blackwell partition through to the VM for in-sandbox inference. Firecracker supports VFIO; we don't wire it up yet.
  • Pause / resume snapshots: Firecracker has native snapshot support; we'd boot from a paused snapshot for sub-50 ms cold start.
  • Volume mounts: persistent state across sandbox lifetimes for the same template / token.

If any of these are blocking your use case, the host setup docs tell you where each piece would slot in.