Python SDK
Stdlib-only client. pip install pulls in no dependencies: urllib,
json, and base64 do the work.
Sandbox is the only entry point you'll actually instantiate. Every
namespace (files, process, shells, browser, os, stream,
ports, http, search, audio, captcha) hangs off an instance.
Sandbox
sb = Sandbox.create(
    template: str | None = None,
    ttl_seconds: float = 300.0,
    vcpu: int = 2,
    mem_mib: int = 512,
    # Optional auto-upload of a local directory at boot:
    workspace: str | None = None,               # local dir path
    workspace_remote: str = "/work",            # remote target
    workspace_exclude: list[str] | None = None, # fnmatch patterns
)
sb = Sandbox.attach(sandbox_id: str)
sb.id # short id used everywhere
sb.ip # 10.42.0.X
sb.template # template name or None
sb.expires_at # epoch seconds
sb.run(cmd, timeout=60, cwd=None, env=None, shell=None) -> RunResult
sb.stream(cmd, ...) -> Iterator[StreamChunk]
sb.close()
# Context manager (recommended)
with Sandbox.create(...) as sb:
    ...
RunResult is (stdout, stderr, exit_code, duration, timeout).
StreamChunk.stream is "stdout" / "stderr" / "final".
workspace= tar-gzips the local directory, uploads it in one HTTP round
trip, and extracts it inside the sandbox before create() returns, which
is handy for shipping a project tree to the VM in one shot. See sb.files
for the explicit upload_dir / download_dir calls and the live watch.
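The one-shot packing step can be sketched with the standard library alone. This is a minimal illustration, not the SDK's actual code: `pack_dir` and `list_members` are hypothetical helper names, and matching each pattern against both the relative path and the bare file name is an assumption about how the exclude patterns behave.

```python
import fnmatch
import io
import os
import tarfile


def pack_dir(local_dir, exclude=None):
    """Return a gzip-compressed tar of local_dir as bytes, skipping excluded files."""
    patterns = list(exclude or [])
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for root, dirs, files in os.walk(local_dir):
            dirs.sort()  # deterministic traversal order
            for name in sorted(files):
                full = os.path.join(root, name)
                # Archive paths are relative to local_dir and use forward slashes.
                rel = os.path.relpath(full, local_dir).replace(os.sep, "/")
                # Assumption: a pattern may match the relative path or the file name.
                if any(fnmatch.fnmatch(rel, p) or fnmatch.fnmatch(name, p)
                       for p in patterns):
                    continue
                tar.add(full, arcname=rel)
    return buf.getvalue()


def list_members(blob):
    """List the archive member names, sorted, for inspection."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        return sorted(tar.getnames())
```

The resulting bytes could then be sent as a single request body, which is what makes the transfer one round trip regardless of how many files the tree contains.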
sb.files
sb.files.read(path) -> bytes
sb.files.read_text(path) -> str
sb.files.write(path, content, mode=None) -> bytes_written
sb.files.list(path="/") -> list[FileEntry] # name, type, size, mtime
sb.files.upload(local, remote, mode=None)
sb.files.download(remote, local)
# Bulk transfer — tar+gzip in one HTTP roundtrip
sb.files.upload_dir(local_dir, remote_dir, exclude=["*.pyc","__pycache__/*"])
sb.files.download_dir(remote_dir, local_dir)
# Live filesystem events — yields {path, event, file} per change
for evt in sb.files.watch("/work", recursive=True,
                          events="modify,create,delete,move",
                          timeout=600):
    print(evt["event"], evt["file"])   # MODIFY a.txt, CREATE b.py, ...
Bytes are base64-framed, so binary files survive the round trip
unchanged. upload_dir is much faster than looping upload over many
files (one POST instead of N) and accepts fnmatch exclude patterns.
watch runs inotifywait -m inside the sandbox and streams events back
as they occur, which suits live "agent edited file X" displays. watch
requires inotify-tools in the template (baked into browser-use).
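To illustrate why base64 framing keeps binary content intact inside a JSON body, here is a small round-trip sketch. The function names and the `path` / `content_b64` field names are hypothetical; the actual wire format used by the daemon is not documented here.

```python
import base64
import json


def frame_write(path, content):
    """Encode a write request as JSON text, carrying the bytes as base64."""
    payload = {
        "path": path,
        # JSON cannot hold raw bytes, so they travel as ASCII base64 text.
        "content_b64": base64.b64encode(content).decode("ascii"),
    }
    return json.dumps(payload)


def unframe(body):
    """Recover the original bytes from a framed JSON body."""
    return base64.b64decode(json.loads(body)["content_b64"])
```

Every possible byte value survives this encoding, including NUL bytes and sequences that are not valid UTF-8.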
sb.process
Background processes that outlive sb.run calls. The classic case:
start a server, then probe it from the next call.
proc = sb.process.start("python3 -m http.server 8000")
sb.run("curl http://127.0.0.1:8000/") # talks to the bg server
proc.logs() -> ProcessLogs(stdout, stderr, running, exit_code)
proc.kill() -> exit_code
proc.wait(timeout=60) -> exit_code or None on timeout
sb.process.list() -> list[Process]
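A server started in the background may need a moment before it accepts connections, so a probe issued right away can fail. The sketch below polls until a probe command exits with status 0. `wait_until_ready` is a hypothetical helper, not part of the SDK; it relies only on the `exit_code` field of the result returned by `sb.run`.

```python
import time


def wait_until_ready(run, probe_cmd, attempts=20, interval=0.5):
    """Call run(probe_cmd) until it reports exit_code 0.

    run       : a callable such as sb.run, returning an object with .exit_code
    probe_cmd : shell command that succeeds once the service is up
    Returns True on success, False if every attempt failed.
    """
    for _ in range(attempts):
        result = run(probe_cmd)
        if result.exit_code == 0:
            return True
        time.sleep(interval)  # give the server time to start listening
    return False
```

A call might look like `wait_until_ready(sb.run, "curl -sf http://127.0.0.1:8000/")`, placed between `sb.process.start(...)` and the first real request.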
sb.shells
Named long-running shells with stdin writes. This is the multi-session
shell, the Manus-parity tool surface. Each shell captures stdout and
stderr to per-session files; view supports incremental tailing.
s = sb.shells.start("dev", "cd /work && npm run dev",
cwd=None, env=None, shell="/bin/bash")
# -> ShellSession(name, pid, cmd)
s.view(since_byte=None) # incremental tail (None = since
# last view); since_byte=0 reads all
# -> {name, pid, stdout, stderr, next_byte, running, exit_code}
s.write(text, append_newline=False, close_stdin=False)
s.wait(timeout=60.0)
s.kill()
sb.shells.get(name) # handle for an existing shell
sb.shells.list() # all shells: pid, cmd, running, ...
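The `since_byte` / `next_byte` semantics of `view` can be modelled in a few lines. This is a toy model of the behaviour described in the comments above, not the daemon's implementation: `TailBuffer` is a hypothetical class that tracks a single output stream in memory.

```python
class TailBuffer:
    """In-memory model of incremental tailing over captured output."""

    def __init__(self):
        self._data = bytearray()
        self._cursor = 0  # offset handed out by the previous view()

    def append(self, chunk):
        """Simulate the shell writing more output."""
        self._data.extend(chunk)

    def view(self, since_byte=None):
        """Return output from since_byte onward.

        since_byte=None continues from the last view; since_byte=0 reads all.
        """
        start = self._cursor if since_byte is None else since_byte
        start = max(0, min(start, len(self._data)))  # clamp to valid range
        text = bytes(self._data[start:]).decode("utf-8", errors="replace")
        self._cursor = len(self._data)
        return {"stdout": text, "next_byte": self._cursor}
```

Calling `view()` repeatedly returns only what arrived since the previous call, which keeps polling cheap for long-running shells.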
sb.ports
Public-URL port forwarding. The daemon installs DNAT rules from a free
host port to the sandbox VM's IP at vm_port.
ep = sb.ports.expose(8000, scheme="http")
# -> {vm_port, host_port, url, scheme}
sb.ports.list() # active mappings
sb.ports.unexpose(host_port)
The service inside the sandbox must bind 0.0.0.0 (not 127.0.0.1).
Mappings are removed automatically on sandbox close. The public host
in the URL comes from the daemon's FIREBOX_EXPOSE_HOST env var,
falling back to socket.gethostname().
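The host fallback described above amounts to a one-line lookup. The sketch below is an illustration only: `public_host` and `public_url` are hypothetical names, the `scheme://host:port` URL shape is an assumption, and so is treating an empty variable the same as an unset one.

```python
import os
import socket


def public_host(environ=None):
    """Resolve the public host: FIREBOX_EXPOSE_HOST if set, else the hostname."""
    env = os.environ if environ is None else environ
    # Assumption: an empty value falls through to the hostname as well.
    return env.get("FIREBOX_EXPOSE_HOST") or socket.gethostname()


def public_url(scheme, host_port, environ=None):
    """Build a URL for an exposed port (URL shape is an assumption)."""
    return f"{scheme}://{public_host(environ)}:{host_port}"
```

Setting `FIREBOX_EXPOSE_HOST` matters when the machine's hostname is not resolvable from wherever the URL will be opened.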
sb.stream
Live noVNC stream URL — drop into a browser tab to watch the agent drive Chromium in real time.
info = sb.stream.start(
password=None, # auto-generated if omitted
view_only=False,
require_auth=True,
install_if_missing=True, # apt-install websockify+novnc if absent
)
# -> {url, password, host_port, vm_port}
sb.stream.get_url()
sb.stream.get_password()
sb.stream.stop() # tear down public mapping
Pipeline: firebox-display (Xvfb + x11vnc) → websockify (6080 ⇄ 5900,
embedded noVNC web client) → sb.ports.expose(6080). Requires the
browser-use template (or anything that ships
/usr/local/bin/firebox-display).
sb.os
OS-level input via xdotool inside the sandbox. Operates at the X
server level (DISPLAY=:99) — reaches native dialogs, file pickers,
browser chrome, anything outside the page DOM. This is what
Anthropic Computer Use and OpenAI CUA tool schemas target.
sb.os.click(x, y, button=1, click_count=1)
sb.os.double_click(x, y)
sb.os.right_click(x, y)
sb.os.middle_click(x, y)
sb.os.move_mouse(x, y)
sb.os.drag(x1, y1, x2, y2, button=1)
sb.os.scroll("down", amount=3) # up | down | left | right
sb.os.type(text, delay_ms=25)
sb.os.key("ctrl+c") # X keysym; modifiers OK
sb.os.key_down(key) / .key_up(key)
sb.os.screenshot(save_path=None) # full-display PNG, bytes
sb.os.state() # {display, width, height, cursor_x, cursor_y}
sb.os.active_window() # {window_id, name}
xdotool and scrot are apt-installed lazily on first use if missing
(both are baked into the browser-use template).
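Because these calls line up with computer-use tool schemas, a thin dispatcher can route a model's action onto them. The sketch below is hypothetical: `dispatch` is not part of the SDK, and the action names and the `coordinate` / `text` keys are assumed to follow the Anthropic computer-use tool format, which should be checked against the schema version in use.

```python
def dispatch(os_api, action):
    """Route one tool action dict onto an sb.os-like object.

    os_api : any object exposing the sb.os methods listed above
    action : dict such as {"action": "left_click", "coordinate": [x, y]}
             (key names are an assumption about the tool schema)
    """
    kind = action["action"]
    pointer_methods = {
        "left_click": "click",
        "right_click": "right_click",
        "middle_click": "middle_click",
        "double_click": "double_click",
        "mouse_move": "move_mouse",
    }
    if kind in pointer_methods:
        x, y = action["coordinate"]
        return getattr(os_api, pointer_methods[kind])(x, y)
    if kind == "type":
        return os_api.type(action["text"])
    if kind == "key":
        return os_api.key(action["text"])  # e.g. "ctrl+c"
    if kind == "screenshot":
        return os_api.screenshot()
    raise ValueError(f"unsupported action: {kind!r}")
```

Rejecting unknown actions with `ValueError` keeps a schema mismatch visible instead of silently dropping input events.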
sb.browser
Stealth Chromium driven from outside. See Browser concepts for the agent loop pattern; the SDK signatures are:
sb.browser.start(headless=False, stealth=True, viewport=None,
user_agent=None, locale=None, timezone_id=None,
profile=None, proxy=None)
# headless=False (default) -> visible via Xvfb (stream/VNC ready).
# headless=True -> full Chromium under --headless=new.
sb.browser.close()
sb.browser.restart(url=None, wait_until="domcontentloaded", timeout=30.0)
sb.browser.state() -> {"open": bool, "url": str, "title": str}
# Navigation
sb.browser.navigate(url, wait_until="domcontentloaded", timeout=30.0,
wait_for_load="networkidle", wait_timeout=5.0)
sb.browser.back() / .forward() / .reload()
# Manus-style perception (the headline primitive)
sb.browser.view(
max_markdown_chars=30000, screenshot=True, annotated=True,
full_page=False,
wait_for_load="networkidle", wait_timeout=5.0,
traverse_iframes=True,
since_view_token=None, delta_only=False,
extract_links=False, extract_images=False,
)
# -> {view_token, url, title, markdown, elements, screenshot_b64,
# viewport, total_chars, returned_chars, truncated,
# diff?: {added_count, removed_count, stable_count,
# added_elements, removed_elements,
# markdown_unchanged, url_changed}}
# Index-based interaction (idx from view().elements)
sb.browser.click_idx(idx, button="left", click_count=1, humanlike=True)
sb.browser.input_idx(idx, text, clear=True, humanlike=True, delay_ms=None)
sb.browser.select_option_idx(idx, value)
sb.browser.move_mouse(x=None, y=None, idx=None, humanlike=True)
sb.browser.scroll_by(direction="down", amount=0.8) # up | down | top | bottom
sb.browser.scroll_in_element(idx, direction="into_view", amount=0.5)
# Selector-based interaction
sb.browser.click(selector, timeout=10.0)
sb.browser.click_at(x, y, button="left", click_count=1, humanlike=True)
sb.browser.fill(selector, text, timeout=10.0)
sb.browser.press(key) # single key
sb.browser.send_keys(keys, delay_ms=25) # chord: 'Control+O'
sb.browser.type(text, delay_ms=None, humanlike=True)
sb.browser.scroll(x=0, y=0)
sb.browser.wait_for(selector, state="visible", timeout=10.0)
# Reading
sb.browser.text(selector=None, timeout=10.0) -> str
sb.browser.text_all(selector, timeout=10.0) -> list[str]
sb.browser.attr(selector, name, timeout=10.0) -> str | None
sb.browser.html(selector=None, timeout=10.0) -> str
sb.browser.evaluate(script, arg=None) -> Any
sb.browser.clickables() -> list[dict]
# Extraction primitives (browser-use parity)
sb.browser.extract_markdown(max_chars=0, start_from_char=0,
extract_links=False, extract_images=False,
traverse_iframes=True)
sb.browser.find_elements(selector, attributes=None, max_results=50,
include_text=True, include_html=False)
sb.browser.search_page(pattern, regex=False, case_sensitive=False,
context_chars=150, max_results=25, css_scope=None)
sb.browser.dropdown_options(idx)
# Screenshots
sb.browser.screenshot(selector=None, full_page=False, save_path=None) -> bytes
sb.browser.screenshot_annotated(full_page=False, save_path=None) -> bytes
# Save / upload
sb.browser.save_pdf(path="/tmp/firebox-page.pdf", paper_format="A4",
landscape=False, print_background=True, scale=1.0)
sb.browser.upload_file(idx, paths) # paths inside the sandbox
# Tabs (4-char ids, browser-use convention)
sb.browser.tabs_list() # {tabs: [{id, url, title, active}], active}
sb.browser.tabs_new(url=None, wait_until="domcontentloaded", timeout=30.0)
sb.browser.tabs_switch(id)
sb.browser.tabs_close(id)
# Console + network capture
sb.browser.console_view(limit=100, clear=False)
sb.browser.console_clear()
sb.browser.network_log(limit=100, clear=False, url_contains=None,
method=None, phase=None, min_status=None)
sb.browser.network_clear()
# Persistence
sb.browser.cookies(urls=None) -> list[dict]
sb.browser.set_cookies(cookies)
sb.browser.save_profile(name) -> str (path)
sb.browser.list_profiles() -> list[str]
sb.browser.delete_profile(name)
# Captcha helpers
sb.browser.detect_captcha() -> dict | None
sb.browser.solve_captcha_on_page(api_key=...) -> dict
sb.browser.inject_captcha_token(captcha_type, token)
sb.http
Raw HTTP with a real Chrome 120 TLS / JA3 / H2 fingerprint via curl_cffi. Use this for API calls and scraping where Cloudflare's TLS-layer detection rejects the Chromium browser.
r = sb.http.get(url, params=None, headers=None, timeout=30.0,
impersonate="chrome120", proxies=None)
r = sb.http.post(url, json={...}, ...)
r = sb.http.request("PUT", url, body=b"...", ...)
r.ok -> bool
r.status -> int
r.url -> str
r.headers -> dict
r.text -> str
r.content -> bytes
r.json() -> Any
sb.search
Aggregated metasearch via SearxNG. See Search.
sb.search.web(q, **kw) -> list[SearchResult]
sb.search.news(q, **kw)
sb.search.papers(q, **kw)
sb.search.code(q, **kw)
sb.search.images(q, **kw)
sb.search.videos(q, **kw)
sb.search.wiki(q, **kw)
sb.search.maps(q, **kw)
sb.search.query(q, categories=, engines=, language=, time_range=,
pageno=, safesearch=, base_url=, cache=True, timeout=20)
# Live streaming — yields SearchResult per engine as backends respond
for r in sb.search.stream(q, engines=["google","duckduckgo","brave","qwant"]):
    print(r.engine, r.title, r.url)
sb.search.stream queries each engine independently and yields its
results as soon as they arrive, deduped on URL. Use it when you want
to render results in a UI as they trickle in instead of waiting for
the full SearxNG aggregation.
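The "yield as each engine responds, deduped on URL" pattern can be sketched with a thread pool. This is an illustration of the general technique, not the SDK's implementation: `stream_results` is a hypothetical function, and each engine is modelled as a plain callable that returns a list of dicts with a `url` key.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def stream_results(query, engines):
    """Yield results engine by engine as each one finishes, skipping repeat URLs.

    engines : mapping of engine name -> callable(query) returning a list of
              dicts that each contain a "url" key (hypothetical shape)
    """
    seen = set()
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {pool.submit(fn, query): name for name, fn in engines.items()}
        for fut in as_completed(futures):
            engine = futures[fut]
            try:
                results = fut.result()
            except Exception:
                continue  # one failing backend should not stop the stream
            for item in results:
                if item["url"] in seen:
                    continue  # already yielded from a faster engine
                seen.add(item["url"])
                yield {"engine": engine, **item}
```

With this approach the first engine to report a URL gets the credit, so result order depends on backend latency, not on a fixed ranking.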
sb.audio
Local Whisper inside the sandbox. It powers the free reCAPTCHA audio solver but is useful on its own:
result = sb.audio.transcribe(
audio: bytes, format="mp3", model="tiny.en",
language=None, beam_size=5, vad_filter=False,
)
result.text # full text
result.language # "en", probability close to 1
result.duration # seconds
result.segments # [{start, end, text}, ...]
sb.captcha
Four solver paths — pick by cost / target site.
# Paid solver service (works for everything)
sb.captcha.solve_recaptcha_v2(sitekey, page_url, api_key=None, provider="2captcha", timeout=180)
sb.captcha.solve_hcaptcha(...)
sb.captcha.solve_turnstile(...)
# Free local Whisper — audio mode of reCAPTCHA v2 / hCaptcha
sb.captcha.solve_recaptcha_audio(retries=3, model="tiny.en") -> dict
sb.captcha.solve_hcaptcha_audio(retries=3, model="tiny.en") -> dict
# Free image-grid primitives — *your* vision LLM picks cells
sb.captcha.recaptcha_open_image_challenge(timeout=30.0)
# → {instructions, target, grid_size, cells, screenshot_b64}
sb.captcha.recaptcha_click_cells(indices, timeout=10.0)
# → {clicked, indices}
sb.captcha.recaptcha_verify_image(timeout=15.0)
# → {verified, more_to_click, ...} or a fresh challenge
# Free human handoff via VNC
sb.captcha.handoff_to_vnc(password="firebox",
poll_until_solved=False, poll_timeout=600.0)
# → {vnc_url, sandbox_ip, vnc_password, solved?, waited?}
The image-grid primitives deliberately don't run a model in firebox. The caller's own LLM (Claude / GPT-4o / Qwen-VL — whatever's already driving the agent) does the visual reasoning, which keeps firebox provider-agnostic.
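One way to wire the three image-grid primitives to a caller-supplied vision model is a simple open, click, verify loop. The sketch below is hedged throughout: `solve_image_grid` and the `pick_cells` callback are hypothetical, and how a non-verified result is interpreted (a dict with `cells` is a fresh challenge, anything else triggers a re-open) is an assumption about the return shapes listed above.

```python
def solve_image_grid(captcha, pick_cells, max_rounds=5):
    """Run the open -> click -> verify loop; return True once verified.

    captcha    : object exposing the three recaptcha_* image primitives
    pick_cells : hypothetical callback(challenge_dict) -> list of cell indices,
                 typically a call into the caller's own vision LLM
    """
    challenge = captcha.recaptcha_open_image_challenge()
    for _ in range(max_rounds):
        indices = pick_cells(challenge)
        captcha.recaptcha_click_cells(indices)
        result = captcha.recaptcha_verify_image()
        if result.get("verified"):
            return True
        if "cells" in result:
            # Assumption: a dict carrying "cells" is the fresh challenge.
            challenge = result
        else:
            # Assumption: otherwise re-open to get an updated screenshot.
            challenge = captcha.recaptcha_open_image_challenge()
    return False
```

Capping the number of rounds keeps a confused model from clicking indefinitely; on `False` the caller can fall back to another solver path.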
Error model
All SDK methods raise RuntimeError, with the daemon's {"error": "..."}
body inlined, when the daemon returns 4xx / 5xx. Timeouts raise
TimeoutError.
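On the caller's side, the two exception types suggest different handling: a timeout is often worth retrying, while a daemon error usually is not. The helper below is a minimal sketch of that policy. `call_with_retry` is a hypothetical name, and the retry policy itself is a design choice, not SDK behaviour.

```python
import time


def call_with_retry(call, attempts=3, delay=0.0):
    """Retry call() on TimeoutError; let RuntimeError propagate immediately.

    call     : zero-argument callable, e.g. lambda: sb.run("make test")
    attempts : total number of tries, must be at least 1
    delay    : seconds to sleep between tries
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except TimeoutError:
            if attempt == attempts:
                raise  # out of retries: surface the last timeout
            time.sleep(delay)
```

A RuntimeError carries the daemon's error text in its message, so logging `str(exc)` is usually enough to see what the daemon rejected.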