Python SDK¶
Stdlib-only client: pip install pulls in no third-party dependencies;
urllib, json, and base64 do the work.
Sandbox is the only entry point you'll actually instantiate. Every
namespace (files, process, browser, http, search, audio,
captcha) hangs off an instance.
Sandbox¶
sb = Sandbox.create(
template: str | None = None,
ttl_seconds: float = 300.0,
vcpu: int = 2,
mem_mib: int = 512,
# Optional auto-upload of a local directory at boot:
workspace: str | None = None, # local dir path
workspace_remote: str = "/work", # remote target
workspace_exclude: list[str] | None = None, # fnmatch patterns
)
sb = Sandbox.attach(sandbox_id: str)
sb.id # short id used everywhere
sb.ip # 10.42.0.X
sb.template # template name or None
sb.expires_at # epoch seconds
sb.run(cmd, timeout=60, cwd=None, env=None, shell=None) -> RunResult
sb.stream(cmd, ...) -> Iterator[StreamChunk]
sb.close()
# Context manager (recommended)
with Sandbox.create(...) as sb:
...
RunResult is (stdout, stderr, exit_code, duration, timeout).
StreamChunk.stream is "stdout" / "stderr" / "final".
workspace= tar-gzips the local directory in one HTTP roundtrip and
extracts it inside the sandbox before create() returns — handy for
shipping a project tree to the VM in one shot. See sb.files
for the explicit upload_dir / download_dir and live watch.
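The bundling step can be sketched with the stdlib alone. This is an illustration of the tar+gzip-with-excludes idea, not firebox's actual wire format — the `bundle_dir` helper and its exclude semantics are assumptions:

```python
import fnmatch
import io
import tarfile


def bundle_dir(local_dir, exclude=()):
    """Tar+gzip a directory in memory, skipping fnmatch-style excludes.

    A stdlib sketch of what a workspace upload could look like; the
    real SDK's framing may differ.
    """
    def keep(tarinfo):
        # tarinfo.name is the archive-relative path; drop any entry
        # matching an exclude pattern.
        if any(fnmatch.fnmatch(tarinfo.name, pat) for pat in exclude):
            return None
        return tarinfo

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        tar.add(local_dir, arcname=".", filter=keep)
    return buf.getvalue()
```

One archive means one POST, which is where the single-roundtrip behaviour of `workspace=` and `upload_dir` comes from.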
sb.files¶
sb.files.read(path) -> bytes
sb.files.read_text(path) -> str
sb.files.write(path, content, mode=None) -> bytes_written
sb.files.list(path="/") -> list[FileEntry] # name, type, size, mtime
sb.files.upload(local, remote, mode=None)
sb.files.download(remote, local)
# Bulk transfer — tar+gzip in one HTTP roundtrip
sb.files.upload_dir(local_dir, remote_dir, exclude=["*.pyc","__pycache__/*"])
sb.files.download_dir(remote_dir, local_dir)
# Live filesystem events — yields {path, event, file} per change
for evt in sb.files.watch("/work", recursive=True,
events="modify,create,delete,move",
timeout=600):
print(evt["event"], evt["file"]) # MODIFY a.txt, CREATE b.py, ...
Bytes are base64-framed — binary files survive the round-trip
unchanged. upload_dir is dramatically faster than looping upload
for many files (one POST instead of N) and accepts fnmatch exclude
patterns. watch runs inotifywait -m inside the sandbox and streams
events back as they occur — perfect for live "agent edited file X"
displays. Requires inotify-tools in the template (baked into
browser-use).
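The framing itself is plain stdlib base64 — a minimal sketch of why arbitrary binary survives a JSON transport (the envelope shape here is illustrative, not firebox's actual schema):

```python
import base64
import json


def frame(payload: bytes) -> str:
    """Wrap raw bytes in a JSON-safe envelope (ASCII base64)."""
    return json.dumps({"data": base64.b64encode(payload).decode("ascii")})


def unframe(message: str) -> bytes:
    """Recover the original bytes from the envelope."""
    return base64.b64decode(json.loads(message)["data"])


# Every possible byte value round-trips unchanged.
blob = bytes(range(256))
assert unframe(frame(blob)) == blob
```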
sb.process¶
Background processes that outlive sb.run calls. The classic case:
start a server, then probe it from the next call.
proc = sb.process.start("python3 -m http.server 8000")
sb.run("curl http://127.0.0.1:8000/") # talks to the bg server
proc.logs() -> ProcessLogs(stdout, stderr, running, exit_code)
proc.kill() -> exit_code
proc.wait(timeout=60) -> exit_code or None on timeout
sb.process.list() -> list[Process]
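The start / wait / logs lifecycle maps closely onto stdlib subprocess, which is roughly what runs inside the sandbox. A local sketch of the same semantics, no firebox involved:

```python
import subprocess
import sys

# Start a background child, capturing both streams -- the local
# analogue of sb.process.start(...).
proc = subprocess.Popen(
    [sys.executable, "-c", "print('ready')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
)

# Mirror proc.wait(timeout=...): an exit code, or None if the
# process is still running when the timeout expires.
try:
    stdout, stderr = proc.communicate(timeout=60)
    code = proc.returncode
except subprocess.TimeoutExpired:
    code = None
```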
sb.browser¶
Stealth Chromium driven from outside. See Browser concepts for the full surface; the SDK signatures are:
sb.browser.start(headless=True, stealth=True, viewport=None,
user_agent=None, locale=None, timezone_id=None,
profile=None, proxy=None)
sb.browser.close()
sb.browser.state() -> {"open": bool, "url": str, "title": str}
# Navigation
sb.browser.navigate(url, wait_until="domcontentloaded", timeout=30.0)
sb.browser.back() / .forward() / .reload()
# Interaction
sb.browser.click(selector, timeout=10.0)
sb.browser.click_at(x, y, button="left", click_count=1, humanlike=True)
sb.browser.fill(selector, text, timeout=10.0)
sb.browser.press(key)
sb.browser.type(text, delay_ms=None, humanlike=True)
sb.browser.scroll(x=0, y=0)
sb.browser.wait_for(selector, state="visible", timeout=10.0)
# Reading
sb.browser.text(selector=None, timeout=10.0) -> str
sb.browser.text_all(selector, timeout=10.0) -> list[str]
sb.browser.attr(selector, name, timeout=10.0) -> str | None
sb.browser.html(selector=None, timeout=10.0) -> str
sb.browser.evaluate(script, arg=None) -> Any
sb.browser.clickables() -> list[dict]
# Screenshots
sb.browser.screenshot(selector=None, full_page=False, save_path=None) -> bytes
sb.browser.screenshot_annotated(full_page=False, save_path=None) -> bytes
# Persistence
sb.browser.cookies(urls=None) -> list[dict]
sb.browser.set_cookies(cookies)
sb.browser.save_profile(name) -> str (path)
sb.browser.list_profiles() -> list[str]
sb.browser.delete_profile(name)
# Captcha helpers
sb.browser.detect_captcha() -> dict | None
sb.browser.solve_captcha_on_page(api_key=...) -> dict
sb.browser.inject_captcha_token(captcha_type, token)
sb.http¶
Raw HTTP with real Chrome 120 TLS / JA3 / H2 fingerprint via curl_cffi. Use this for API calls / scraping where Cloudflare's TLS-layer detection rejects the chromium browser.
r = sb.http.get(url, params=None, headers=None, timeout=30.0,
impersonate="chrome120", proxies=None)
r = sb.http.post(url, json={...}, ...)
r = sb.http.request("PUT", url, body=b"...", ...)
r.ok -> bool
r.status -> int
r.url -> str
r.headers -> dict
r.text -> str
r.content -> bytes
r.json() -> Any
sb.search¶
Aggregated metasearch via SearxNG. See Search.
sb.search.web(q, **kw) -> list[SearchResult]
sb.search.news(q, **kw)
sb.search.papers(q, **kw)
sb.search.code(q, **kw)
sb.search.images(q, **kw)
sb.search.videos(q, **kw)
sb.search.wiki(q, **kw)
sb.search.maps(q, **kw)
sb.search.query(q, categories=, engines=, language=, time_range=,
pageno=, safesearch=, base_url=, cache=True, timeout=20)
# Live streaming — yields SearchResult per engine as backends respond
for r in sb.search.stream(q, engines=["google","duckduckgo","brave","qwant"]):
print(r.engine, r.title, r.url)
sb.search.stream queries each engine independently and yields its
results as soon as they arrive, deduped on URL. Use it when you want
to render results in a UI as they trickle in instead of waiting for
the full SearxNG aggregation.
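The yield-as-they-arrive-with-dedupe behaviour can be sketched generically. Round-robin interleaving stands in for live backend responses, and the result dicts are illustrative:

```python
import itertools


def stream_dedup(*engine_results):
    """Yield results in arrival order, dropping repeat URLs.

    Engines are interleaved round-robin here as a stand-in for
    "whichever backend answers first" in the real streamer.
    """
    seen = set()
    for r in itertools.chain.from_iterable(
        itertools.zip_longest(*engine_results)
    ):
        if r is None or r["url"] in seen:
            continue
        seen.add(r["url"])
        yield r


google = [{"url": "https://a", "title": "A"}, {"url": "https://b", "title": "B"}]
brave = [{"url": "https://b", "title": "B dup"}, {"url": "https://c", "title": "C"}]
urls = [r["url"] for r in stream_dedup(google, brave)]
# urls == ["https://a", "https://b", "https://c"] -- "https://b" kept once
```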
sb.audio¶
Local Whisper inside the sandbox. Powers the free reCAPTCHA audio solver but useful on its own:
result = sb.audio.transcribe(
audio: bytes, format="mp3", model="tiny.en",
language=None, beam_size=5, vad_filter=False,
)
result.text # full text
result.language # detected language code, e.g. "en" (detection probability is usually near 1)
result.duration # seconds
result.segments # [{start, end, text}, ...]
sb.captcha¶
Four solver paths — pick by cost / target site.
# Paid solver service (works for everything)
sb.captcha.solve_recaptcha_v2(sitekey, page_url, api_key=None, provider="2captcha", timeout=180)
sb.captcha.solve_hcaptcha(...)
sb.captcha.solve_turnstile(...)
# Free local Whisper — audio mode of reCAPTCHA v2 / hCaptcha
sb.captcha.solve_recaptcha_audio(retries=3, model="tiny.en") -> dict
sb.captcha.solve_hcaptcha_audio(retries=3, model="tiny.en") -> dict
# Free image-grid primitives — *your* vision LLM picks cells
sb.captcha.recaptcha_open_image_challenge(timeout=30.0)
# → {instructions, target, grid_size, cells, screenshot_b64}
sb.captcha.recaptcha_click_cells(indices, timeout=10.0)
# → {clicked, indices}
sb.captcha.recaptcha_verify_image(timeout=15.0)
# → {verified, more_to_click, ...} or a fresh challenge
# Free human handoff via VNC
sb.captcha.handoff_to_vnc(password="firebox",
poll_until_solved=False, poll_timeout=600.0)
# → {vnc_url, sandbox_ip, vnc_password, solved?, waited?}
The image-grid primitives deliberately don't run a model in firebox. The caller's own LLM (Claude / GPT-4o / Qwen-VL — whatever's already driving the agent) does the visual reasoning, which keeps firebox provider-agnostic.
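When wiring a vision model to these primitives, the only arithmetic is mapping the model's (row, col) picks onto the flat cell indices that recaptcha_click_cells takes. A sketch assuming a row-major, zero-based grid — verify against the actual challenge payload before relying on it:

```python
def cells_to_indices(picks, grid_size):
    """Convert (row, col) picks into flat, row-major cell indices.

    Assumes grid_size is (rows, cols) and cells are numbered
    left-to-right, top-to-bottom starting at 0.
    """
    rows, cols = grid_size
    indices = []
    for row, col in picks:
        if not (0 <= row < rows and 0 <= col < cols):
            raise ValueError(f"pick {(row, col)} outside {rows}x{cols} grid")
        indices.append(row * cols + col)
    return indices


# On a 3x3 grid: top-middle and bottom-right.
# cells_to_indices([(0, 1), (2, 2)], (3, 3)) -> [1, 8]
```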
Error model¶
All SDK methods raise RuntimeError with the daemon's {"error": "..."}
body inlined when the daemon returns a 4xx / 5xx. Timeouts raise
TimeoutError.
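A call-site pattern that follows from this split (the retry helper is illustrative, not part of the SDK): a RuntimeError carries the daemon's error body and usually means the request itself is wrong, so retrying won't help, while a TimeoutError is transient and worth another attempt.

```python
def run_with_retry(call, attempts=3):
    """Retry a zero-argument callable on timeout.

    TimeoutError is retried up to `attempts` times; RuntimeError
    (a daemon-side 4xx / 5xx) propagates immediately.
    """
    for attempt in range(attempts):
        try:
            return call()
        except TimeoutError:
            if attempt == attempts - 1:
                raise
```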