Browser

sb.browser is a Playwright-driven Chromium running inside the sandbox. It's stealthed by default (passes bot.sannysoft.com fingerprint checks), threads its calls through a per-VM worker, and keeps state across calls — same tab, same cookies — until you close().

Drive it from outside

with Sandbox.create(template="browser-use") as sb:
    b = sb.browser
    b.start()                                  # launch Chromium
    b.navigate("https://example.com")
    print(b.text("h1"))                        # → "Example Domain"

The browser stays up across calls. There's no LLM inside the sandbox — the caller is the agent.

What's exposed

flowchart TB
    subgraph SDK["sb.browser API"]
        N1[start / close / state]
        N2[navigate / back / forward / reload]
        N3[click / click_at / fill / press / type / scroll]
        N4[wait_for]
        N5[text / text_all / attr / html]
        N6[screenshot / screenshot_annotated / clickables]
        N7[evaluate]
        N8[cookies / set_cookies / save_profile / list_profiles]
        N9[detect_captcha / inject_captcha_token]
    end

There are three ways to act on a UI element. Which tradeoff to take is your call:

By CSS selector. Deterministic, fast.

b.click("button.login")
b.fill("input[name='email']", "alice@example.com")
b.text("h1")

By coordinates. For sites where selectors are unstable. Pair clickables() with screenshot_annotated() for an LLM-friendly numbered overlay.

items = b.clickables()                # list with idx, x, y, text, ...
img   = b.screenshot_annotated()      # PNG with yellow numbered boxes
# ... your LLM looks at img + items, picks idx 5 ...
target = items[5]
b.click_at(target["x"], target["y"])
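The snippet above picks the target by list position. If the overlay number corresponds to the idx field rather than the position in the list (an assumption; the source only says each item carries idx, x, y, text), looking the entry up by idx is safer. A minimal sketch, where pick_clickable and click_by_index are hypothetical helper names and not part of the SDK:

```python
from typing import Any, Mapping, Sequence


def pick_clickable(items: Sequence[Mapping[str, Any]], idx: int) -> Mapping[str, Any]:
    """Return the entry whose "idx" field equals idx, or raise LookupError."""
    for item in items:
        if item.get("idx") == idx:
            return item
    raise LookupError(f"no clickable with idx={idx}")


def click_by_index(browser: Any, idx: int) -> Mapping[str, Any]:
    """Fetch a fresh clickables() list and click the coordinates of entry idx."""
    target = pick_clickable(browser.clickables(), idx)
    browser.click_at(target["x"], target["y"])
    return target
```

With a live browser this would be called as click_by_index(sb.browser, 5) once the LLM has chosen number 5. Re-fetching the list right before the click keeps the coordinates in step with the current page.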

By JavaScript. For anything Playwright can't express cleanly:

titles = b.evaluate("""
    () => [...document.querySelectorAll(".titleline > a")]
              .map(a => a.innerText).slice(0, 5)
""")

Stealth profile

sb.browser.start() defaults to stealth=True. That turns on:

| Layer | Patch |
| --- | --- |
| Launch flags | --disable-blink-features=AutomationControlled, drop --enable-automation |
| Backend | Patchright (drop-in replacement for playwright that fixes Runtime.enable + console.debug leaks at the chromium binary level) |
| navigator.webdriver | scrubbed from prototype + own (so 'webdriver' in navigator is false) |
| window.chrome | populated |
| navigator.plugins | five-entry PluginArray (real plugins, real prototype) |
| navigator.languages | ['en-US', 'en'] |
| navigator.permissions.query | consistent with native Chrome |
| navigator.hardwareConcurrency | 8 |
| navigator.deviceMemory | 8 |
| navigator.maxTouchPoints | 0 |
| Notification.permission | 'default' (not 'denied') |
| window.outerHeight/Width | offset from inner |
| navigator.userAgentData | Chrome 120 brands + Linux platform |
| Canvas | sub-pixel noise on getImageData / toDataURL |
| AudioContext | float buffer noise on getChannelData |
| WebRTC | host-candidate IPs stripped (no real-IP leak) |
| WebGL vendor / renderer | reported as Intel Inc. / Iris OpenGL Engine |
| User-Agent | Linux Chrome 120 (no HeadlessChrome string) |
| Locale | en-US |
| Timezone | Europe/Ljubljana |

To get a vanilla automation profile (debugging, captcha tests):

b.start(stealth=False)
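To spot-check a few of the listed patches from the caller side, one option is to read the values back with evaluate() and compare them against the documented ones. This is a rough sketch: it assumes evaluate() returns the JavaScript object as a Python dict, and EXPECTED, PROBE_JS, fingerprint_mismatches, and check_stealth are hypothetical names introduced here, not SDK features.

```python
from typing import Any, Dict, List

# Expected values copied from the stealth profile description above.
EXPECTED: Dict[str, Any] = {
    "webdriver": False,
    "languages": ["en-US", "en"],
    "hardwareConcurrency": 8,
    "deviceMemory": 8,
    "maxTouchPoints": 0,
}

# Reads only standard navigator properties inside the page.
PROBE_JS = """
    () => ({
        webdriver: 'webdriver' in navigator,
        languages: [...navigator.languages],
        hardwareConcurrency: navigator.hardwareConcurrency,
        deviceMemory: navigator.deviceMemory,
        maxTouchPoints: navigator.maxTouchPoints,
    })
"""


def fingerprint_mismatches(observed: Dict[str, Any]) -> List[str]:
    """List every key whose observed value differs from EXPECTED."""
    return [
        f"{key}: expected {want!r}, got {observed.get(key)!r}"
        for key, want in EXPECTED.items()
        if observed.get(key) != want
    ]


def check_stealth(browser: Any) -> List[str]:
    """Run the probe in the page (assumes evaluate() returns a dict)."""
    return fingerprint_mismatches(browser.evaluate(PROBE_JS))
```

An empty list means the probed values match. With stealth=False, at least the webdriver entry would be expected to show up as a mismatch.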

Captcha solving

Two paths. Use whichever is cheaper:

Audio challenge. For reCAPTCHA v2 on sites that don't pre-block your IP. Cost: $0, accuracy ~70-80%, ~5-10 s.

if b.detect_captcha():
    out = sb.captcha.solve_recaptcha_audio(retries=3)
    # → {"verified": True, "attempts": 1, "text": "..."}

Solver service (needs a 2captcha key). Works for reCAPTCHA v2 and v3, hCaptcha, and Turnstile. Cost: $0.001-0.003 per solve, ~30-60 s.

info = b.detect_captcha()
if info:
    b.solve_captcha_on_page(api_key="<2captcha-key>")
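The two paths can be chained so the free one is tried first and the paid one only runs when it fails. The sketch below is a hypothetical wrapper (solve_captcha is not an SDK function). It assumes the audio solver returns a dict with verified set to False instead of raising when it cannot solve the challenge, and it does not inspect what solve_captcha_on_page returns.

```python
from typing import Any


def solve_captcha(sb: Any, api_key: str, audio_retries: int = 3) -> str:
    """Return which path ran: "none", "audio", or "paid"."""
    browser = sb.browser
    if not browser.detect_captcha():
        return "none"  # nothing to solve on this page

    # Free path first: reCAPTCHA v2 audio challenge.
    out = sb.captcha.solve_recaptcha_audio(retries=audio_retries)
    if out.get("verified"):
        return "audio"

    # Fall back to the paid solver service.
    browser.solve_captcha_on_page(api_key=api_key)
    return "paid"
```

For captcha types the audio solver does not cover, the first attempt only costs time, so a caller who already knows the type may prefer to call the paid path directly.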

Profile persistence

Cookies + localStorage + sessionStorage survive across sandbox lifetimes when you save them as a named profile:

# First sandbox: log in once.
with Sandbox.create(template="browser-use") as sb:
    sb.browser.start()
    sb.browser.navigate("https://app.example.com/login")
    sb.browser.fill("#email", "alice@example.com")
    sb.browser.fill("#password", "...")
    sb.browser.click("button[type=submit]")
    sb.browser.save_profile("alice-app")

# Later sandbox: restore — already logged in.
with Sandbox.create(template="browser-use") as sb:
    sb.browser.start(profile="alice-app")
    sb.browser.navigate("https://app.example.com/dashboard")
    print(sb.browser.text("h1"))   # whatever's gated behind login

Profiles live in /var/firebox-profiles/<name>.json inside the template's rootfs. They don't leak across templates — alice-app saved from a browser-use template only loads when you start from that same template.
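The save-then-restore pattern can be folded into one helper that logs in only when the profile does not exist yet. This is a sketch under assumptions: start_logged_in is a hypothetical name, and list_profiles() is assumed to be callable before start() and to return an iterable of profile names, which the source does not confirm.

```python
from typing import Any, Callable


def start_logged_in(browser: Any, profile: str, login: Callable[[Any], None]) -> bool:
    """Start from `profile` if it exists; otherwise log in and save it.

    Returns True when an existing profile was restored.
    """
    # Assumption: list_profiles() works before start() and yields names.
    if profile in browser.list_profiles():
        browser.start(profile=profile)
        return True

    browser.start()
    login(browser)  # caller-supplied steps: navigate, fill, click
    browser.save_profile(profile)
    return False
```

The login callback would hold the navigate, fill, and click calls from the first sandbox above, so the same entry point works for both the first run and later ones.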

Live preview via VNC

The browser-use template ships with Xvfb + x11vnc + fluxbox. Useful when an LLM-driven agent's flow is failing and you want to watch.

# In the sandbox:
sb.run("firebox-display firebox")        # password=firebox

# On the host, DNAT a public port to the sandbox's :5900, then:
open vnc://:firebox@your-host:33000

The run_agent_vnc.py example wires this up end-to-end.

What it can't do

  • GPU rendering: Chromium runs on CPU. If you must, bake in an --enable-gpu flag and live with the software fallback.
  • Real Chrome TLS: Chromium's TLS hello differs from Chrome's, and Cloudflare Bot Manager and Akamai fingerprint at the TLS layer. Workaround: use sb.http (curl_cffi with a real Chrome 120 JA3) for raw HTTP calls behind the same cookies.
  • Mobile emulation: viewport-based only, with no touch event fidelity.
  • Pause / resume of the whole VM mid-test: there is no Firecracker snapshot integration yet.