Browser

sb.browser is a Playwright-driven Chromium running inside the sandbox. It's stealthed by default (passes bot.sannysoft.com fingerprint checks), threads its calls through a per-VM worker, and keeps state across calls — same tab, same cookies — until you close().

Drive it from outside

with Sandbox.create(template="browser-use") as sb:
    b = sb.browser
    b.start()                                  # launch Chromium
    b.navigate("https://example.com")
    print(b.text("h1"))                        # → "Example Domain"

The browser stays up across calls. There's no LLM inside the sandbox — the caller is the agent.
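
Because state persists until close(), a caller may want to guarantee the close even when a step raises. The following is a minimal sketch that assumes only the start / navigate / text / close calls shown above; the helper name `read_heading` is hypothetical and not part of the SDK.

```python
def read_heading(browser, url, selector="h1"):
    """Open url, return the text of selector, and always close the browser.

    `browser` is any object exposing start/navigate/text/close,
    for example sb.browser (assumed, based on the calls shown above).
    """
    browser.start()
    try:
        browser.navigate(url)
        return browser.text(selector)
    finally:
        # The tab and cookies live until close(), so release them explicitly.
        browser.close()
```

Inside the `with` block above, this would be called as `read_heading(sb.browser, "https://example.com")`.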

What's exposed

flowchart TB
    subgraph SDK["sb.browser API"]
        N1[start / close / state]
        N2[navigate / back / forward / reload]
        N3[click / click_at / fill / press / type / scroll]
        N4[wait_for]
        N5[text / text_all / attr / html]
        N6[screenshot / screenshot_annotated / clickables]
        N7[evaluate]
        N8[cookies / set_cookies / save_profile / list_profiles]
        N9[detect_captcha / inject_captcha_token]
    end

There are three ways to act on a UI element, and the tradeoff is your call: CSS selector, visual coordinates, or the JavaScript escape hatch.

CSS selector: deterministic and fast.

b.click("button.login")
b.fill("input[name='email']", "alice@example.com")
b.text("h1")

Visual coordinates: for sites where selectors are unstable. Pair with screenshot_annotated() for an LLM-friendly numbered overlay.

items = b.clickables()                # list with idx, x, y, text, ...
img   = b.screenshot_annotated()      # PNG with yellow numbered boxes
# ... your LLM looks at img + items, picks idx 5 ...
target = items[5]
b.click_at(target["x"], target["y"])
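
When no LLM is in the loop, the same list can be searched by visible text. This is a rough sketch, assuming each entry returned by clickables() is a dict carrying the `idx`, `x`, `y`, and `text` fields named in the comment above; `find_clickable` is a hypothetical helper and the sample values are made up.

```python
def find_clickable(items, needle):
    """Return the first clickable whose text contains needle (case-insensitive), or None."""
    needle = needle.lower()
    for item in items:
        # Some entries may have no text (icons, images); treat that as empty.
        if needle in (item.get("text") or "").lower():
            return item
    return None


# Sample data shaped like the fields named above; the values are illustrative only.
items = [
    {"idx": 0, "x": 40, "y": 20, "text": "Home"},
    {"idx": 1, "x": 310, "y": 20, "text": "Log in"},
]
target = find_clickable(items, "log in")
print(target["x"], target["y"])  # → 310 20
```

With a live session, the result would feed straight into `b.click_at(target["x"], target["y"])`, after checking that `target` is not None.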

JavaScript escape hatch: for anything Playwright can't express cleanly.

titles = b.evaluate("""
    () => [...document.querySelectorAll(".titleline > a")]
              .map(a => a.innerText).slice(0, 5)
""")

Stealth profile

sb.browser.start() defaults to stealth=True. That turns on:

Layer	Patch
Launch flags	`--disable-blink-features=AutomationControlled`, drop `--enable-automation`
Backend	Patchright (drop-in replacement for Playwright that fixes the Runtime.enable and console.debug leaks at the Chromium binary level)
`navigator.webdriver`	removed from both the prototype and the instance's own properties (so `'webdriver' in navigator` is `false`)
`window.chrome`	populated
`navigator.plugins`	five-entry `PluginArray` (real plugins, real prototype)
`navigator.languages`	`['en-US', 'en']`
`navigator.permissions.query`	consistent with native Chrome
`navigator.hardwareConcurrency`	8
`navigator.deviceMemory`	8
`navigator.maxTouchPoints`	0
`Notification.permission`	`'default'` (not `'denied'`)
`window.outerHeight/Width`	offset from `innerHeight` / `innerWidth`
`navigator.userAgentData`	Chrome 120 brands + Linux platform
Canvas	sub-pixel noise on `getImageData` / `toDataURL`
AudioContext	float buffer noise on `getChannelData`
WebRTC	host-candidate IPs stripped (no real-IP leak)
WebGL	vendor / renderer reported as Intel Inc. / Iris OpenGL Engine
User-Agent	Linux Chrome 120 (no `HeadlessChrome` string)
Locale	`en-US`
Timezone	`Europe/Ljubljana`

To get a vanilla automation profile (debugging, captcha tests):

b.start(stealth=False)
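
To compare the two profiles, a handful of the values in the table can be read back through evaluate() and checked against what the table promises. This is a sketch under two assumptions: that evaluate() returns a JavaScript object as a Python dict, and that the expected values are exactly those listed above. `PROBE_JS`, `EXPECTED`, and `stealth_mismatches` are hypothetical names, not SDK members.

```python
# JavaScript probe to pass to b.evaluate(); reads a few properties from the table.
PROBE_JS = """
() => ({
    webdriver: 'webdriver' in navigator,
    languages: [...navigator.languages],
    hardwareConcurrency: navigator.hardwareConcurrency,
    deviceMemory: navigator.deviceMemory,
    maxTouchPoints: navigator.maxTouchPoints,
    userAgent: navigator.userAgent,
})
"""

# Expected values, copied from the stealth table above.
EXPECTED = {
    "webdriver": False,
    "languages": ["en-US", "en"],
    "hardwareConcurrency": 8,
    "deviceMemory": 8,
    "maxTouchPoints": 0,
}


def stealth_mismatches(probe):
    """Return {key: (expected, actual)} for each probed value that differs from the table."""
    diffs = {
        key: (want, probe.get(key))
        for key, want in EXPECTED.items()
        if probe.get(key) != want
    }
    # The table also promises a User-Agent without the HeadlessChrome marker.
    if "HeadlessChrome" in probe.get("userAgent", ""):
        diffs["userAgent"] = ("no HeadlessChrome", probe["userAgent"])
    return diffs
```

With a running browser, `stealth_mismatches(b.evaluate(PROBE_JS))` should come back empty under the default profile and non-empty after `b.start(stealth=False)`, if the assumptions hold.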

Captcha solving

Two paths. Use whichever is cheaper for your case: free local Whisper (audio mode), or paid 2captcha / anti-captcha.

Free, local Whisper (audio mode): for reCAPTCHA v2 on sites that don't pre-block your IP. Cost: $0; accuracy: ~70-80%; time: ~5-10 s.

if b.detect_captcha():
    out = sb.captcha.solve_recaptcha_audio(retries=3)
    # → {"verified": True, "attempts": 1, "text": "..."}

Paid, 2captcha / anti-captcha: works for reCAPTCHA v2 and v3, hCaptcha, and Turnstile. Cost: $0.001-0.003 per solve; time: ~30-60 s.

info = b.detect_captcha()
if info:
    b.solve_captcha_on_page(api_key="<2captcha-key>")
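
The two paths can be chained so the paid solver only runs when the free one fails. This is a minimal sketch built from the calls shown above. It assumes solve_recaptcha_audio() returns a dict with a `verified` key even on failure rather than raising, and it does not inspect the return value of solve_captcha_on_page() because that value is not described here. `solve_with_fallback` is a hypothetical helper.

```python
def solve_with_fallback(sb, api_key=None, retries=3):
    """Try the free audio path first, then the paid solver if a key is given.

    Returns one of:
      "none"     - no captcha detected
      "audio"    - the audio path reported verified
      "paid"     - the paid solver was invoked (its result is not checked here)
      "unsolved" - audio failed and no API key was supplied
    """
    b = sb.browser
    if not b.detect_captcha():
        return "none"

    out = sb.captcha.solve_recaptcha_audio(retries=retries)
    if out.get("verified"):
        return "audio"

    if api_key:
        b.solve_captcha_on_page(api_key=api_key)
        return "paid"
    return "unsolved"
```

Trying the free path first costs a few extra seconds on a miss but keeps the per-solve fee for the cases that need it.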

Profile persistence

Cookies, localStorage, and sessionStorage survive across sandbox lifetimes when you save them as a named profile:

# First sandbox: log in once.
with Sandbox.create(template="browser-use") as sb:
    sb.browser.start()
    sb.browser.navigate("https://app.example.com/login")
    sb.browser.fill("#email", "alice@example.com")
    sb.browser.fill("#password", "...")
    sb.browser.click("button[type=submit]")
    sb.browser.save_profile("alice-app")

# Later sandbox: restore — already logged in.
with Sandbox.create(template="browser-use") as sb:
    sb.browser.start(profile="alice-app")
    sb.browser.navigate("https://app.example.com/dashboard")
    print(sb.browser.text("h1"))   # whatever's gated behind login

Profiles live in /var/firebox-profiles/<name>.json inside the template's rootfs. They don't leak across templates — alice-app saved from a browser-use template only loads when you start from that same template.
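
A caller that does not know whether the profile has been saved yet could check before starting. This is a sketch only: list_profiles appears in the API list above, but its return type is not shown, so the code assumes it returns an iterable of profile names and can be called before start(). `start_with_profile` is a hypothetical helper.

```python
def start_with_profile(browser, name):
    """Start with the saved profile if it exists, otherwise start fresh.

    Returns True when the profile was restored, False when a fresh
    session was started and a login is still needed.
    """
    # Assumption: list_profiles() yields profile names as strings.
    if name in browser.list_profiles():
        browser.start(profile=name)
        return True
    browser.start()
    return False
```

A False result is the cue to run the login steps and call `save_profile(name)` afterwards.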

Live preview via VNC

The browser-use template ships with Xvfb + x11vnc + fluxbox. Useful when an LLM-driven agent's flow is failing and you want to watch.

# Runs inside the sandbox (called through the SDK):
sb.run("firebox-display firebox")        # password=firebox

# On the host, DNAT a public port (33000 here) to the sandbox's :5900.
# Then connect from your machine, e.g. on macOS:
open vnc://:firebox@your-host:33000

The run_agent_vnc.py example wires this up end-to-end.

What it can't do

GPU rendering — Chromium runs on CPU. Bake an --enable-gpu flag and live with software fallback if you must.
Real Chrome TLS — Chromium's TLS hello differs from Chrome's. Cloudflare Bot Manager / Akamai do fingerprint at the TLS layer. Workaround: use sb.http (curl_cffi with real Chrome 120 JA3) for raw HTTP calls behind the same cookies.
Mobile emulation — viewport-based only; no touch event fidelity.
Pause / resume of the whole VM mid-test — no Firecracker snapshot integration yet.