Agent recipes

Patterns for plugging firebox into a real agent loop. Pick one, copy, modify.

The shape is always the same:

flowchart LR
    A[Your agent / LLM] -- decides actions --> T{firebox tools}
    T -- sandbox_open --> S[microVM]
    T -- browser_* --> S
    T -- run --> S
    T -- search --> SX[SearxNG]
    S -- result / page text / screenshot --> A
    SX -- result list --> A

A is your agent — Claude, GPT, your own loop. The tool layer is firebox: SDK from Python, or MCP for any MCP-aware host. Sandboxes provide isolation; firebox provides the API.

1. Web research

"Summarise the top three Hacker News stories and give me the authors of the linked articles."

The agent decides each step (navigate → read → click → read again → synthesize). Firebox provides search + browser_*.

Two variants: Python (your agent imports the SDK), then MCP (Claude Desktop drives it).

from firebox.sandbox import Sandbox

def research(question: str, llm) -> str:
    with Sandbox.create(template="browser-use", ttl_seconds=600) as sb:
        sb.browser.start()
        # 1. broad search
        results = sb.search.web(question, language="en")[:5]
        # 2. open each, pull headline + lede
        digests = []
        for r in results:
            sb.browser.navigate(r.url, timeout=15.0)
            text = sb.browser.text("article, main, body")[:1500]
            digests.append({"url": r.url, "snippet": text})
        # 3. ask the LLM to synthesize
        return llm.summarize(question, digests)

With MCP, the model needs no glue code; it sees search and browser_* as native tools and picks them in order.

User prompt → Claude:

Use firebox tools to find the top 3 Hacker News stories, open each one, and summarize what they're about.

Claude calls (typically): sandbox_open(template="browser-use") → browser_start → browser_navigate("https://news.ycombinator.com") → browser_text_all(".titleline > a") → for each top story browser_click_at(...) → browser_text → sandbox_close.

No code on your side at all.
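The Python variant above leaves llm.summarize to the caller. A minimal sketch of how the digests could be folded into one prompt, assuming a plain-text LLM interface; build_synthesis_prompt and its max_chars budget are hypothetical names, not part of firebox.

```python
def build_synthesis_prompt(question: str, digests: list, max_chars: int = 6000) -> str:
    """Fold page digests into one prompt, stopping before the character budget is exceeded."""
    header = f"Question: {question}\n\nSources:\n"
    footer = "\nAnswer using only the sources above and cite them as [n]."
    parts = [header]
    used = len(header) + len(footer)
    for i, d in enumerate(digests, start=1):
        # Collapse whitespace so page layout does not eat the budget.
        snippet = " ".join(d["snippet"].split())
        block = f"[{i}] {d['url']}\n{snippet}\n"
        if used + len(block) > max_chars:
            break  # drop the remaining sources rather than cut one in half
        parts.append(block)
        used += len(block)
    parts.append(footer)
    return "".join(parts)
```

Numbering the sources lets the model cite them, and a hard character budget keeps the request size predictable regardless of how many results the search returned.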

2. Code interpreter

"Run this Python code; if it errors, show me; if it produces a plot, return the image."

Pattern: write user code into the sandbox, run it, fetch any side-effect files (plots, csvs).

from firebox.sandbox import Sandbox

def execute(code: str) -> dict:
    with Sandbox.create(template="base", ttl_seconds=120) as sb:
        sb.files.write("/work/main.py", code)
        result = sb.run("cd /work && python3 main.py", timeout=60)
        out = {
            "stdout": result.stdout,
            "stderr": result.stderr,
            "exit_code": result.exit_code,
        }
        # If the script produced a plot, return it
        try:
            out["plot_png"] = sb.files.read("/work/plot.png")  # bytes
        except RuntimeError:
            pass
        return out
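The dict returned by execute() holds raw bytes and unbounded text, which is awkward to hand back to a model. A sketch of one way to shape it, assuming the keys shown above; to_tool_message and the max_chars limit are hypothetical and not part of firebox.

```python
import base64

def to_tool_message(out: dict, max_chars: int = 4000) -> dict:
    """Shape an execute() result for an LLM: bounded text plus an optional base64 plot."""
    def clip(text: str) -> str:
        # Keep the tail: tracebacks and final results sit at the end of the output.
        if len(text) <= max_chars:
            return text
        return "[... truncated ...]\n" + text[-max_chars:]

    msg = {
        "ok": out["exit_code"] == 0,
        "stdout": clip(out["stdout"]),
        "stderr": clip(out["stderr"]),
    }
    if "plot_png" in out:
        # Raw bytes are not JSON-serializable; base64 text is.
        msg["plot_png_b64"] = base64.b64encode(out["plot_png"]).decode("ascii")
    return msg
```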

Stream output while it runs (useful for long jobs):

for chunk in sb.stream("cd /work && python3 train.py"):
    if chunk.stream == "stdout":
        forward_to_user(chunk.data)         # progress bars work
    elif chunk.stream == "final":
        print(f"exit {chunk.exit_code} in {chunk.duration:.2f}s")
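The loop above handles stdout and the final chunk inline. A sketch of the same consumption wrapped in a reusable function, tested against stand-in chunks so it runs without a sandbox. The "stderr" stream name is an assumption (only "stdout" and "final" appear above), and drain is a hypothetical helper.

```python
from types import SimpleNamespace
from typing import Callable, Iterable

def drain(chunks: Iterable, on_stdout: Callable[[str], None]) -> dict:
    """Forward stdout as it arrives, buffer stderr, and return the final status."""
    stderr_parts = []
    status = {"exit_code": None, "duration": None}
    for chunk in chunks:
        if chunk.stream == "stdout":
            on_stdout(chunk.data)
        elif chunk.stream == "stderr":      # assumed stream name
            stderr_parts.append(chunk.data)
        elif chunk.stream == "final":
            status["exit_code"] = chunk.exit_code
            status["duration"] = chunk.duration
    status["stderr"] = "".join(stderr_parts)
    return status

# Stand-in chunks with the same attributes the loop above reads.
fake = [
    SimpleNamespace(stream="stdout", data="epoch 1\n"),
    SimpleNamespace(stream="stderr", data="warning: slow\n"),
    SimpleNamespace(stream="final", exit_code=0, duration=1.5),
]
seen = []
result = drain(fake, seen.append)
```

An exit_code of None after the loop means the stream ended without a final chunk, which is worth treating as a failure.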

3. Browser scraping (no LLM in the loop)

For deterministic scrapes, the "agent" driving the browser is just your Python script.

from firebox.sandbox import Sandbox

with Sandbox.create(template="browser-use", ttl_seconds=120) as sb:
    sb.browser.start()
    sb.browser.navigate("https://old.reddit.com/r/programming/")
    # JS escape hatch when CSS selectors get messy
    stories = sb.browser.evaluate("""
        () => [...document.querySelectorAll("div.thing.link:not(.promoted)")]
                  .filter(el => !el.classList.contains("stickied"))
                  .slice(0, 5)
                  .map(el => ({
                      title: el.querySelector("a.title")?.innerText,
                      score: el.querySelector(".score.unvoted")?.innerText,
                      url:   el.querySelector("a.title")?.href,
                  }))
    """)
    for s in stories:
        print(f"  ({s['score']}) {s['title']} — {s['url']}")

Real, working version: examples/browser-use/reddit_live.py.
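Because the evaluate() call uses optional chaining, any field can come back as None, and scores arrive as display text. A small post-processing sketch, assuming the three keys returned above; parse_score and clean_stories are hypothetical helpers, and the "k" suffix handling is a guess at how abbreviated scores might look.

```python
from typing import Optional

def parse_score(raw) -> Optional[int]:
    """Convert score text such as '1234' or '12.3k' to an int; None if not numeric."""
    if not raw:
        return None
    text = raw.strip().lower().replace(",", "")
    multiplier = 1
    if text.endswith("k"):
        multiplier, text = 1000, text[:-1]
    try:
        return round(float(text) * multiplier)
    except (ValueError, OverflowError):
        return None  # placeholder glyphs, hidden scores, and similar

def clean_stories(stories: list) -> list:
    """Drop rows without a title or URL and normalize the rest."""
    cleaned = []
    for s in stories:
        if not s.get("title") or not s.get("url"):
            continue
        cleaned.append({
            "title": s["title"].strip(),
            "url": s["url"],
            "score": parse_score(s.get("score")),
        })
    return cleaned
```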

4. Lead generation (search + parallel scrape)

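Query search engines for contact pages, fetch each result over real-Chrome TLS, and filter for plausible business emails. The snippet below fetches sequentially to stay short; the fetch loop is the part to parallelize.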

import re
from firebox.sandbox import Sandbox

EMAIL_RE = re.compile(r"[\w.+-]+@[\w.-]+\.[A-Za-z]{2,}")

def find_leads(queries: list[str], n: int = 10) -> list[dict]:
    with Sandbox.create(template="browser-use", ttl_seconds=600) as sb:
        # 1. fan-out search across all queries
        urls = []
        for q in queries:
            for r in sb.search.web(q):
                urls.append(r.url)
        # 2. HTTP fetch with real Chrome TLS (curl_cffi); sequential here for brevity
        leads, seen = [], set()
        for url in urls:
            try:
                html = sb.http.get(url, timeout=8).text
            except Exception:
                continue
            for email in set(EMAIL_RE.findall(html)):
                if email.lower() in seen:
                    continue
                seen.add(email.lower())
                leads.append({"email": email, "source": url})
                if len(leads) >= n:
                    return leads
        return leads

Real version with proper plausibility filtering and contact-page sub-paths: examples/browser-use/lead_finder.py.
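For orientation, a sketch of what a parallel fetch plus a basic plausibility filter could look like. This is not the code from lead_finder.py: the noise lists are illustrative, extract_leads and is_plausible are hypothetical names, and the fetch callable is injected so the logic runs without a sandbox. Whether one sandbox's sb.http can safely be called from several threads is an assumption to verify before relying on it.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

EMAIL_RE = re.compile(r"[\w.+-]+@[\w.-]+\.[A-Za-z]{2,}")
ASSET_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".css", ".js")
NOISE_LOCAL_PARTS = {"noreply", "no-reply", "donotreply", "example", "test"}
NOISE_DOMAINS = {"example.com"}  # illustrative; extend for your own data

def is_plausible(email: str) -> bool:
    e = email.lower()
    local, _, domain = e.partition("@")
    if e.endswith(ASSET_SUFFIXES):
        return False  # "logo@2x.png" matches the regex but is a filename
    return local not in NOISE_LOCAL_PARTS and domain not in NOISE_DOMAINS

def extract_leads(urls, fetch: Callable[[str], str], max_workers: int = 8) -> list:
    def fetch_one(url):
        try:
            return url, fetch(url)
        except Exception:
            return url, ""  # a failed page contributes nothing

    leads, seen = [], set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so output is deterministic.
        for url, html in pool.map(fetch_one, urls):
            for email in sorted(set(EMAIL_RE.findall(html))):
                key = email.lower()
                if key in seen or not is_plausible(email):
                    continue
                seen.add(key)
                leads.append({"email": email, "source": url})
    return leads
```

Inside find_leads, the callable would be something like `lambda u: sb.http.get(u, timeout=8).text`, reusing the call shown above.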

5. Multi-agent fleet

Spawn N sandboxes and run different tasks concurrently. Each gets its own IP, its own browser, its own quota slot.

from concurrent.futures import ThreadPoolExecutor
from firebox.sandbox import Sandbox

def worker(task: dict) -> dict:
    with Sandbox.create(template="browser-use", ttl_seconds=300) as sb:
        sb.browser.start()
        sb.browser.navigate(task["url"])
        return {
            "task": task["id"],
            "title": sb.browser.text("h1"),
            "screenshot": sb.browser.screenshot(),    # bytes
        }

tasks = [
    {"id": 1, "url": "https://example.com"},
    {"id": 2, "url": "https://hbs.si"},
    {"id": 3, "url": "https://news.ycombinator.com"},
]
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(worker, tasks))

Or via the CLI:

firebox run-many \
    "echo agent-1 from \$(hostname)" \
    "echo agent-2 from \$(hostname)" \
    "echo agent-3 from \$(hostname)" \
    --concurrency 3

run-many ships in the SDK as firebox.parallel.run_many — useful when you want a clean batch fan-out from inside a larger agent loop.
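One caveat with list(pool.map(...)) in the Python version above: the first worker that raises aborts the whole collection and the other results are lost. A sketch of a fan-out that records failures per task instead, using only the standard library; run_fleet is a hypothetical helper, not the firebox.parallel API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_fleet(worker, tasks, max_workers: int = 5) -> list:
    """Run worker(task) for every task; one failure does not abort the batch."""
    results = [None] * len(tasks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its task position so output order matches input.
        futures = {pool.submit(worker, t): i for i, t in enumerate(tasks)}
        for fut in as_completed(futures):
            i = futures[fut]
            try:
                results[i] = {"task": tasks[i]["id"], "ok": True, "result": fut.result()}
            except Exception as exc:
                results[i] = {"task": tasks[i]["id"], "ok": False, "error": repr(exc)}
    return results
```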

6. Long-lived agent session (LLM in the loop)

This is the Manus / browser-use shape. view() perceives; click_idx, input_idx, and select_option_idx act. Passing the previous view token returns only the diff since the last view, which keeps session tokens bounded. Network capture reads API data without DOM scraping, and the live noVNC stream URL lets the user watch.

from firebox.sandbox import Sandbox

class WebAgent:
    def __init__(self, llm, watch: bool = False):
        self.llm = llm
        self.sb = Sandbox.create(template="browser-use", ttl_seconds=900)
        # Restore saved login if available; else fresh stealth profile.
        try:
            self.sb.browser.start(profile="my-app-login")
        except Exception:
            self.sb.browser.start()
        self.last_token: str | None = None
        if watch:
            stream = self.sb.stream.start()
            print(f"watch live: {stream['url']}")

    def step(self, user_msg: str) -> str:
        # 1. Perceive — markdown + indexed elements + annotated screenshot.
        #    delta_only=True after the first turn ⇒ tiny payloads.
        obs = self.sb.browser.view(
            since_view_token=self.last_token,
            delta_only=bool(self.last_token),
        )
        self.last_token = obs["view_token"]

        # 2. Optionally read XHR data the page received — agents pulling
        #    structured data from APIs win big here vs DOM scraping.
        api = self.sb.browser.network_log(url_contains="/api/", phase="response")

        # 3. Reason — your LLM picks an action.
        action = self.llm.decide(
            user_msg,
            observation=obs,
            api_responses=api["entries"][-5:],
        )

        # 4. Act — index-based, no CSS selectors needed.
        if action.kind == "click":
            self.sb.browser.click_idx(action.idx)
        elif action.kind == "type":
            self.sb.browser.input_idx(action.idx, action.text)
        elif action.kind == "select":
            self.sb.browser.select_option_idx(action.idx, action.value)
        elif action.kind == "scroll":
            self.sb.browser.scroll_by(action.direction)
        elif action.kind == "navigate":
            self.sb.browser.navigate(action.url, wait_for_load="networkidle")

        return obs.get("title") or "ok"

    def close(self):
        self.sb.browser.save_profile("my-app-login")
        self.sb.close()

The agent's memory is the page; the LLM doesn't need a database. TTL keeps the VM alive between turns; activity resets the clock. Profile save/load preserves login state across sessions. With delta_only=True, a 50-turn task fits in a fraction of the tokens that the same flow with full view() payloads would burn.
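WebAgent.step reads action.kind, action.idx, and similar attributes but leaves the action type undefined. A sketch of one possible shape, assuming the LLM replies with a JSON object; Action and parse_action are hypothetical, and the field names simply mirror the attributes used in step().

```python
import json
from dataclasses import dataclass
from typing import Optional

# Fields each action kind must carry, mirroring the branches in step().
REQUIRED = {
    "click": ("idx",),
    "type": ("idx", "text"),
    "select": ("idx", "value"),
    "scroll": ("direction",),
    "navigate": ("url",),
}

@dataclass
class Action:
    kind: str
    idx: Optional[int] = None
    text: Optional[str] = None
    value: Optional[str] = None
    direction: Optional[str] = None
    url: Optional[str] = None

def parse_action(raw: str) -> Action:
    """Validate the model's JSON reply before anything touches the browser."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("action must be a JSON object")
    kind = data.get("kind")
    if kind not in REQUIRED:
        raise ValueError(f"unknown action kind: {kind!r}")
    missing = [f for f in REQUIRED[kind] if data.get(f) is None]
    if missing:
        raise ValueError(f"{kind} action is missing {missing}")
    # Copy only the fields this kind uses; extra keys from the model are ignored.
    return Action(kind=kind, **{f: data[f] for f in REQUIRED[kind]})
```

Rejecting malformed replies here means a bad model output becomes a retry prompt rather than a half-executed browser action.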

7. Agent that uses firebox via MCP

The cleanest path when you don't own the agent loop. Anything MCP-aware (Claude Desktop, Claude Code, Cursor, ChatGPT Agent, custom clients) can mount firebox tools and call them directly.

sequenceDiagram
    participant U as You
    participant Cl as Claude Desktop
    participant M as firebox MCP server
    participant D as firebox-daemon
    participant S as microVM

    U->>Cl: "Find me 10 fitness blog email leads."
    Cl->>M: tools/call sandbox_open
    M->>D: POST /sandboxes
    D-->>M: { id }
    M-->>Cl: { sandbox_id }

    loop for each query
      Cl->>M: tools/call search { q, categories: "general" }
      M->>D: GET /search → SearxNG
      M-->>Cl: { results: [...] }
      Cl->>M: tools/call browser_navigate, browser_text, ...
    end

    Cl->>M: tools/call sandbox_close
    M->>D: POST /sandboxes/X/close
    Cl-->>U: "Here are 10 leads: ..."

Setup: see MCP server. After that, your agent gets 22 firebox tools and you write zero glue code.
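For custom clients, each "tools/call" arrow in the diagram is a JSON-RPC 2.0 request as defined by MCP. A sketch of the message body only; transport and session initialization are out of scope here, and the argument values are illustrative (the tool names and the q / categories keys come from the diagram above).

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session

def tools_call(name: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })

print(tools_call("sandbox_open", {"template": "browser-use"}))
print(tools_call("search", {"q": "fitness blog contact email", "categories": "general"}))
```

MCP-aware hosts such as Claude Desktop build these messages for you; the sketch is only relevant when writing a client by hand.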

8. Live progress to a UI / chat

Stream everything the sandbox is doing — search hits, file edits, shell output, browser activity — into your front-end as it happens.

from firebox.sandbox import Sandbox

def emit(channel, payload):
    """Replace with: SSE write, websocket send, Slack post, agent log..."""
    print(f"[{channel}] {payload}")

with Sandbox.create(template="browser-use",
                    workspace="./project",
                    workspace_exclude=[".git/*", "__pycache__/*"]) as sb:

    # 1. Search — stream results engine-by-engine
    for r in sb.search.stream("user query", engines=["google","duckduckgo","brave"]):
        emit("search", {"engine": r.engine, "title": r.title, "url": r.url})

    # 2. Browser session — VNC live-stream (open in user's browser)
    sb.browser.start(stealth=True)                  # visible (Xvfb) — default
    sb.process.start("websockify 6080 localhost:5900", env={"DISPLAY": ":99"})
    emit("browser", {"vnc_url": f"http://{sb.ip}:6080/vnc.html"})
    sb.browser.navigate("https://example.com")     # user sees it live

    # 3. File edits — push every change as the agent works
    import threading
    def watcher():
        for evt in sb.files.watch("/work", timeout=600):
            emit("fs", evt)            # MODIFY a.py, CREATE README.md, ...
    threading.Thread(target=watcher, daemon=True).start()

    # 4. Shell output — stdout/stderr line-by-line
    for chunk in sb.stream("python3 /work/build.py"):
        emit("shell", {"stream": chunk.stream, "data": chunk.data})

Four primitives, one unified live feed. The frontend can render each channel differently — search as a list, browser as an embedded noVNC iframe, fs as a file tree with highlighted edits, shell as a terminal pane.
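If the front-end consumes Server-Sent Events, emit() can be backed by a small framing function. A sketch of the wire format only, assuming the channel names used above; sse_event is a hypothetical helper, and writing the frame to the HTTP response is left to your web framework.

```python
import json

def sse_event(channel: str, payload) -> str:
    """Encode one emit() call as a Server-Sent Events frame."""
    # default=str covers values json cannot encode natively (bytes, paths, ...).
    data = json.dumps(payload, default=str)
    # json.dumps escapes newlines inside strings, so one data: line is enough.
    return f"event: {channel}\ndata: {data}\n\n"
```

In the browser, `EventSource.addEventListener("shell", handler)` then receives only shell frames, which maps directly onto the per-channel rendering described above.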

9. Drop into Anthropic Computer Use

Your agent code that already uses Claude's computer + bash tools runs against firebox unchanged. The adapter does the schema mapping.

import anthropic
from firebox.sandbox import Sandbox
from firebox.adapters.anthropic import ComputerUseAdapter

sb = Sandbox.create(template="browser-use", ttl_seconds=900)
adapter = ComputerUseAdapter(sb, display_width=1280, display_height=800)

client = anthropic.Anthropic()
messages = [{"role": "user",
              "content": "Open google.com and search for 'firebox sandbox'."}]

while True:
    resp = client.beta.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        tools=adapter.tools,                          # tool defs
        messages=messages,
        betas=["computer-use-2025-01-24"],
    )
    messages.append({"role": "assistant", "content": resp.content})
    if resp.stop_reason == "end_turn":
        break
    tool_results = [adapter.handle(b) for b in resp.content
                    if b.type == "tool_use"]
    if not tool_results:
        break
    messages.append({"role": "user", "content": tool_results})

sb.close()  # release the microVM once the loop ends

Per-action mapping:

Anthropic action	Firebox primitive
`screenshot`	`sb.os.screenshot()`
`left_click` / `right_click` / `middle_click` / `double_click` / `triple_click`	`sb.os.click(x, y, button=N)`
`mouse_move`	`sb.os.move_mouse(x, y)`
`left_click_drag`	`sb.os.drag(cur_x, cur_y, x, y)`
`key` / `hold_key`	`sb.os.key(text)` (xdotool keysyms — `ctrl+c`, `Return`, `alt+F4`)
`type`	`sb.os.type(text)`
`scroll`	`sb.os.move_mouse` + `sb.os.scroll(direction, amount)`
`cursor_position`	`sb.os.state()`
`wait`	`time.sleep`
`bash`	`sb.run(cmd)` (stateless — for stateful, use `sb.shells` directly)
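A sketch of how part of the table could be expressed as a dispatch plan, to make the mapping concrete. This is not the adapter's actual code: plan_calls is hypothetical, it covers only some rows, and the numeric button values assume X11 numbering (1 left, 2 middle, 3 right). The input keys (action, coordinate, text, scroll_direction, scroll_amount) follow Anthropic's computer tool schema.

```python
def plan_calls(action: dict) -> list:
    """Translate one computer-tool input into (primitive, args, kwargs) steps."""
    kind = action["action"]
    xy = tuple(action.get("coordinate") or ())
    buttons = {"left_click": 1, "middle_click": 2, "right_click": 3}  # assumed X11 numbering
    if kind == "screenshot":
        return [("screenshot", (), {})]
    if kind in buttons:
        return [("click", xy, {"button": buttons[kind]})]
    if kind == "mouse_move":
        return [("move_mouse", xy, {})]
    if kind in ("key", "type"):
        return [(kind, (action["text"],), {})]
    if kind == "scroll":
        # Two steps, as in the table: position the pointer, then scroll.
        return [("move_mouse", xy, {}),
                ("scroll", (action["scroll_direction"], action["scroll_amount"]), {})]
    if kind == "cursor_position":
        return [("state", (), {})]
    raise ValueError(f"unmapped action: {kind}")
```

Each step could then be applied with `getattr(sb.os, name)(*args, **kwargs)`; returning a plan instead of calling sb.os directly keeps the mapping testable without a sandbox.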

Pair with sb.stream.start() and hand the user a URL to watch the agent live:

print(f"Watch: {sb.stream.start()['url']}")

11. Read XHR data instead of scraping the DOM

Modern SPAs render pages by calling JSON APIs. Instead of scraping the rendered HTML, snoop the responses the page already received.

import json

from firebox.sandbox import Sandbox

with Sandbox.create(template="browser-use") as sb:
    sb.browser.start()
    # Navigate; networkidle wait ensures the SPA's XHRs settle.
    sb.browser.navigate("https://app.example.com/dashboard",
                         wait_for_load="networkidle")

    # Read the JSON the page received as it loaded.
    log = sb.browser.network_log(
        url_contains="/api/v1/items",
        phase="response", min_status=200, limit=20,
    )
    for entry in log["entries"]:
        items = json.loads(entry["body_preview"])
        # ... fully-structured data, no fragile CSS selectors ...

Bodies are auto-captured for application/json / text/* / javascript content types up to 10 KB each. For larger responses, re-do the request via sb.http (real-Chrome TLS) carrying the browser's cookies.
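Because previews are capped, a body cut off at the limit will not parse as JSON. A sketch that separates usable entries from those needing a re-fetch, assuming each entry carries body_preview as shown above; the url key and the split_entries helper are assumptions.

```python
import json

def split_entries(entries: list) -> tuple:
    """Parse complete JSON bodies; collect URLs whose preview did not parse."""
    parsed, refetch = [], []
    for entry in entries:
        try:
            parsed.append(json.loads(entry["body_preview"]))
        except (json.JSONDecodeError, TypeError, KeyError):
            # Truncated at the capture limit, missing, or not JSON at all.
            refetch.append(entry.get("url", ""))  # "url" key is assumed
    return parsed, refetch
```

The URLs in the second list are the candidates for a follow-up request through sb.http.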

12. Watch your agent live

Hand the user a URL they open in any browser tab to see the agent work.

from firebox.sandbox import Sandbox

with Sandbox.create(template="browser-use", ttl_seconds=900) as sb:
    sb.browser.start()                       # visible to Xvfb (default)
    stream = sb.stream.start()
    send_to_user(f"Watch the agent: {stream['url']}")

    sb.browser.navigate("https://news.ycombinator.com")
    # ... agent loop here; every action is visible in the user's tab ...

    sb.stream.stop()

Powered by Xvfb + x11vnc + websockify + noVNC inside the sandbox, DNAT'd via sb.ports.expose. The same sb.ports API works for any service the agent stands up:

sb.run("cd /work && python3 -m http.server 3000 --bind 0.0.0.0 &")
ep = sb.ports.expose(3000)
send_to_user(f"Site is up: {ep['url']}")

Choosing a pattern

Goal	Pattern	LLM in loop?
Deterministic scrape, known DOM	#3	No
Scrape across many sites	#4	No (LLM optional for synthesis)
LLM decides actions step-by-step	#1 or #7	Yes
User pastes code, you run it	#2	No (LLM wrote the code)
N tasks in parallel	#5	No
Multi-turn, stateful (Manus-shape)	#6	Yes
Plug into Claude / Cursor	#7	Yes
Live progress to user UI	#8	Optional
Anthropic Computer Use plug-in	#9	Yes
API-driven SPA (read XHR not DOM)	#11	Optional
User watches agent live	#12	Yes

Mix freely. A typical real agent does multi-turn (#6) but occasionally fans out (#5) for parallel verification, and uses the search helper inside any of these.