Agent recipes
Patterns for plugging firebox into a real agent loop. Pick one, copy, modify.
The shape is always the same:
flowchart LR
A[Your agent / LLM] -- decides actions --> T{firebox tools}
T -- sandbox_open --> S[microVM]
T -- browser_* --> S
T -- run --> S
T -- search --> SX[SearxNG]
S -- result / page text / screenshot --> A
SX -- result list --> A
A is your agent — Claude, GPT, your own loop. The tool layer is
firebox: SDK from Python, or MCP for any MCP-aware host. Sandboxes
provide isolation; firebox provides the API.
1. Web research
"Summarise the top three Hacker News stories and give me the authors of the linked articles."
The agent decides each step (navigate → read → click → read again →
synthesize). Firebox provides search + browser_*.
from firebox.sandbox import Sandbox

def research(question: str, llm) -> str:
    with Sandbox.create(template="browser-use", ttl_seconds=600) as sb:
        sb.browser.start()
        # 1. broad search
        results = sb.search.web(question, language="en")[:5]
        # 2. open each, pull headline + lede
        digests = []
        for r in results:
            sb.browser.navigate(r.url, timeout=15.0)
            text = sb.browser.text("article, main, body")[:1500]
            digests.append({"url": r.url, "snippet": text})
        # 3. ask the LLM to synthesize
        return llm.summarize(question, digests)
Over MCP, the model needs no glue code: it sees search and the
browser_* tools as native tools and picks them in order.
User prompt → Claude:
Use firebox tools to find the top 3 Hacker News stories, open each one, and summarize what they're about.
Claude calls (typically): sandbox_open(template="browser-use")
→ browser_start → browser_navigate("https://news.ycombinator.com")
→ browser_text_all(".titleline > a") → for each top story
browser_click_at(...) → browser_text → sandbox_close.
No code on your side at all.
2. Code interpreter
"Run this Python code; if it errors, show me; if it produces a plot, return the image."
Pattern: write user code into the sandbox, run it, fetch any side-effect files (plots, csvs).
from firebox.sandbox import Sandbox

def execute(code: str) -> dict:
    with Sandbox.create(template="base", ttl_seconds=120) as sb:
        sb.files.write("/work/main.py", code)
        result = sb.run("cd /work && python3 main.py", timeout=60)
        out = {
            "stdout": result.stdout,
            "stderr": result.stderr,
            "exit_code": result.exit_code,
        }
        # If the script produced a plot, return it
        try:
            out["plot_png"] = sb.files.read("/work/plot.png")  # bytes
        except RuntimeError:
            pass
        return out
Stream output while it runs (useful for long jobs):
for chunk in sb.stream("cd /work && python3 train.py"):
    if chunk.stream == "stdout":
        forward_to_user(chunk.data)  # progress bars work
    elif chunk.stream == "final":
        print(f"exit {chunk.exit_code} in {chunk.duration:.2f}s")
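When the caller also needs the full transcript once streaming ends, the chunks can be folded into a single result. The following is a minimal sketch, assuming chunks expose the `stream`, `data`, `exit_code`, and `duration` attributes used above. The `Chunk` dataclass and the `collect` helper are hypothetical stand-ins (not part of firebox) so the example runs without a sandbox.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for the objects yielded by sb.stream(...)."""
    stream: str                       # "stdout", "stderr", or "final"
    data: str = ""
    exit_code: Optional[int] = None
    duration: Optional[float] = None


def collect(chunks: Iterable[Chunk],
            on_stdout: Optional[Callable[[str], None]] = None) -> dict:
    """Forward stdout as it arrives and return the complete transcript."""
    parts = {"stdout": [], "stderr": []}
    exit_code, duration = None, None
    for chunk in chunks:
        if chunk.stream == "final":
            exit_code, duration = chunk.exit_code, chunk.duration
        elif chunk.stream in parts:
            parts[chunk.stream].append(chunk.data)
            if chunk.stream == "stdout" and on_stdout is not None:
                on_stdout(chunk.data)
    return {
        "stdout": "".join(parts["stdout"]),
        "stderr": "".join(parts["stderr"]),
        "exit_code": exit_code,
        "duration": duration,
    }
```

With a real sandbox this would be called as `collect(sb.stream("cd /work && python3 train.py"), on_stdout=forward_to_user)`, provided the chunk attributes match the assumption above.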
3. Browser scraping (no LLM in the loop)
For deterministic scrapes — the agent that drives this is just your Python script.
from firebox.sandbox import Sandbox

with Sandbox.create(template="browser-use", ttl_seconds=120) as sb:
    sb.browser.start()
    sb.browser.navigate("https://old.reddit.com/r/programming/")
    # JS escape hatch when CSS selectors get messy
    stories = sb.browser.evaluate("""
        () => [...document.querySelectorAll("div.thing.link:not(.promoted)")]
            .filter(el => !el.classList.contains("stickied"))
            .slice(0, 5)
            .map(el => ({
                title: el.querySelector("a.title")?.innerText,
                score: el.querySelector(".score.unvoted")?.innerText,
                url: el.querySelector("a.title")?.href,
            }))
    """)
    for s in stories:
        print(f" ({s['score']}) {s['title']} | {s['url']}")
Real, working version: examples/browser-use/reddit_live.py.
4. Lead generation (search + parallel scrape)
Search engines for contact pages, fetch each via real-Chrome TLS in parallel, filter for plausible business emails.
import re
from firebox.sandbox import Sandbox

EMAIL_RE = re.compile(r"[\w.+-]+@[\w.-]+\.[A-Za-z]{2,}")

def find_leads(queries: list[str], n: int = 10) -> list[dict]:
    with Sandbox.create(template="browser-use", ttl_seconds=600) as sb:
        # 1. fan-out search across N queries
        urls = []
        for q in queries:
            for r in sb.search.web(q):
                urls.append(r.url)
        # 2. HTTP fetch with real Chrome TLS (curl_cffi).
        #    Sequential here for brevity; the full example fetches in parallel.
        leads, seen = [], set()
        for url in urls:
            try:
                html = sb.http.get(url, timeout=8).text
            except Exception:
                continue  # skip unreachable pages
            for email in set(EMAIL_RE.findall(html)):
                if email.lower() in seen:
                    continue
                seen.add(email.lower())
                leads.append({"email": email, "source": url})
                if len(leads) >= n:
                    return leads
        return leads
Real version with proper plausibility filtering and contact-page
sub-paths: examples/browser-use/lead_finder.py.
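The bare regex above also matches strings that are not contact addresses, such as retina image names like `logo@2x.png`. The sketch below shows one way a plausibility filter could look. The heuristics and the names `is_plausible` and `extract_emails` are assumptions chosen for illustration; they are not taken from lead_finder.py.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w.-]+\.[A-Za-z]{2,}")

# Assumed heuristics for illustration only.
ASSET_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".css", ".js")
PLACEHOLDER_DOMAINS = {"example.com", "example.org", "domain.com", "email.com"}
NOISE_LOCAL_PARTS = {"noreply", "no-reply", "donotreply"}


def is_plausible(email: str) -> bool:
    """Reject asset filenames, placeholder domains, and no-reply mailboxes."""
    local, _, domain = email.lower().rpartition("@")
    if not local or not domain:
        return False
    if domain.endswith(ASSET_SUFFIXES):      # e.g. "logo@2x.png"
        return False
    if domain in PLACEHOLDER_DOMAINS:
        return False
    if local in NOISE_LOCAL_PARTS:
        return False
    return True


def extract_emails(html: str) -> list[str]:
    """Return plausible addresses in first-seen order, deduplicated case-insensitively."""
    seen, out = set(), []
    for email in EMAIL_RE.findall(html):
        key = email.lower()
        if key not in seen and is_plausible(key):
            seen.add(key)
            out.append(email)
    return out
```

In `find_leads`, `extract_emails(html)` could replace `set(EMAIL_RE.findall(html))`; the lists of suffixes and domains would need tuning for a real campaign.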
5. Multi-agent fleet
Spawn N sandboxes and run different tasks concurrently. Each gets its own IP, its own browser, its own quota slot.
from concurrent.futures import ThreadPoolExecutor
from firebox.sandbox import Sandbox

def worker(task: dict) -> dict:
    # One sandbox per task, torn down when the block exits.
    with Sandbox.create(template="browser-use", ttl_seconds=300) as sb:
        sb.browser.start()
        sb.browser.navigate(task["url"])
        return {
            "task": task["id"],
            "title": sb.browser.text("h1"),
            "screenshot": sb.browser.screenshot(),  # bytes
        }

tasks = [
    {"id": 1, "url": "https://example.com"},
    {"id": 2, "url": "https://hbs.si"},
    {"id": 3, "url": "https://news.ycombinator.com"},
]

with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(worker, tasks))
Or via the CLI:
firebox run-many \
"echo agent-1 from \$(hostname)" \
"echo agent-2 from \$(hostname)" \
"echo agent-3 from \$(hostname)" \
--concurrency 3
run-many ships in the SDK as firebox.parallel.run_many —
useful when you want a clean batch fan-out from inside a larger
agent loop.
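One caveat with `pool.map`: it re-raises the first worker exception when the results are consumed, so a single failed sandbox discards the rest of the batch. The sketch below isolates failures per task using `submit` and `as_completed` from the standard library. `run_fleet` is a hypothetical helper name, and it assumes each task is a dict with an `"id"` key as in the example above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_fleet(worker, tasks, max_workers=5):
    """Run worker(task) for every task; collect failures instead of raising."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its task id so results can be keyed.
        futures = {pool.submit(worker, task): task["id"] for task in tasks}
        for future in as_completed(futures):
            task_id = futures[future]
            try:
                results[task_id] = future.result()
            except Exception as exc:  # one bad sandbox should not sink the batch
                errors[task_id] = repr(exc)
    return results, errors
```

Calling `results, errors = run_fleet(worker, tasks)` returns both dictionaries keyed by task id, so the agent can retry only the entries in `errors`.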
6. Long-lived agent session (LLM in the loop)
The Manus / browser-use shape: view() perceives, click_idx /
input_idx / select_option_idx act, diff-since-last-view keeps
session tokens bounded, network capture skips DOM scraping for API
data, the live noVNC stream URL lets the user watch.
from firebox.sandbox import Sandbox

class WebAgent:
    def __init__(self, llm, watch: bool = False):
        self.llm = llm
        self.sb = Sandbox.create(template="browser-use", ttl_seconds=900)
        # Restore saved login if available; else fresh stealth profile.
        try:
            self.sb.browser.start(profile="my-app-login")
        except Exception:
            self.sb.browser.start()
        self.last_token: str | None = None
        if watch:
            stream = self.sb.stream.start()
            print(f"watch live: {stream['url']}")

    def step(self, user_msg: str) -> str:
        # 1. Perceive: markdown + indexed elements + annotated screenshot.
        #    delta_only=True after the first turn ⇒ tiny payloads.
        obs = self.sb.browser.view(
            since_view_token=self.last_token,
            delta_only=bool(self.last_token),
        )
        self.last_token = obs["view_token"]
        # 2. Optionally read XHR data the page received; agents pulling
        #    structured data from APIs win big here vs DOM scraping.
        api = self.sb.browser.network_log(url_contains="/api/", phase="response")
        # 3. Reason: your LLM picks an action.
        action = self.llm.decide(
            user_msg,
            observation=obs,
            api_responses=api["entries"][-5:],
        )
        # 4. Act: index-based, no CSS selectors needed.
        if action.kind == "click":
            self.sb.browser.click_idx(action.idx)
        elif action.kind == "type":
            self.sb.browser.input_idx(action.idx, action.text)
        elif action.kind == "select":
            self.sb.browser.select_option_idx(action.idx, action.value)
        elif action.kind == "scroll":
            self.sb.browser.scroll_by(action.direction)
        elif action.kind == "navigate":
            self.sb.browser.navigate(action.url, wait_for_load="networkidle")
        return obs.get("title") or "ok"

    def close(self):
        self.sb.browser.save_profile("my-app-login")
        self.sb.close()
The agent's memory is the page; the LLM doesn't need a database.
TTL keeps the VM alive between turns; activity resets the clock.
Profile save/load preserves login state across sessions. With
delta_only=True, a 50-turn task fits in a fraction of the tokens
that the same flow with full view() payloads would burn.
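The act step in `step()` silently does nothing when the LLM returns an action kind it does not recognise. One alternative is a dispatch table that fails loudly. This is a sketch only: it assumes the browser methods and action attributes shown in the class above, `make_dispatch` and `act` are hypothetical helper names, and `RecordingBrowser` is a test double so the example runs without a sandbox.

```python
from types import SimpleNamespace


def make_dispatch(browser):
    """Map action.kind to a callable that performs it on `browser`."""
    return {
        "click": lambda a: browser.click_idx(a.idx),
        "type": lambda a: browser.input_idx(a.idx, a.text),
        "select": lambda a: browser.select_option_idx(a.idx, a.value),
        "scroll": lambda a: browser.scroll_by(a.direction),
        "navigate": lambda a: browser.navigate(a.url, wait_for_load="networkidle"),
    }


def act(browser, action):
    """Perform one action, raising on kinds the table does not know."""
    handler = make_dispatch(browser).get(action.kind)
    if handler is None:
        raise ValueError(f"unknown action kind: {action.kind!r}")
    return handler(action)


class RecordingBrowser:
    """Test double that records calls instead of driving a real page."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes that do not exist, i.e. browser methods.
        def record(*args, **kwargs):
            self.calls.append((name, args, kwargs))
        return record


browser = RecordingBrowser()
act(browser, SimpleNamespace(kind="click", idx=3))
act(browser, SimpleNamespace(kind="type", idx=4, text="hello"))
```

Inside `WebAgent.step`, the if/elif chain would become `act(self.sb.browser, action)`, and the raised `ValueError` can be fed back to the LLM as a correction prompt.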
7. Agent that uses firebox via MCP
The cleanest path when you don't own the agent loop. Anything MCP-aware (Claude Desktop, Claude Code, Cursor, ChatGPT Agent, custom clients) can mount firebox tools and call them directly.
sequenceDiagram
participant U as You
participant Cl as Claude Desktop
participant M as firebox MCP server
participant D as firebox-daemon
participant S as microVM
U->>Cl: "Find me 10 fitness blog email leads."
Cl->>M: tools/call sandbox_open
M->>D: POST /sandboxes
D-->>M: { id }
M-->>Cl: { sandbox_id }
loop for each query
Cl->>M: tools/call search { q, categories: "general" }
M->>D: GET /search → SearxNG
M-->>Cl: { results: [...] }
Cl->>M: tools/call browser_navigate, browser_text, ...
end
Cl->>M: tools/call sandbox_close
M->>D: POST /sandboxes/X/close
Cl-->>U: "Here are 10 leads: ..."
Setup: see MCP server. After that, your agent gets 22 firebox tools and you write zero glue code.
8. Live progress to a UI / chat
Stream everything the sandbox is doing — search hits, file edits, shell output, browser activity — into your front-end as it happens.
import threading
from firebox.sandbox import Sandbox

def emit(channel, payload):
    """Replace with: SSE write, websocket send, Slack post, agent log..."""
    print(f"[{channel}] {payload}")

with Sandbox.create(template="browser-use",
                    workspace="./project",
                    workspace_exclude=[".git/*", "__pycache__/*"]) as sb:
    # 1. Search: stream results engine-by-engine
    for r in sb.search.stream("user query", engines=["google", "duckduckgo", "brave"]):
        emit("search", {"engine": r.engine, "title": r.title, "url": r.url})
    # 2. Browser session: VNC live-stream (open in user's browser)
    sb.browser.start(stealth=True)  # visible (Xvfb), the default
    sb.process.start("websockify 6080 localhost:5900", env={"DISPLAY": ":99"})
    emit("browser", {"vnc_url": f"http://{sb.ip}:6080/vnc.html"})
    sb.browser.navigate("https://example.com")  # user sees it live
    # 3. File edits: push every change as the agent works
    def watcher():
        for evt in sb.files.watch("/work", timeout=600):
            emit("fs", evt)  # MODIFY a.py, CREATE README.md, ...
    threading.Thread(target=watcher, daemon=True).start()
    # 4. Shell output: stdout/stderr line-by-line
    for chunk in sb.stream("python3 /work/build.py"):
        emit("shell", {"stream": chunk.stream, "data": chunk.data})
Four primitives, one unified live feed. The frontend can render each
channel differently — search as a list, browser as an embedded
noVNC iframe, fs as a file tree with highlighted edits, shell as
a terminal pane.
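If the front-end consumes Server-Sent Events, `emit` can encode each payload as one SSE frame and let the channel name double as the event type. The sketch below covers only the wire format; `format_sse` is a hypothetical helper, and how the frame reaches the HTTP response depends on your web framework.

```python
import json


def format_sse(channel: str, payload) -> str:
    """Encode one event in the text/event-stream wire format.

    json.dumps escapes newlines inside strings, so the payload always fits
    on a single `data:` line. The channel name must not contain newlines.
    """
    # default=str is a fallback for values json cannot encode (bytes, paths).
    data = json.dumps(payload, default=str)
    return f"event: {channel}\ndata: {data}\n\n"


frame = format_sse("search", {"engine": "brave", "title": "line one\nline two"})
```

In the browser, `EventSource.addEventListener("search", ...)` then receives only the search channel, which matches the per-channel rendering described above.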
9. Drop into Anthropic Computer Use
Your agent code that already uses Claude's computer + bash tools
runs against firebox unchanged. The adapter does the schema mapping.
import anthropic
from firebox.sandbox import Sandbox
from firebox.adapters.anthropic import ComputerUseAdapter

sb = Sandbox.create(template="browser-use", ttl_seconds=900)
adapter = ComputerUseAdapter(sb, display_width=1280, display_height=800)
client = anthropic.Anthropic()
messages = [{"role": "user",
             "content": "Open google.com and search for 'firebox sandbox'."}]

while True:
    resp = client.beta.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        tools=adapter.tools,  # tool defs
        messages=messages,
        betas=["computer-use-2025-01-24"],
    )
    messages.append({"role": "assistant", "content": resp.content})
    if resp.stop_reason == "end_turn":
        break
    # Execute every tool call in this turn and send the results back.
    tool_results = [adapter.handle(b) for b in resp.content
                    if b.type == "tool_use"]
    if not tool_results:
        break
    messages.append({"role": "user", "content": tool_results})
Per-action mapping:
| Anthropic action | Firebox primitive |
|---|---|
| screenshot | sb.os.screenshot() |
| left_click / right_click / middle_click / double_click / triple_click | sb.os.click(x, y, button=N) |
| mouse_move | sb.os.move_mouse(x, y) |
| left_click_drag | sb.os.drag(cur_x, cur_y, x, y) |
| key / hold_key | sb.os.key(text) (xdotool keysyms: ctrl+c, Return, alt+F4) |
| type | sb.os.type(text) |
| scroll | sb.os.move_mouse + sb.os.scroll(direction, amount) |
| cursor_position | sb.os.state() |
| wait | time.sleep |
| bash | sb.run(cmd) (stateless; for stateful, use sb.shells directly) |
Pair with sb.stream.start() to hand the user a URL for watching the
agent live (see recipe 12).
11. Read XHR data instead of scraping the DOM
Modern SPAs render pages by calling JSON APIs. Instead of scraping the rendered HTML, snoop the responses the page already received.
import json
from firebox.sandbox import Sandbox

with Sandbox.create(template="browser-use") as sb:
    sb.browser.start()
    # Navigate; networkidle wait ensures the SPA's XHRs settle.
    sb.browser.navigate("https://app.example.com/dashboard",
                        wait_for_load="networkidle")
    # Read the JSON the page received as it loaded.
    log = sb.browser.network_log(
        url_contains="/api/v1/items",
        phase="response", min_status=200, limit=20,
    )
    for entry in log["entries"]:
        items = json.loads(entry["body_preview"])
        # ... fully-structured data, no fragile CSS selectors ...
Bodies are auto-captured for application/json / text/* /
javascript content types up to 10 KB each. For larger responses,
re-do the request via sb.http (real-Chrome TLS) carrying the
browser's cookies.
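Because previews are capped, `json.loads` raises on any body that was cut off mid-document. A defensive sketch follows, assuming each log entry is a dict with the `body_preview` key shown above; the `url` key and the `parse_json_entries` helper are assumptions made for illustration.

```python
import json


def parse_json_entries(log: dict) -> tuple[list, list]:
    """Split log entries into parsed JSON bodies and URLs that need a refetch."""
    parsed, refetch = [], []
    for entry in log.get("entries", []):
        body = entry.get("body_preview") or ""
        try:
            parsed.append(json.loads(body))
        except json.JSONDecodeError:
            # Likely truncated at the capture limit, or not JSON at all.
            refetch.append(entry.get("url"))  # "url" key is an assumption
    return parsed, refetch
```

The URLs in the second list are the candidates for a follow-up request through sb.http, as described above.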
12. Watch your agent live
Hand the user a URL they open in any browser tab to see the agent work.
with Sandbox.create(template="browser-use", ttl_seconds=900) as sb:
    sb.browser.start()  # visible on Xvfb (default)
    stream = sb.stream.start()
    send_to_user(f"Watch the agent: {stream['url']}")
    sb.browser.navigate("https://news.ycombinator.com")
    # ... agent loop here; every action is visible in the user's tab ...
    sb.stream.stop()
Powered by Xvfb + x11vnc + websockify + noVNC inside the sandbox,
DNAT'd via sb.ports.expose. The same sb.ports API works for any
service the agent stands up:
sb.run("cd /work && python3 -m http.server 3000 --bind 0.0.0.0 &")
ep = sb.ports.expose(3000)
send_to_user(f"Site is up: {ep['url']}")
Choosing a pattern
| Goal | Pattern | LLM in loop? |
|---|---|---|
| Deterministic scrape, known DOM | #3 | No |
| Scrape across many sites | #4 | No (LLM optional for synthesis) |
| LLM decides actions step-by-step | #1 or #7 | Yes |
| User pastes code, you run it | #2 | No (LLM wrote the code) |
| N tasks in parallel | #5 | No |
| Multi-turn, stateful (Manus-shape) | #6 | Yes |
| Plug into Claude / Cursor | #7 | Yes |
| Live progress to user UI | #8 | Optional |
| Anthropic Computer Use plug-in | #9 | Yes |
| API-driven SPA (read XHR not DOM) | #11 | Optional |
| User watches agent live | #12 | Yes |
Mix freely. A typical real agent does multi-turn (#6) but occasionally fans out (#5) for parallel verification, and uses the search helper inside any of these.