Overview

Browser tools give the agent a full Chromium browser that it can control programmatically. They are useful for pages that require JavaScript to render, login flows, form submission, and visual verification via screenshots. Browser tools are in the Full tier: they are only sent to frontier models (Claude, GPT-4o, Gemini) that can reliably reason about page state and UI interactions.
Browser tools require Puppeteer. Run pnpm install to install all optional dependencies, including Puppeteer.

Available Tools

| Tool | Description | Security |
| --- | --- | --- |
| browser_navigate | Navigate to a URL | moderate |
| browser_snapshot | Get the current page accessibility tree | safe |
| browser_click | Click an element by CSS selector | moderate |
| browser_type | Type text into an input field | moderate |
| browser_search | Search the web using browser navigation | moderate |
| browser_screenshot | Take a screenshot of the current page | safe |
| browser_pages | List all open browser tabs/pages | safe |
| browser_close | Close a page or the entire browser | moderate |

Tool Details

browser_navigate

Navigate to a URL. Opens a new browser page if none is active.
url (string, required): Full URL to navigate to. Must be http:// or https://.

waitUntil (string, default: "networkidle2"): Wait condition. One of load, domcontentloaded, networkidle0, networkidle2.

timeout (number, default: 30000): Navigation timeout in milliseconds.
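
The parameter rules above can be sketched as a small validation step. This is a minimal illustration, not the tool's actual implementation: the `NavigateArgs` type and the `normalizeNavigateArgs` function are hypothetical names, and rejecting negative timeouts is an assumption that the documentation does not state.

```typescript
// Hypothetical argument shape mirroring the documented browser_navigate parameters.
type WaitUntil = "load" | "domcontentloaded" | "networkidle0" | "networkidle2";

interface NavigateArgs {
  url: string;
  waitUntil?: WaitUntil;
  timeout?: number;
}

const WAIT_CONDITIONS: readonly WaitUntil[] = [
  "load",
  "domcontentloaded",
  "networkidle0",
  "networkidle2",
];

// Applies the documented defaults and rejects URLs that are not http:// or https://.
function normalizeNavigateArgs(args: NavigateArgs): Required<NavigateArgs> {
  if (!/^https?:\/\//i.test(args.url)) {
    throw new Error(`url must start with http:// or https://: ${args.url}`);
  }
  const waitUntil = args.waitUntil ?? "networkidle2";
  if (WAIT_CONDITIONS.indexOf(waitUntil) === -1) {
    throw new Error(`unsupported waitUntil: ${waitUntil}`);
  }
  const timeout = args.timeout ?? 30000;
  // Assumption: negative or non-finite timeouts are treated as invalid.
  if (!isFinite(timeout) || timeout < 0) {
    throw new Error("timeout must be a non-negative number of milliseconds");
  }
  return { url: args.url, waitUntil, timeout };
}
```

Calling `normalizeNavigateArgs({ url: "https://app.example.com/login" })` would fill in `networkidle2` and `30000`, while a URL such as `ftp://example.com/file` would be rejected before any navigation is attempted.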

browser_snapshot

Get the current page’s accessibility tree as structured text. More token-efficient than a screenshot for text-heavy pages.
includeHidden (boolean, default: false): Include hidden elements in the snapshot.

browser_click

Click an element on the page.
selector (string, required): CSS selector for the element to click. To select by accessible name instead, use the aria/ prefix, for example aria/Button Name.

button (string, default: "left"): Mouse button. One of left, right, middle.

clickCount (number, default: 1): Number of clicks (use 2 for a double-click).

browser_type

Type text into an input or textarea.
selector (string, required): CSS selector for the input element.

text (string, required): Text to type.

clear (boolean, default: false): Clear existing content before typing.

delay (number, default: 0): Delay between keystrokes in milliseconds (simulates human typing).

browser_screenshot

Capture the current page as a PNG image.
fullPage (boolean, default: false): Capture the full scrollable page, not just the viewport.

selector (string): Capture only a specific element instead of the whole page.

quality (number, default: 80): JPEG quality (1-100). Only applies when format is jpeg.
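
The note that quality only applies to JPEG output can be illustrated with a short sketch. It assumes a `format` argument with the values `png` and `jpeg`, which the quality description mentions but the parameter list does not document. `ScreenshotArgs`, `CaptureOptions`, and `buildCaptureOptions` are hypothetical names, and element capture via selector is left out for brevity.

```typescript
// Hypothetical argument shape; `format` is an assumption (see the lead-in).
interface ScreenshotArgs {
  fullPage?: boolean;
  quality?: number;
  format?: "png" | "jpeg";
}

// Options that would be handed to the underlying capture call.
interface CaptureOptions {
  type: "png" | "jpeg";
  fullPage: boolean;
  quality?: number;
}

// Builds capture options, forwarding quality only for JPEG output.
function buildCaptureOptions(args: ScreenshotArgs): CaptureOptions {
  const type = args.format ?? "png";
  const options: CaptureOptions = { type, fullPage: args.fullPage ?? false };
  if (type === "jpeg") {
    const quality = args.quality ?? 80;
    if (quality < 1 || quality > 100) {
      throw new Error("quality must be between 1 and 100");
    }
    options.quality = quality;
  }
  return options;
}
```

Dropping quality for PNG output matches Puppeteer's own screenshot options, where quality is not applicable to PNG images.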

browser_pages

List all currently open browser pages/tabs. Returns page IDs, URLs, and titles. Use page IDs to target operations at specific tabs.

browser_close

Close a specific page or all browser pages.
pageId (string): ID of the specific page to close. Omit to close all pages.
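
How browser_pages and browser_close relate through page IDs can be sketched with an in-memory registry. This is an illustrative model only: the `PageRegistry` class, its methods, and the `page-N` ID format are assumptions, not the tool's actual internals.

```typescript
// Hypothetical record for one open tab, matching what browser_pages is described as returning.
interface PageInfo {
  id: string;
  url: string;
  title: string;
}

class PageRegistry {
  private pages: Record<string, PageInfo> = {};
  private nextId = 1;

  // Registers a page and returns its ID (the ID format is an assumption).
  open(url: string, title: string): string {
    const id = `page-${this.nextId++}`;
    this.pages[id] = { id, url, title };
    return id;
  }

  // Mirrors browser_pages: IDs, URLs, and titles of all open pages.
  list(): PageInfo[] {
    return Object.keys(this.pages).map((id) => this.pages[id]);
  }

  // Mirrors browser_close: closes one page by ID, or every page when pageId is omitted.
  // Returns the number of pages closed.
  close(pageId?: string): number {
    if (pageId === undefined) {
      const count = Object.keys(this.pages).length;
      this.pages = {};
      return count;
    }
    if (!Object.prototype.hasOwnProperty.call(this.pages, pageId)) {
      throw new Error(`unknown pageId: ${pageId}`);
    }
    delete this.pages[pageId];
    return 1;
  }
}
```

In this model, an agent would call `list()` to discover IDs, pass one to `close(id)` to close a single tab, and call `close()` with no argument to close everything.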

Typical Workflow

1. Navigate to the page (browser_navigate)

{ "url": "https://app.example.com/login" }

2. Take a snapshot to see the page structure (browser_snapshot)

{ "includeHidden": false }

3. Fill in the login form (browser_type, one call per field)

{ "selector": "#email", "text": "user@example.com", "clear": true }
{ "selector": "#password", "text": "mypassword", "clear": true }

4. Click the submit button (browser_click)

{ "selector": "button[type='submit']" }

5. Screenshot to verify the result (browser_screenshot)

{ "fullPage": false }
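
The workflow above can also be written as one ordered list of tool calls, which makes the pairing of each argument object with its tool explicit. The `ToolCall` shape and the `toolsUsed` helper are hypothetical, and the actual wire format used by the agent runtime may differ.

```typescript
// Hypothetical representation of a single tool invocation.
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

// The five workflow steps, in order; step 3 issues one browser_type call per field.
const loginWorkflow: ToolCall[] = [
  { tool: "browser_navigate", args: { url: "https://app.example.com/login" } },
  { tool: "browser_snapshot", args: { includeHidden: false } },
  { tool: "browser_type", args: { selector: "#email", text: "user@example.com", clear: true } },
  { tool: "browser_type", args: { selector: "#password", text: "mypassword", clear: true } },
  { tool: "browser_click", args: { selector: "button[type='submit']" } },
  { tool: "browser_screenshot", args: { fullPage: false } },
];

// Returns the distinct tool names in first-use order.
function toolsUsed(calls: ToolCall[]): string[] {
  const seen: string[] = [];
  for (const call of calls) {
    if (seen.indexOf(call.tool) === -1) {
      seen.push(call.tool);
    }
  }
  return seen;
}
```

Listing the tools a workflow touches is a quick way to check whether it needs moderate tools such as browser_navigate, browser_type, and browser_click, or only safe ones.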

Safe vs Full Browser Tools

By default, non-frontier models receive only the “safe” browser tools (browser_snapshot, browser_screenshot, browser_pages). The full automation set requires a frontier model that can reason about page state accurately. The settings.yml example below overrides this for a specific preset by promoting individual tools:
# settings.yml - override for specific presets
presets:
  browser-agent:
    tools:
      promote:
        - browser_navigate
        - browser_click
        - browser_type
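
The selection rule can be sketched as a filter over the security levels from the Available Tools table. The semantics assumed here are that frontier models receive every tool, other models receive safe tools only, and promoted tools are added on top. That reading of `promote` is an assumption, and `selectBrowserTools` is a hypothetical name.

```typescript
type Security = "safe" | "moderate";

// Security levels copied from the Available Tools table.
const BROWSER_TOOLS: Record<string, Security> = {
  browser_navigate: "moderate",
  browser_snapshot: "safe",
  browser_click: "moderate",
  browser_type: "moderate",
  browser_search: "moderate",
  browser_screenshot: "safe",
  browser_pages: "safe",
  browser_close: "moderate",
};

// Assumed rule: frontier models get all tools; other models get safe tools
// plus any tools explicitly promoted for the preset.
function selectBrowserTools(isFrontier: boolean, promoted: string[] = []): string[] {
  return Object.keys(BROWSER_TOOLS).filter(
    (name) =>
      isFrontier ||
      BROWSER_TOOLS[name] === "safe" ||
      promoted.indexOf(name) !== -1
  );
}
```

Under these assumptions, the browser-agent preset above would expose six tools to a non-frontier model: the three safe tools plus browser_navigate, browser_click, and browser_type.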