Plain text and markdown snippets with raw URLs. Browse, search, and share with LLMs, tools, or anyone.
telegram-notification-setup
# Telegram Notifications — Integration Context A reference for sending notification messages to a Telegram chat via a bot. This document is the single source of truth: an agent should be able to wire up notifications using only the endpoint, parameters, formatting rules, and message recipes below. > **How to use this:** Read the two secrets (`TELEGRAM_BOT_TOKEN`, `TELEGRAM_ALLOWED_CHAT_ID`) from configuration — they are provided separately and must never be hardcoded or logged. Build every request from the documented parameter set and reuse the message recipes rather than inventing new shapes. --- ## 1. Foundations ### 1.1 What this does A Telegram bot can push messages into a chat (a user, group, or channel) over a plain HTTPS API. There is no SDK requirement — every action is an HTTPS request. For notifications, the only method needed is `sendMessage`. ### 1.2 Secrets (supplied separately) | Name | Meaning | Source | |---|---|---| | `TELEGRAM_BOT_TOKEN` | Bot auth token from BotFather. Forms part of the URL path. | Provided out-of-band | | `TELEGRAM_ALLOWED_CHAT_ID` | Destination chat ID the bot is allowed to message. | Provided out-of-band | - Treat both as secrets. Load from environment variables or a secrets manager. - Never commit them, never echo them in logs, never include them in error messages. - A user must have started a conversation with the bot (or added it to the group) at least once before the bot can message them. ### 1.3 Base endpoint ``` POST https://api.telegram.org/bot<TELEGRAM_BOT_TOKEN>/sendMessage Content-Type: application/json ``` The token is concatenated directly after `bot` in the path. The body is JSON. --- ## 2. Request contract ### 2.1 Core parameters (`sendMessage`) | Field | Type | Required | Use | |---|---|---|---| | `chat_id` | string/int | yes | Destination. Use `TELEGRAM_ALLOWED_CHAT_ID`. | | `text` | string | yes | Message body. Max **4096** characters. | | `parse_mode` | string | no | `HTML`, `MarkdownV2`, or `Markdown` (legacy). Omit for plain text. | | `disable_notification` | bool | no | `true` = deliver silently (no sound/vibration). | | `protect_content` | bool | no | `true` = recipient cannot forward or save. | | `link_preview_options` | object | no | `{ "is_disabled": true }` to suppress URL preview cards. | | `reply_markup` | object | no | Attach inline buttons (see 4.5). | | `message_thread_id` | int | no | Target a specific topic in a forum/group. | | `reply_parameters` | object | no | `{ "message_id": <id> }` to reply to a prior message. | ### 2.2 Response On success Telegram returns `{ "ok": true, "result": { ... } }` where `result` is the sent Message (contains `message_id`). On failure: `{ "ok": false, "error_code": <int>, "description": "<reason>" }`. ### 2.3 Verifying setup `GET https://api.telegram.org/bot<TELEGRAM_BOT_TOKEN>/getMe` returns the bot identity if the token is valid. Run this once at startup as a health check before sending notifications. --- ## 3. Formatting Pick **one** `parse_mode`. If you set one, unescaped reserved characters will cause a `400 Bad Request`. ### 3.1 HTML (recommended default) Simplest to generate safely. Supported tags: `<b>`, `<i>`, `<u>`, `<s>`, `<code>`, `<pre>`, `<a href="...">`, `<blockquote>`. - Escape these in any dynamic text: `&` → `&`, `<` → `<`, `>` → `>`. ### 3.2 MarkdownV2 Stricter. These characters must be backslash-escaped **everywhere** they appear literally: ``` _ * [ ] ( ) ~ ` > # + - = | { } . ! ``` Prefer HTML unless MarkdownV2 is specifically required — it is easy to break with un-escaped punctuation. ### 3.3 Plain text Omit `parse_mode` entirely. No escaping needed. Use this when the message contains arbitrary/user content you cannot guarantee is safe to format. --- ## 4. Message recipes Compose notifications from these patterns. All examples use `HTML` parse mode unless noted. **Status prefix convention** — lead messages with the matching indicator: 🟢 success · 🟡 alert · 🔴 error. ### 4.1 Plain notification ```json { "chat_id": "<TELEGRAM_ALLOWED_CHAT_ID>", "text": "Task completed successfully." } ``` ### 4.2 Success summary (structured) ```json { "chat_id": "<TELEGRAM_ALLOWED_CHAT_ID>", "parse_mode": "HTML", "text": "🟢 <b>Backup completed</b>\n\n<b>Host:</b> db-01\n<b>Size:</b> 4.2 GB\n<b>Duration:</b> 38s" } ``` ### 4.3 Alert / failure with detail ```json { "chat_id": "<TELEGRAM_ALLOWED_CHAT_ID>", "parse_mode": "HTML", "text": "🟡 <b>Alert</b>\n<i>Service:</i> api-gateway\n<i>Status:</i> <u>DOWN</u>" } ``` ### 4.4 Error with stack trace ```json { "chat_id": "<TELEGRAM_ALLOWED_CHAT_ID>", "parse_mode": "HTML", "text": "🔴 <b>Job failed</b>\n<pre>ValueError: invalid payload</pre>" } ``` ### 4.5 Notification with action button ```json { "chat_id": "<TELEGRAM_ALLOWED_CHAT_ID>", "text": "Deployment requires approval.", "reply_markup": { "inline_keyboard": [[ { "text": "View build", "url": "https://ci.example.com/build/1423" } ]] } } ``` ### 4.6 Silent / low-priority ```json { "chat_id": "<TELEGRAM_ALLOWED_CHAT_ID>", "text": "Nightly job finished.", "disable_notification": true } ``` ### 4.7 Sensitive (no forward/save) ```json { "chat_id": "<TELEGRAM_ALLOWED_CHAT_ID>", "text": "One-time code: 4827. Do not share.", "protect_content": true } ``` ### 4.8 Link without preview card ```json { "chat_id": "<TELEGRAM_ALLOWED_CHAT_ID>", "text": "New PR: https://github.com/org/repo/pull/123", "link_preview_options": { "is_disabled": true } } ``` --- ## 5. Operational rules ### 5.1 Limits | Constraint | Value | |---|---| | Message text length | 4096 characters (split longer messages) | | Per-chat send rate | ~1 message/second sustained | | Global send rate | ~30 messages/second across all chats | | Same group/channel | ~20 messages/minute | Exceeding limits returns `429 Too Many Requests` with a `parameters.retry_after` (seconds) value. ### 5.2 Error handling - Inspect `ok`. On `false`, read `error_code` + `description`. - On `429`, wait `parameters.retry_after` seconds, then retry once. - `400` usually means malformed `text`/formatting — most often an unescaped character under a `parse_mode`. Fall back to plain text if formatting fails. - `403` means the bot was blocked by the user or removed from the chat — do not retry. - Use a short timeout (e.g. 10s) and treat network failures as non-fatal for notifications. ### 5.3 Allowed-chat guard Send only to `TELEGRAM_ALLOWED_CHAT_ID`. If the integration receives or derives any other chat ID, reject it rather than messaging an unintended recipient. --- ## 6. Usage rules - Read `TELEGRAM_BOT_TOKEN` and `TELEGRAM_ALLOWED_CHAT_ID` from config; never hardcode or log them. - Use `getMe` as a startup health check before sending. - Default to `parse_mode: HTML`; escape `& < >` in all dynamic content; fall back to plain text on a formatting `400`. - Keep messages under 4096 chars; split or summarize longer payloads. - Respect rate limits; honor `retry_after` on `429`. - Use `disable_notification` for low-priority events and `protect_content` for sensitive ones. - Treat `403` as a permanent stop for that chat; never retry it.
spark-design-system
# Spark Dark — Design System A dark-theme design system for building UIs. Electric-violet accent, deep near-black surfaces, subtle grid texture, humanist sans for content and monospace for metadata. This document is the single source of truth: an agent should be able to build any interface using only the tokens, scales, and component patterns below. > **How to use this:** Reference tokens by their CSS variable name (e.g. `--spark-accent`), never raw hex, when generating components. Reuse the documented component patterns and layout rules rather than inventing new values. --- ## 1. Foundations ### 1.1 Color tokens Exposed as CSS custom properties on `:root`. **Surfaces** (darkest → lightest — this is the elevation ladder) | Token | Value | Use | |---|---|---| | `--spark-bg-root` | `#08090c` | Page background | | `--spark-bg-input` | `#0d0f16` | Input / textarea fields | | `--spark-bg-panel` | `#10121a` | Panels and shells | | `--spark-bg-card` | `#161926` | Cards, chips | | `--spark-bg-elevated` | `#1c2033` | Raised elements | | `--spark-bg-hover` | `#1f2340` | Hover state for raised elements | **Borders** | Token | Value | Use | |---|---|---| | `--spark-border` | `rgba(255,255,255,0.06)` | Default hairline border on containers | | `--spark-border-focus` | `rgba(120,100,255,0.50)` | Focus / hover border | **Accent — electric violet** | Token | Value | Use | |---|---|---| | `--spark-accent` | `#8b6eff` | Primary accent: buttons, active states, primary fills | | `--spark-accent-light` | `#b8a4ff` | Accent text, links, gradient endpoints | | `--spark-accent-dim` | `rgba(139,110,255,0.18)` | Tinted backgrounds, focus rings | | `--spark-accent-glow` | `rgba(139,110,255,0.40)` | Glow shadows | **Semantic** | Token | Value | Use | |---|---|---| | `--spark-success` | `#34d399` | Success / online | | `--spark-warning` | `#fbbf24` | Warning / in-progress | | `--spark-danger` | `#f87171` | Error / destructive | **Text** (highest → lowest emphasis) | Token | Value | Use | |---|---|---| | `--spark-text-1` | `#ecedf4` | Primary text, titles | | `--spark-text-2` | `#9698ad` | Secondary / body copy | | `--spark-text-3` | `#7e8099` | Muted: labels, captions, placeholders | ### 1.2 Typography | Token | Stack | |---|---| | `--font-sans` | `'Plus Jakarta Sans', -apple-system, BlinkMacSystemFont, sans-serif` | | `--font-mono` | `'JetBrains Mono', 'SF Mono', Consolas, monospace` | - **Import:** Plus Jakarta Sans (400,500,600,700,800) + JetBrains Mono (400,500,600). - **Base:** `16px` root, antialiased. - **Sans** = all human-facing content: body, titles, buttons. **Mono** = metadata: labels, badges, status, stats, captions, and any machine/code output. **Type scale** | Role | Size | Weight | Tracking | |---|---|---|---| | Page title | `clamp(1.5rem, 4vw, 2.25rem)` | 800 | `-0.035em` | | Section / card title | `1.15rem` | 700 | `-0.02em` | | Subtitle | `0.82rem` | 500 | — | | Body | `0.84–0.88rem` | 400 | line-height `1.55` | | Label (mono, uppercase) | `0.72rem` | 700 | `0.06em` | | Badge / chip (mono, uppercase) | `0.62–0.7rem` | 600 | `0.06–0.08em` | | Micro / stats (mono) | `0.6rem` | — | — | ### 1.3 Radii | Token | Value | Use | |---|---|---| | `--radius-sm` | `8px` | Inputs, buttons, small tints, banners | | `--radius-md` | `14px` | Message bubbles, mid containers | | `--radius-lg` | `20px` | Cards, panels, shells | | `--radius-xl` | `28px` | Largest containers | Pills / fully-rounded use `999px`. ### 1.4 Shadows | Token | Value | Use | |---|---|---| | `--shadow-card` | `0 2px 8px rgba(0,0,0,.35), 0 12px 40px rgba(0,0,0,.25)` | Default elevation | | `--shadow-glow` | `0 0 30px var(--spark-accent-glow)` | Accent glow | | `--shadow-input` | `inset 0 1px 3px rgba(0,0,0,.4)` | Recessed inputs | Hover elevation composes both: `var(--shadow-card), 0 0 26px var(--spark-accent-glow)`. ### 1.5 Background texture Optional faint violet grid on the page background: two crossed linear-gradients at `rgba(139,110,255,0.018)`, `background-size: 48px 48px`. Keep opacity near-invisible — it reads as texture, not pattern. --- ## 2. Layout ### 2.1 Container widths Fluid-capped with `min(viewport, max)`: | Context | Width | |---|---| | Standard single column | `min(92vw, 680px)` | | Wide content grid | `min(92vw, 960px)` | | Two-column workspace | `min(96vw, 1120px)` | | Reading / conversation column | `min(94vw, 760px)` | ### 2.2 Page shell Vertical flex column, centered, `padding: 24px 16px 40px`, `gap: 20px`, `min-height: 100svh`. ### 2.3 Grids - **Auto grid:** `repeat(auto-fit, minmax(260px, 1fr))`, `gap: 18px` — self-wrapping cards. - **Two-column workspace:** `minmax(0,1.15fr) minmax(0,1fr)`, `gap: 18px`, `align-items: start`. - **Input row:** `1fr auto` — field + adjacent button. ### 2.4 Breakpoints | Max-width | Change | |---|---| | `860px` | Two-column layouts collapse to one column | | `640px` | Tighter page padding/gap; input rows stack; buttons go full-width and taller (44px); status bars stack left-aligned; page title shrinks to `1.3rem` | --- ## 3. Motion | Keyframe | Behavior | Used by | |---|---|---| | `pulse-dot` | opacity `1→.5`, scale `1→.8` over `1.2–2s` | status dots, streaming/loading dots | | `spin` | continuous 360° | spinners | **Transition conventions:** transforms `.12–.15s`; color/border `.2s`; box-shadow `.25–.3s`. Hover lift is `translateY(-1px to -3px)`, reset to `0` on `:active`. --- ## 4. Components Recipes list defining properties. Compose every value from the tokens above. ### 4.1 Badge / pill - Pill (`999px`), `--spark-accent-dim` bg, `1px` accent border at `0.2` alpha, `--spark-accent-light` text. - Mono, uppercase, `0.62–0.7rem`, weight 600. - Status variant: prepend a 6px pulsing `--spark-accent` dot (glow + `pulse-dot`). ### 4.2 Gradient title - `linear-gradient(135deg, #ecedf4 30%, var(--spark-accent-light) 100%)` clipped to text. ### 4.3 Card - `--spark-bg-card`, `1px --spark-border`, `--radius-lg`, `--shadow-card`, `padding: 22px`, flex column `gap: 10px`. - **Hover:** lift `-3px`, border → `--spark-border-focus`, add accent glow. - Anatomy: optional badge → title → description (`flex: 1`) → call-to-action (accent-light, 600). ### 4.4 Panel - `--spark-bg-panel`, `1px --spark-border`, `--radius-lg`, `--shadow-card`, `padding: 16px 18px`, flex column `gap: 14px`. ### 4.5 Field + label - Field: flex column `gap: 5px`. - Label: mono uppercase, `0.72rem`, 700, tracking `0.06em`, `--spark-text-3`. ### 4.6 Input / textarea - `--spark-bg-input`, `1px --spark-border`, `--radius-sm`, `--shadow-input`, `padding: 10px 14px`, sans `0.85rem`, `resize: none`. - **Focus:** border → `--spark-border-focus` + `0 0 0 3px --spark-accent-dim` ring. - Read-only → `--spark-text-2`. Placeholder → `--spark-text-3`. - Output/scroll variant: mono, scrollable, thin custom scrollbar (5px track, `--spark-accent-dim` thumb). ### 4.7 Primary button - `--radius-sm`, height `40px` (44px on mobile), sans `0.82rem`/700, flex-centered with optional icon gap. - **Default:** `--spark-accent` bg, white text, shadow `0 4px 18px rgba(139,110,255,.35)` (deepens on hover). - **Destructive:** `--spark-danger` bg, white text, red shadow. - `:hover` lift `-1px`; `:disabled` opacity `.45`, `not-allowed`. ### 4.8 Secondary / small button - `--spark-bg-elevated`, `1px --spark-border`, `--radius-sm`, sans `0.74rem`/600, `--spark-text-2`. Hover → `--spark-bg-hover` + `--spark-text-1`. ### 4.9 Status indicator - Chip: pill, `--spark-bg-card`, hairline border, mono `0.68rem`, `--spark-text-3`, with leading dot. - Dot states: neutral `--spark-text-3`; success → `--spark-success` + glow; loading → `--spark-warning` + `pulse-dot`; error → `--spark-danger`. - Semantic tag (e.g. a feature flag): tinted pill in the matching semantic color at low alpha. ### 4.10 Overlay & loader - Floating overlay element: absolute pill, `backdrop-filter: blur(12px) saturate(1.4)`, translucent dark bg, `1px` white-10 border, mono `0.68rem`. - Full-cover loading overlay: `rgba(8,9,12,.88)`, centered, with spinner + label. - Spinner: ring with `--spark-accent` top border on a faint track, `spin .9s` (10px inline variant available). ### 4.11 Message bubble - Container: scrollable, panel surface, `--radius-lg`, flex column `gap: 16px`. - Bubble: `--radius-md`, `0.88rem`, `white-space: pre-wrap`, max-width `88%`. - **Own/user:** `--spark-accent` bg, white, clipped bottom-right corner (`4px`), right-aligned. - **Other/assistant:** `--spark-bg-elevated`, hairline border, clipped bottom-left, left-aligned. - Role caption: mono uppercase `0.62rem`, `--spark-text-3`. - Collapsible detail block: left accent border, `--spark-bg-input`, muted `0.78rem`; toggle via a `collapsed` class (`display:none`) with an accent-light label. - Streaming indicator: three 6px accent-light dots with staggered `pulse-dot` (delays `0 / 0.2s / 0.4s`). ### 4.12 Banner - Semantic-tinted: e.g. error uses `rgba(248,113,113,0.1)` bg, `1px rgba(248,113,113,0.2)` border, `--spark-danger` text, `--radius-sm`. Swap the color token for warning/success/info equivalents. ### 4.13 Utilities - `.hidden` → `display: none !important`. --- ## 5. Usage rules - Drive every surface, border, accent, and shadow from `--spark-*` tokens — no raw hex in components. - Mono for metadata (labels, badges, stats, status, machine output); sans for everything human-facing. - Follow the elevation ladder: root → input → panel → card → elevated → hover. - Pair hover lift with the composed `card + glow` shadow; reset transform on `:active`. - Stay within the documented radii set, container max-widths, and breakpoints. - Keep the optional grid texture near-invisible.
mac-mini-specs
Device: - Model: Mac mini (2023) - Manufacturer: Apple Inc. Chip & Performance: - Chip: Apple M2 Pro - CPU: Up to 12-core (8 performance + 4 efficiency) - GPU: Up to 19-core - Neural Engine: 16-core Memory: - RAM: 16 GB unified memory - Memory Bandwidth: 200 GB/s - Max Memory: 32 GB Storage: - Base: 512 GB SSD - Max: Up to 8 TB SSD - Type: NVMe PCIe SSD Operating System: - macOS Tahoe 26.5 (Build 25F71) - Architecture: ARM64 (Apple Silicon) Connectivity: - Wi-Fi 6E (802.11ax) - Bluetooth 5.3 Ports: - 4 × Thunderbolt 4 / USB-C - 2 × USB-A - 1 × HDMI - 1 × Gigabit Ethernet (10Gb optional) - 1 × 3.5 mm headphone jack - 1 × Power port Display Support: - Up to 3 external displays - Thunderbolt: up to 6K @ 60Hz (×2) - HDMI: up to 4K @ 60Hz (×1) Media Engine: - H.264 - HEVC - ProRes - ProRes RAW Physical: - Height: 3.58 cm - Width: 19.7 cm - Depth: 19.7 cm - Weight: ~1.28 kg Power: - Max consumption: ~185 W Release: - January 2023
qwen3-5-coordinates-system
--- title: Qwen3.5 Coordinate System description: Relative coordinate system for bounding box outputs in Qwen3.5, normalized to a 0-1000 range. tags: [reference, vision, coordinates, qwen] --- # Qwen3.5 Coordinate System ## Overview Qwen3.5 uses a **relative coordinate system** for bounding box outputs, normalized to a `0–1000` range. This is a notable shift from earlier models like Qwen2.5-VL, which used absolute pixel coordinates tied to the actual image dimensions. The key idea is simple: no matter what size your input image is, the model always describes locations as if the image were mapped onto a **1000 × 1000 grid**. --- ## How Normalization Works All coordinate values are expressed relative to a standardized 1000 × 1000 resolution. | Coordinate | Meaning | |------------|---------| | `(0, 0)` | Top-left corner of the image | | `(1000, 1000)` | Bottom-right corner of the image | | `(500, 500)` | Center of the image | This means a bounding box of `(0, 0, 1000, 1000)` covers the **entire image**, regardless of whether the original image is 640×480, 1920×1080, or any other resolution. ### Converting to Actual Pixel Coordinates To map the normalized coordinates back to real pixel positions, scale them using the original image dimensions: ``` x_pixel = x_normalized × (actual_width / 1000) y_pixel = y_normalized × (actual_height / 1000) ``` **Example:** If the model outputs `(250, 400)` and your image is `1920 × 1080`: - `x_pixel = 250 × (1920 / 1000) = 480` - `y_pixel = 400 × (1080 / 1000) = 432` --- ## Output Format Bounding boxes are returned as **JSON**. Each detected object is an object containing a `bbox_2d` array and a `label`: ```json {"bbox_2d": [x1, y1, x2, y2], "label": "object name"} ``` When multiple objects are requested, the model returns a JSON array of these objects: ```json [ {"bbox_2d": [x1, y1, x2, y2], "label": "object name"}, {"bbox_2d": [x1, y1, x2, y2], "label": "object name"} ] ``` | Component | Description | |-----------|-------------| | `bbox_2d[0], bbox_2d[1]` (`x1, y1`) | **Top-left** corner of the bounding box | | `bbox_2d[2], bbox_2d[3]` (`x2, y2`) | **Bottom-right** corner of the bounding box | | `label` | The class or description of the detected object | > Some prompts may also request an optional `sub_label` field for extra attributes. Point-based grounding uses `point_2d: [x, y]` instead of `bbox_2d`. ### Example ```json {"bbox_2d": [120, 230, 560, 780], "label": "cat"} ``` This describes a region starting at 12% from the left and 23% from the top, extending to 56% across and 78% down the image. --- ## Why This Matters - **Resolution-agnostic:** The model doesn't need you to track the input image's actual size for its output scale. Outputs are always on the same 0–1000 scale, making them easy to work with across different image sizes. - **Consistent post-processing:** You always apply the same conversion formula, regardless of the input. - **Decoupled from input handling:** Qwen3.5 processes images at native/variable resolution (dimensions are resized to the nearest multiple of 32, preserving aspect ratio), while object locations are always reported in the normalized 1000-scale space. This separation is what makes the coordinate system a natural fit for 2D spatial grounding and object detection. --- ## Quick Reference: Post-Processing ### Python ```python def normalize_to_pixel(box, image_width, image_height): """Convert Qwen3.5 normalized coordinates to pixel coordinates.""" x1, y1, x2, y2 = box return ( x1 * image_width / 1000, y1 * image_height / 1000, x2 * image_width / 1000, y2 * image_height / 1000, ) # Example usage bbox_normalized = (120, 230, 560, 780) bbox_pixels = normalize_to_pixel(bbox_normalized, 1920, 1080) # → (230.4, 248.4, 1075.2, 842.4) ``` ### TypeScript ```typescript type BBox = [number, number, number, number]; // [x1, y1, x2, y2] function normalizeToPixel( box: BBox, imageWidth: number, imageHeight: number ): BBox { const [x1, y1, x2, y2] = box; return [ (x1 * imageWidth) / 1000, (y1 * imageHeight) / 1000, (x2 * imageWidth) / 1000, (y2 * imageHeight) / 1000, ]; } // Example usage const bboxNormalized: BBox = [120, 230, 560, 780]; const bboxPixels = normalizeToPixel(bboxNormalized, 1920, 1080); // → [230.4, 248.4, 1075.2, 842.4] ``` You can parse the raw JSON output from the model. The model often wraps its output in a Markdown code fence, so strip that before parsing: ```typescript interface Detection { bbox_2d: [number, number, number, number]; label: string; sub_label?: string; } function parseQwenDetections(raw: string): Detection[] { // Remove an optional ```json ... ``` code fence, then parse as JSON. const cleaned = raw .trim() .replace(/^```(?:json)?/m, "") .replace(/```$/m, "") .trim(); return JSON.parse(cleaned) as Detection[]; } // Example usage const detections = parseQwenDetections( '[{"bbox_2d": [120, 230, 560, 780], "label": "cat"}]' ); // → [{ bbox_2d: [120, 230, 560, 780], label: "cat" }] ```
ahmad-waqar-resume
{ "picture": { "hidden": true, "url": "https://storage.rxresu.me/cm8q5r0c41geqjee2hk4zw63y/pictures/ei7ht1c97cpbiearuw2sg6jd.jpg", "size": 110, "rotation": 0, "aspectRatio": 1.2, "borderRadius": 99, "borderColor": "rgba(0, 0, 0, 0)", "borderWidth": 0, "shadowColor": "rgba(0, 0, 0, 0.5)", "shadowWidth": 0 }, "basics": { "name": "Ahmad Waqar", "headline": "Lead QA Engineer | AI Systems & Computer Vision | Agentic Test Automation Architect", "email": "khawjaahmad@gmail.com", "phone": "+971 55 970 6595", "location": "Dubai, United Arab Emirates", "website": { "url": "https://ahmadwaqar.dev", "label": "" }, "customFields": [ { "id": "019c796b-4008-75dc-b573-55e811fee1e6", "icon": "", "text": "linkedin.com/in/khawjaahmad", "link": "https://www.linkedin.com/in/khawjaahmad" }, { "id": "019c796c-4175-70e6-99c9-0cccc3362642", "icon": "", "text": "github.com/khawjaahmad", "link": "https://github.com/khawjaahmad" } ] }, "summary": { "title": "Summary", "columns": 1, "hidden": false, "content": "<p>Lead QA Engineer with 12+ years architecting scalable test automation frameworks, specializing in AI-driven testing systems using computer vision, agentic workflows, and LLMs. Currently building vision-based automation with LangGraph and Qwen3-VL for Mercedes MBUX/AAOS infotainment at FYI, leading a cross-platform QA team across mobile, web, and automotive. Proven track record of reducing regression cycles by 60%, achieving 85%+ automation coverage, and mentoring 40+ professionals across 12+ countries. UAE Golden Visa holder and Top 50 ADPList Mentor.</p>" }, "sections": { "profiles": { "title": "Profiles", "columns": 3, "hidden": true, "items": [ { "id": "clz8r9x3a0010qwer123466", "hidden": false, "icon": "", "iconColor": "", "network": "LinkedIn", "username": "khawjaahmad", "website": { "url": "https://www.linkedin.com/in/khawjaahmad/", "label": "khawjaahmad", "inlineLink": false } }, { "id": "tg9oalfonumephhqx2cgghpn", "hidden": false, "icon": "", "iconColor": "", "network": "GitHub", "username": "khawjaahmad", "website": { "url": "https://github.com/khawjaahmad", "label": "", "inlineLink": false } }, { "id": "vhxqim5r1kxo8qlncx81eb3s", "hidden": false, "icon": "", "iconColor": "", "network": "ADPList", "username": "ahmad-waqar", "website": { "url": "https://adplist.org/mentors/ahmad-waqar", "label": "ADPlist", "inlineLink": false } } ] }, "experience": { "title": "Experience", "columns": 1, "hidden": false, "items": [ { "id": "cm9x1y2z3a0000qwer123470", "hidden": false, "company": "FYI.AI", "position": "Lead QA Engineer - AI Systems & Platform Automation", "location": "Dubai, United Arab Emirates", "period": "07/2025 - Present", "website": { "url": "", "label": "", "inlineLink": false }, "description": "<p><em>Python, TypeScript, Appium, LangGraph, LangChain, Qwen3-VL:30b, Moondream, EasyOCR, Ollama, Next.js, K6, Docker</em></p><ul><li><p>Architected <strong>vision-based automation</strong> using LangGraph agentic workflows with perception, executor, and verification nodes orchestrated via LangChain, leveraging <strong>Qwen3-VL:30b</strong> via Ollama for UI element detection and <strong>Moondream</strong> for action verification across canvas-based and native app UIs.</p></li><li><p>Leading QA team across iOS, Android, Web, and <strong>Mercedes MBUX/AAOS</strong> automotive platforms, managing sprint planning, releases, and cross-functional collaboration in Agile sprints.</p></li><li><p>Developing automotive testing framework leveraging computer vision, template matching, and intent simulation for in-vehicle infotainment systems.</p></li></ul>", "roles": [] }, { "id": "clz8r9x3a0004qwer123460", "hidden": false, "company": "Tradeling", "position": "Senior QA Automation Engineer - Mobile & Web", "location": "Dubai, United Arab Emirates", "period": "11/2023 - 04/2025", "website": { "url": "", "label": "", "inlineLink": false }, "description": "<p><em>Flutter (Dart), Patrol, Playwright, K6 (TypeScript), GitHub Actions, Firebase Test Lab</em></p><ul><li><p>Built Dart-based Flutter test framework using Patrol with CI/CD via GitHub Actions and Firebase Test Lab, replacing React Native automation.</p></li><li><p>Architected Apex UI Engine that auto-validates dynamic UI against API data, catching <strong>25+ critical</strong> data inconsistencies pre-release.</p></li><li><p>Migrated Rest Assured to K6 TypeScript, unifying functional, load, and browser testing with Grafana Cloud, cutting cycle time by <strong>60%</strong>.</p></li></ul>", "roles": [] }, { "id": "clz8r9x3a0005qwer123461", "hidden": false, "company": "Malwarebytes", "position": "Senior Software Development Engineer in Test", "location": "Tallinn, Estonia", "period": "07/2023 - 10/2023", "website": { "url": "", "label": "", "inlineLink": false }, "description": "<ul><li><p>Automated Android app using Jetpack Compose tests (ComposeTestRules), ensuring critical workflows across multiple device configurations.</p></li><li><p>Achieved <strong>95% test coverage</strong> for licensing workflows via Playwright UI automation for internal back-office platform.</p></li><li><p>Implemented test orchestration in Azure DevOps CI/CD pipeline, improving traceability and enabling structured cross-team collaboration.</p></li></ul>", "roles": [] } ] }, "education": { "title": "Education", "columns": 3, "hidden": false, "items": [ { "id": "clz8r9x3a0002qwer123458", "hidden": false, "school": "The University of Lahore", "degree": "", "area": "MS in Computer Science with Specialisation in Theory of Computation", "grade": "", "location": "", "period": "", "website": { "url": "", "label": "", "inlineLink": false }, "description": "<p>2014 - 2016</p>" }, { "id": "clz8r9x3a0003qwer123459", "hidden": false, "school": "Forman Christian College University", "degree": "", "area": "BS in Computer Science with Majors in Software Engineering", "grade": "", "location": "", "period": "", "website": { "url": "", "label": "", "inlineLink": false }, "description": "<p>2010 - 2013</p>" } ] }, "projects": { "title": "Projects", "columns": 3, "hidden": false, "items": [ { "id": "gjs1x0w3zkcvuytpxv4mtinb", "hidden": false, "name": "Vision-Based Mobile Automation", "period": "2025", "website": { "url": "", "label": "", "inlineLink": false }, "description": "<p>Open-source agentic testing framework using LangGraph with LangChain, Qwen3-VL:30b via Ollama for perception, and Moondream for verification — automating canvas-based and native app UIs across platforms</p>" }, { "id": "d4e5nhpwo9jqr7ms1k2txvy3", "hidden": false, "name": "Cypress Test Data Generator Plugin", "period": "2025", "website": { "url": "https://www.npmjs.com/package/cypress-test-data-generator", "label": "npm", "inlineLink": false }, "description": "<p>Published Cypress plugin for dynamic test data generation with <strong>19K+ monthly downloads</strong>, listed on the official Cypress plugins directory and actively maintained</p>" }, { "id": "f8qzjl9c5t7rkb3v2h6yimxa", "hidden": false, "name": "API Automation Framework - K6", "period": "2024", "website": { "url": "", "label": "", "inlineLink": false }, "description": "<p>TypeScript-based unified testing framework for functional, load, and browser testing with Grafana Cloud dashboards and GitHub Actions CI/CD integration</p>" }, { "id": "nxj8o76uptvmkei7g95wdlq2", "hidden": true, "name": "Mobile App Deep Link Validator", "period": "2024", "website": { "url": "", "label": "", "inlineLink": false }, "description": "<p>iOS & Android app validation system with intelligent parallel cross-platform testing</p>" }, { "id": "gqqjyk3yume7aigh3qkif7b6", "hidden": true, "name": "3D Graph Visualizer", "period": "2024", "website": { "url": "", "label": "", "inlineLink": false }, "description": "<p>Built glowing, interactive 3D graph with smooth animations and WebGL rendering</p>" }, { "id": "puxum82lmgg9dgxdjpgkkpj6", "hidden": true, "name": "Cypress Plugin - Test Data Generator", "period": "2024", "website": { "url": "", "label": "", "inlineLink": false }, "description": "<p>A Cypress plugin for generating dynamic test data to streamline E2E testing</p>" } ] }, "skills": { "title": "Skills", "columns": 2, "hidden": false, "items": [ { "id": "clz8r9x3a0016qwer123472", "hidden": false, "icon": "", "iconColor": "", "name": "Test Automation", "proficiency": "", "level": 0, "keywords": [ "Playwright", "Appium", "Selenium", "Cypress", "Flutter Patrol", "K6", "Espresso", "XCUITest", "Postman", "Agent Browser", "Browser-Use", "Browserbase", "Steel.dev", "Appium Device Farm" ] }, { "id": "clz8r9x3a0021qwer123477", "hidden": false, "icon": "", "iconColor": "", "name": "AI/ML & Computer Vision", "proficiency": "", "level": 0, "keywords": [ "YOLO", "Moondream", "Qwen3-VL", "Qwen3.5-VL", "SmolVLM2", "CLIP", "EasyOCR", "Qwen3-TTS", "XTTSv2", "Fish Audio S1 Mini", "Chatterbox Turbo" ] }, { "id": "clz8r9x3a0015qwer123471", "hidden": false, "icon": "", "iconColor": "", "name": "Programming Languages", "proficiency": "", "level": 0, "keywords": [ "TypeScript", "Python", "JavaScript", "Java", "Kotlin", "Dart", "Swift" ] }, { "id": "clz8r9x3a0022qwer123478", "hidden": false, "icon": "", "iconColor": "", "name": "Agentic AI & LLM Infrastructure", "proficiency": "", "level": 0, "keywords": [ "LangChain", "LangGraph", "Claude Agent SDK", "Vercel AI SDK", "SmolAgents SDK", "Ollama", "llama.cpp", "vLLM" ] }, { "id": "clz8r9x3a0023qwer123479", "hidden": false, "icon": "", "iconColor": "", "name": "AI Testing & Evaluation", "proficiency": "", "level": 0, "keywords": [ "DeepEval", "Garak", "Guardrails AI", "Langfuse", "LLM Guard", "Promptfoo" ] }, { "id": "clz8r9x3a0018qwer123474", "hidden": false, "icon": "", "iconColor": "", "name": "CI/CD & Cloud Testing", "proficiency": "", "level": 0, "keywords": [ "GitHub Actions", "Azure DevOps", "Docker", "Firebase Test Lab", "BrowserStack", "AWS Device Farm" ] }, { "id": "m4k9r2p7j8s5t3q1n6v0wxyz", "hidden": true, "icon": "", "iconColor": "", "name": "Leadership & Collaboration", "proficiency": "", "level": 0, "keywords": [ "Team Leadership", "Test Strategy", "Technical Mentoring", "Sprint Planning", "Release Management", "Cross-functional Collaboration" ] }, { "id": "p2zefbp6hgzxwomy4k10sxim", "hidden": true, "icon": "", "iconColor": "", "name": "Project Management", "proficiency": "", "level": 0, "keywords": [ "JIRA Administration", "Agile Methodologies", "Test Planning", "Resource Allocation", "Risk Management", "Quality Metrics" ] } ] }, "languages": { "title": "Languages", "columns": 1, "hidden": true, "items": [] }, "interests": { "title": "Interests", "columns": 1, "hidden": true, "items": [] }, "awards": { "title": "Awards", "columns": 3, "hidden": false, "items": [ { "id": "clz8r9x3a0000qwer123456", "hidden": false, "title": "UAE Golden Visa (10-Year)", "awarder": "United Arab Emirates", "date": "2022", "website": { "url": "", "label": "", "inlineLink": false }, "description": "<p>Granted as an exceptional talent in Software Engineering under the UAE's long-term residency program for skilled professionals contributing to the country's technology and innovation ecosystem</p>" }, { "id": "clz8r9x3a0001qwer123457", "hidden": false, "title": "Outstanding Performance Award", "awarder": "Mashreq Bank", "date": "2020", "website": { "url": "", "label": "", "inlineLink": false }, "description": "<p>Recognized for building unified mobile/web automation framework and automating payment flow validation across digital channels</p>" }, { "id": "a1b2c3d4e5f6g7h8i9j0klmn", "hidden": false, "title": "Top 50 Mentor", "awarder": "ADPList", "date": "2024", "website": { "url": "", "label": "", "inlineLink": false }, "description": "<p>Mentored 40+ professionals across 12+ countries on test automation architecture, framework design, and career growth in QA engineering</p>" } ] }, "certifications": { "title": "Certifications", "columns": 2, "hidden": true, "items": [ { "id": "cert001adplist50mentor", "hidden": false, "title": "100 Mentorship Minutes", "issuer": "ADPList", "date": "2024", "website": { "url": "", "label": "", "inlineLink": false }, "description": "" }, { "id": "cert002apidesigner", "hidden": false, "title": "API Designer", "issuer": "", "date": "", "website": { "url": "", "label": "", "inlineLink": false }, "description": "" }, { "id": "cert003apiproductmgr", "hidden": false, "title": "API Product Manager", "issuer": "", "date": "", "website": { "url": "", "label": "", "inlineLink": false }, "description": "" } ] }, "publications": { "title": "Publications", "columns": 1, "hidden": true, "items": [] }, "volunteer": { "title": "Volunteering", "columns": 1, "hidden": true, "items": [ { "id": "vol001adplistmentor", "hidden": false, "organization": "ADPList", "location": "Remote", "period": "12/2023 - Present", "website": { "url": "https://adplist.org/mentors/ahmad-waqar", "label": "", "inlineLink": false }, "description": "<p>QA Automation Mentor — Mentoring QA professionals worldwide on test automation architecture, framework design, mobile testing strategies, and career transitions into QA automation.</p>" } ] }, "references": { "title": "References", "columns": 1, "hidden": true, "items": [] } }, "customSections": [ { "title": "", "columns": 1, "hidden": false, "id": "custom-experience-page2", "type": "experience", "items": [ { "id": "clz8r9x3a0006qwer123462", "hidden": false, "company": "Ekar", "position": "QA Automation Lead", "location": "Dubai, United Arab Emirates", "period": "03/2021 - 05/2023", "website": { "url": "", "label": "", "inlineLink": false }, "description": "<ul><li><p>Developed cross-platform mobile testing architecture using Appium with <strong>85% automation coverage</strong>, mentoring a team of 5 QA engineers.</p></li><li><p>Built cloud testing infrastructure with SauceLabs supporting 15+ device configurations for parallel validation.</p></li><li><p>Led enterprise API testing framework using JavaScript, automating <strong>72+ endpoints</strong> and reducing regression time from days to hours.</p></li></ul>", "roles": [] }, { "id": "clz8r9x3a0007qwer123463", "hidden": false, "company": "Mashreq Bank", "position": "Sr. QA Automation Engineer - Microservices", "location": "Dubai, United Arab Emirates", "period": "09/2019 - 02/2021", "website": { "url": "", "label": "", "inlineLink": false }, "description": "<ul><li><p>Built unified mobile/web automation framework (Appium, Selenium) for digital banking with Azure DevOps CI/CD integration.</p></li><li><p>Automated payment flow validation across KYC, UB Tablet, and digital channels, earning <strong>Outstanding Performance Award</strong>.</p></li></ul>", "roles": [] }, { "id": "wvg23f4ds0x0ujp7q0vjmsrd", "hidden": false, "company": "Emirates Islamic", "position": "Business Test Analyst", "location": "Dubai, United Arab Emirates", "period": "07/2018 - 09/2019", "website": { "url": "", "label": "", "inlineLink": false }, "description": "<ul><li><p>Led functional testing for retail asset digitalisation (Siebel CRM, Finacle) with backend API testing using SoapUI across BPM, DMS, and Finnone integrations.</p></li></ul>", "roles": [] }, { "id": "prnbcelksveag3qiirte02to", "hidden": false, "company": "Macrosoft", "position": "Software QA Engineer", "location": "Lahore, Pakistan", "period": "12/2013 - 01/2018", "website": { "url": "", "label": "", "inlineLink": false }, "description": "<ul><li><p>Built Selenium/TestNG automation framework for web applications; executed performance and API testing using JMeter and SoapUI across enterprise modules over 4+ years.</p></li></ul>", "roles": [] } ] } ], "metadata": { "template": "kakuna", "layout": { "sidebarWidth": 35, "pages": [ { "fullWidth": true, "main": [ "summary", "skills", "experience" ], "sidebar": [] }, { "fullWidth": true, "main": [ "custom-experience-page2", "projects", "education", "awards" ], "sidebar": [] } ] }, "page": { "gapX": 20, "gapY": 4, "marginX": 18, "marginY": 14, "format": "a4", "locale": "en-US", "hideIcons": false }, "design": { "level": { "icon": "star", "type": "circle" }, "colors": { "primary": "rgba(8, 145, 178, 1)", "text": "rgba(0, 0, 0, 1)", "background": "rgba(255, 255, 255, 1)" } }, "typography": { "body": { "fontFamily": "Lora", "fontWeights": [ "400" ], "fontSize": 10.2, "lineHeight": 1.32 }, "heading": { "fontFamily": "Lora", "fontWeights": [ "700" ], "fontSize": 13, "lineHeight": 1.35 } }, "notes": "" } }
agent-browser
# agent-browser Browser automation CLI for AI agents. Fast native Rust CLI. ## Installation ### Global Installation (recommended) Installs the native Rust binary: ```bash npm install -g agent-browser agent-browser install # Download Chrome from Chrome for Testing (first time only) ``` ### Project Installation (local dependency) For projects that want to pin the version in `package.json`: ```bash npm install agent-browser agent-browser install ``` Then use via `package.json` scripts or by invoking `agent-browser` directly. ### Homebrew (macOS) ```bash brew install agent-browser agent-browser install # Download Chrome from Chrome for Testing (first time only) ``` ### Cargo (Rust) ```bash cargo install agent-browser agent-browser install # Download Chrome from Chrome for Testing (first time only) ``` ### From Source ```bash git clone https://github.com/vercel-labs/agent-browser cd agent-browser pnpm install pnpm build pnpm build:native # Requires Rust (https://rustup.rs) pnpm link --global # Makes agent-browser available globally agent-browser install ``` ### Linux Dependencies On Linux, install system dependencies: ```bash agent-browser install --with-deps ``` ### Updating Upgrade to the latest version: ```bash agent-browser upgrade ``` Detects your installation method (npm, Homebrew, or Cargo) and runs the appropriate update command automatically. ### Requirements - **Chrome** - Run `agent-browser install` to download Chrome from [Chrome for Testing](https://developer.chrome.com/blog/chrome-for-testing/) (Google's official automation channel). Existing Chrome, Brave, Playwright, and Puppeteer installations are detected automatically. No Playwright or Node.js required for the daemon. - **Rust** - Only needed when building from source (see From Source above). ## Quick Start ```bash agent-browser open example.com agent-browser snapshot # Get accessibility tree with refs agent-browser click @e2 # Click by ref from snapshot agent-browser fill @e3 "test@example.com" # Fill by ref agent-browser get text @e1 # Get text by ref agent-browser screenshot page.png agent-browser close ``` ### Traditional Selectors (also supported) ```bash agent-browser click "#submit" agent-browser fill "#email" "test@example.com" agent-browser find role button click --name "Submit" ``` ## Commands ### Core Commands ```bash agent-browser open # Launch browser (no navigation); stays on about:blank agent-browser open <url> # Launch + navigate to URL (aliases: goto, navigate) agent-browser click <sel> # Click element (--new-tab to open in new tab) agent-browser dblclick <sel> # Double-click element agent-browser focus <sel> # Focus element agent-browser type <sel> <text> # Type into element agent-browser fill <sel> <text> # Clear and fill agent-browser press <key> # Press key (Enter, Tab, Control+a) (alias: key) agent-browser keyboard type <text> # Type with real keystrokes (no selector, current focus) agent-browser keyboard inserttext <text> # Insert text without key events (no selector) agent-browser keydown <key> # Hold key down agent-browser keyup <key> # Release key agent-browser hover <sel> # Hover element agent-browser select <sel> <val> # Select dropdown option agent-browser check <sel> # Check checkbox agent-browser uncheck <sel> # Uncheck checkbox agent-browser scroll <dir> [px] # Scroll (up/down/left/right, --selector <sel>) agent-browser scrollintoview <sel> # Scroll element into view (alias: scrollinto) agent-browser drag <src> <tgt> # Drag and drop agent-browser upload <sel> <files> # Upload files agent-browser screenshot [path] # Take screenshot (--full for full page, saves to a temporary directory if no path) agent-browser screenshot --annotate # Annotated screenshot with numbered element labels agent-browser screenshot --screenshot-dir ./shots # Save to custom directory agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80 agent-browser pdf <path> # Save as PDF agent-browser snapshot # Accessibility tree with refs (best for AI) agent-browser eval <js> # Run JavaScript (-b for base64, --stdin for piped input) agent-browser connect <port> # Connect to browser via CDP agent-browser stream enable [--port <port>] # Start runtime WebSocket streaming agent-browser stream status # Show runtime streaming state and bound port agent-browser stream disable # Stop runtime WebSocket streaming agent-browser close # Close browser (aliases: quit, exit) agent-browser close --all # Close all active sessions agent-browser chat "<instruction>" # AI chat: natural language browser control (single-shot) agent-browser chat # AI chat: interactive REPL mode ``` ### Get Info ```bash agent-browser get text <sel> # Get text content agent-browser get html <sel> # Get innerHTML agent-browser get value <sel> # Get input value agent-browser get attr <sel> <attr> # Get attribute agent-browser get title # Get page title agent-browser get url # Get current URL agent-browser get cdp-url # Get CDP WebSocket URL (for DevTools, debugging) agent-browser get count <sel> # Count matching elements agent-browser get box <sel> # Get bounding box agent-browser get styles <sel> # Get computed styles ``` ### Check State ```bash agent-browser is visible <sel> # Check if visible agent-browser is enabled <sel> # Check if enabled agent-browser is checked <sel> # Check if checked ``` ### Find Elements (Semantic Locators) ```bash agent-browser find role <role> <action> [value] # By ARIA role agent-browser find text <text> <action> # By text content agent-browser find label <label> <action> [value] # By label agent-browser find placeholder <ph> <action> [value] # By placeholder agent-browser find alt <text> <action> # By alt text agent-browser find title <text> <action> # By title attr agent-browser find testid <id> <action> [value] # By data-testid agent-browser find first <sel> <action> [value] # First match agent-browser find last <sel> <action> [value] # Last match agent-browser find nth <n> <sel> <action> [value] # Nth match ``` **Actions:** `click`, `fill`, `type`, `hover`, `focus`, `check`, `uncheck`, `text` **Options:** `--name <name>` (filter role by accessible name), `--exact` (require exact text match) **Examples:** ```bash agent-browser find role button click --name "Submit" agent-browser find text "Sign In" click agent-browser find label "Email" fill "test@test.com" agent-browser find first ".item" click agent-browser find nth 2 "a" text ``` ### Wait ```bash agent-browser wait <selector> # Wait for element to be visible agent-browser wait <ms> # Wait for time (milliseconds) agent-browser wait --text "Welcome" # Wait for text to appear (substring match) agent-browser wait --url "**/dash" # Wait for URL pattern agent-browser wait --load networkidle # Wait for load state agent-browser wait --fn "window.ready === true" # Wait for JS condition # Wait for text/element to disappear agent-browser wait --fn "!document.body.innerText.includes('Loading...')" agent-browser wait "#spinner" --state hidden ``` **Load states:** `load`, `domcontentloaded`, `networkidle` ### Batch Execution Execute multiple commands in a single invocation. Commands can be passed as quoted arguments or piped as JSON via stdin. This avoids per-command process startup overhead when running multi-step workflows. ```bash # Argument mode: each quoted argument is a full command agent-browser batch "open https://example.com" "snapshot -i" "screenshot" # With --bail to stop on first error agent-browser batch --bail "open https://example.com" "click @e1" "screenshot" # Stdin mode: pipe commands as JSON echo '[ ["open", "https://example.com"], ["snapshot", "-i"], ["click", "@e1"], ["screenshot", "result.png"] ]' | agent-browser batch --json ``` ### Clipboard ```bash agent-browser clipboard read # Read text from clipboard agent-browser clipboard write "Hello, World!" # Write text to clipboard agent-browser clipboard copy # Copy current selection (Ctrl+C) agent-browser clipboard paste # Paste from clipboard (Ctrl+V) ``` ### Mouse Control ```bash agent-browser mouse move <x> <y> # Move mouse agent-browser mouse down [button] # Press button (left/right/middle) agent-browser mouse up [button] # Release button agent-browser mouse wheel <dy> [dx] # Scroll wheel ``` ### Browser Settings ```bash agent-browser set viewport <w> <h> [scale] # Set viewport size (scale for retina, e.g. 2) agent-browser set device <name> # Emulate device ("iPhone 14") agent-browser set geo <lat> <lng> # Set geolocation agent-browser set offline [on|off] # Toggle offline mode agent-browser set headers <json> # Extra HTTP headers agent-browser set credentials <u> <p> # HTTP basic auth agent-browser set media [dark|light] # Emulate color scheme ``` ### Cookies & Storage ```bash agent-browser cookies # Get all cookies agent-browser cookies set <name> <val> # Set cookie agent-browser cookies set --curl <file> # Import cookies from a Copy-as-cURL dump, # JSON array, or bare Cookie header (auto-detected) agent-browser cookies clear # Clear cookies agent-browser storage local # Get all localStorage agent-browser storage local <key> # Get specific key agent-browser storage local set <k> <v> # Set value agent-browser storage local clear # Clear all agent-browser storage session # Same for sessionStorage ``` ### Network ```bash agent-browser network route <url> # Intercept requests agent-browser network route <url> --abort # Block requests agent-browser network route <url> --body <json> # Mock response agent-browser network route '*' --abort --resource-type script # Block scripts only agent-browser network unroute [url] # Remove routes agent-browser network requests # View tracked requests agent-browser network requests --filter api # Filter requests agent-browser network requests --type xhr,fetch # Filter by resource type agent-browser network requests --method POST # Filter by HTTP method agent-browser network requests --status 2xx # Filter by status (200, 2xx, 400-499) agent-browser network request <requestId> # View full request/response detail agent-browser network har start # Start HAR recording agent-browser network har stop [output.har] # Stop and save HAR (temp path if omitted) ``` ### Tabs & Windows ```bash agent-browser tab # List tabs (shows `tabId` and optional label) agent-browser tab new [url] # New tab (optionally with URL) agent-browser tab new --label docs [url] # New tab with a user-assigned label agent-browser tab <t<N>|label> # Switch to a tab by id or label agent-browser tab close [t<N>|label] # Close a tab (defaults to active) agent-browser window new # New window ``` Tab ids are stable strings of the form `t1`, `t2`, `t3`. They're never reused within a session, so scripts and agents can keep referring to the same tab even after other tabs are opened or closed. Positional integers like `tab 2` are **not** accepted; the `t` prefix disambiguates handles from indices and mirrors the `@e1` convention used for element refs. You can also assign a memorable label (`docs`, `app`, `admin`) and use it interchangeably with the id. Labels are never auto-generated and never rewritten on navigation — they're yours to name and keep: ```bash agent-browser tab new --label docs https://docs.example.com agent-browser tab docs # switch to the docs tab agent-browser snapshot # populate refs for docs agent-browser click @e3 # click uses docs's refs agent-browser tab close docs # close by label ``` ### Frames ```bash agent-browser frame <sel> # Switch to iframe agent-browser frame main # Back to main frame ``` ### Dialogs ```bash agent-browser dialog accept [text] # Accept (with optional prompt text) agent-browser dialog dismiss # Dismiss agent-browser dialog status # Check if a dialog is currently open ``` By default, `alert` and `beforeunload` dialogs are automatically accepted so they never block the agent. `confirm` and `prompt` dialogs still require explicit handling. Use `--no-auto-dialog` (or `AGENT_BROWSER_NO_AUTO_DIALOG=1`) to disable automatic handling. When a JavaScript dialog is pending, all command responses include a `warning` field with the dialog type and message. ### Diff ```bash agent-browser diff snapshot # Compare current vs last snapshot agent-browser diff snapshot --baseline before.txt # Compare current vs saved snapshot file agent-browser diff snapshot --selector "#main" --compact # Scoped snapshot diff agent-browser diff screenshot --baseline before.png # Visual pixel diff against baseline agent-browser diff screenshot --baseline b.png -o d.png # Save diff image to custom path agent-browser diff screenshot --baseline b.png -t 0.2 # Adjust color threshold (0-1) agent-browser diff url https://v1.com https://v2.com # Compare two URLs (snapshot diff) agent-browser diff url https://v1.com https://v2.com --screenshot # Also visual diff agent-browser diff url https://v1.com https://v2.com --wait-until networkidle # Custom wait strategy agent-browser diff url https://v1.com https://v2.com --selector "#main" # Scope to element ``` ### Debug ```bash agent-browser trace start [path] # Start recording trace agent-browser trace stop [path] # Stop and save trace agent-browser profiler start # Start Chrome DevTools profiling agent-browser profiler stop [path] # Stop and save profile (.json) agent-browser console # View console messages (log, error, warn, info) agent-browser console --json # JSON output with raw CDP args for programmatic access agent-browser console --clear # Clear console agent-browser errors # View page errors (uncaught JavaScript exceptions) agent-browser errors --clear # Clear errors agent-browser highlight <sel> # Highlight element agent-browser inspect # Open Chrome DevTools for the active page agent-browser state save <path> # Save auth state agent-browser state load <path> # Load auth state agent-browser state list # List saved state files agent-browser state show <file> # Show state summary agent-browser state rename <old> <new> # Rename state file agent-browser state clear [name] # Clear states for session agent-browser state clear --all # Clear all saved states agent-browser state clean --older-than <days> # Delete old states ``` ### Navigation ```bash agent-browser back # Go back agent-browser forward # Go forward agent-browser reload # Reload page agent-browser pushstate <url> # SPA client-side nav; auto-detects window.next.router.push, # falls back to history.pushState + popstate ``` ### Pre-navigation setup Some flows (SSR debug, auth cookies for protected origins, init scripts) need state set up *before* the first navigation. Use `open` with no URL to launch the browser, then stage cookies / routes / init scripts, then navigate. `batch` sends it all in one CLI call: ```bash agent-browser batch \ '["open"]' \ '["network","route","*","--abort","--resource-type","script"]' \ '["cookies","set","--curl","cookies.curl","--domain","localhost"]' \ '["navigate","http://localhost:3000/target"]' ``` Without `batch` the same sequence is three commands that all reuse the same daemon (fast, but not one turn). ### React / Web Vitals Agent-browser ships with first-class React introspection and universal Web Vitals metrics. The React commands need the React DevTools hook installed at launch; Web Vitals and pushstate are framework-agnostic. ```bash agent-browser open --enable react-devtools <url> # Launch with React hook installed agent-browser react tree # Full component tree agent-browser react inspect <fiberId> # props, hooks, state, source agent-browser react renders start # Begin fiber render recording agent-browser react renders stop [--json] # Stop and print profile (--json for raw data) agent-browser react suspense [--only-dynamic] [--json] # Suspense boundaries + classifier # --only-dynamic hides the "static" list agent-browser vitals [url] [--json] # LCP/CLS/TTFB/FCP/INP + React hydration phases ``` Each `react ...` subcommand requires `--enable react-devtools` to have been passed at launch (the React DevTools `installHook.js` is embedded in the binary). Without it the commands error with `React DevTools hook not installed - relaunch with --enable react-devtools`. Works on any React app — Next.js, Remix, Vite+React, CRA, TanStack Start, React Native Web, etc. `vitals` and `pushstate` are framework-agnostic. ### Init scripts ```bash agent-browser open --init-script <path> # Register page init script before first navigation # (repeatable; also AGENT_BROWSER_INIT_SCRIPTS env) agent-browser addinitscript <js> # Register at runtime (returns identifier) agent-browser removeinitscript <identifier> # Remove a previously registered init script ``` ### Setup ```bash agent-browser install # Download Chrome from Chrome for Testing (Google's official automation channel) agent-browser install --with-deps # Also install system deps (Linux) agent-browser upgrade # Upgrade agent-browser to the latest version agent-browser doctor # Diagnose the install and auto-clean stale daemon files agent-browser doctor --fix # Also run destructive repairs (reinstall Chrome, purge old state, ...) agent-browser doctor --offline --quick # Skip network probes and the live launch test ``` `doctor` checks your environment, Chrome install, daemon state, config files, encryption key, providers, network reachability, and runs a live headless browser launch test. Stale socket/pid sidecar files are auto-cleaned. Output is also available as `--json` for agents. ### Skills ```bash agent-browser skills # List available skills agent-browser skills list # Same as above agent-browser skills get <name> # Output a skill's full content agent-browser skills get <name> --full # Include references and templates agent-browser skills get --all # Output every skill agent-browser skills path [name] # Print skill directory path ``` Serves bundled skill content that always matches the installed CLI version. AI agents use this to get current instructions rather than relying on cached copies. Set `AGENT_BROWSER_SKILLS_DIR` to override the skills directory path. ## Authentication agent-browser provides multiple ways to persist login sessions so you don't re-authenticate every run. ### Quick summary | Approach | Best for | Flag / Env | |----------|----------|------------| | **Chrome profile reuse** | Reuse your existing Chrome login state (cookies, sessions) with zero setup | `--profile <name>` / `AGENT_BROWSER_PROFILE` | | **Persistent profile** | Full browser state (cookies, IndexedDB, service workers, cache) across restarts | `--profile <path>` / `AGENT_BROWSER_PROFILE` | | **Session persistence** | Auto-save/restore cookies + localStorage by name | `--session-name <name>` / `AGENT_BROWSER_SESSION_NAME` | | **Import from your browser** | Grab auth from a Chrome session you already logged into | `--auto-connect` + `state save` | | **State file** | Load a previously saved state JSON on launch | `--state <path>` / `AGENT_BROWSER_STATE` | | **Auth vault** | Store credentials locally (encrypted), login by name | `auth save` / `auth login` | ### Import auth from your browser If you are already logged in to a site in Chrome, you can grab that auth state and reuse it: ```bash # 1. Launch Chrome with remote debugging enabled # macOS: "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222 # Or use --auto-connect to discover an already-running Chrome # 2. Connect and save the authenticated state agent-browser --auto-connect state save ./my-auth.json # 3. Use the saved auth in future sessions agent-browser --state ./my-auth.json open https://app.example.com/dashboard # 4. Or use --session-name for automatic persistence agent-browser --session-name myapp state load ./my-auth.json # From now on, --session-name myapp auto-saves/restores this state ``` > **Security notes:** > - `--remote-debugging-port` exposes full browser control on localhost. Any local process can connect. Only use on trusted machines and close Chrome when done. > - State files contain session tokens in plaintext. Add them to `.gitignore` and delete when no longer needed. For encryption at rest, set `AGENT_BROWSER_ENCRYPTION_KEY` (see [State Encryption](#state-encryption)). For full details on login flows, OAuth, 2FA, cookie-based auth, and the auth vault, see the [Authentication](docs/src/app/sessions/page.mdx) docs. ## Sessions Run multiple isolated browser instances: ```bash # Different sessions agent-browser --session agent1 open site-a.com agent-browser --session agent2 open site-b.com # Or via environment variable AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn" # List active sessions agent-browser session list # Output: # Active sessions: # -> default # agent1 # Show current session agent-browser session ``` Each session has its own: - Browser instance - Cookies and storage - Navigation history - Authentication state ## Chrome Profile Reuse The fastest way to use your existing login state: pass a Chrome profile name to `--profile`: ```bash # List available Chrome profiles agent-browser profiles # Reuse your default Chrome profile's login state agent-browser --profile Default open https://gmail.com # Use a named profile (by display name or directory name) agent-browser --profile "Work" open https://app.example.com # Or via environment variable AGENT_BROWSER_PROFILE=Default agent-browser open https://gmail.com ``` This copies your Chrome profile to a temp directory (read-only snapshot, no changes to your original profile), so the browser launches with your existing cookies and sessions. > **Note:** On Windows, close Chrome before using `--profile <name>` if Chrome is running, as some profile files may be locked. ## Persistent Profiles For a persistent custom profile directory that stores state across browser restarts, pass a path to `--profile`: ```bash # Use a persistent profile directory agent-browser --profile ~/.myapp-profile open myapp.com # Login once, then reuse the authenticated session agent-browser --profile ~/.myapp-profile open myapp.com/dashboard # Or via environment variable AGENT_BROWSER_PROFILE=~/.myapp-profile agent-browser open myapp.com ``` The profile directory stores: - Cookies and localStorage - IndexedDB data - Service workers - Browser cache - Login sessions **Tip**: Use different profile paths for different projects to keep their browser state isolated. ## Session Persistence Alternatively, use `--session-name` to automatically save and restore cookies and localStorage across browser restarts: ```bash # Auto-save/load state for "twitter" session agent-browser --session-name twitter open twitter.com # Login once, then state persists automatically # State files stored in ~/.agent-browser/sessions/ # Or via environment variable export AGENT_BROWSER_SESSION_NAME=twitter agent-browser open twitter.com ``` ### State Encryption Encrypt saved session data at rest with AES-256-GCM: ```bash # Generate key: openssl rand -hex 32 export AGENT_BROWSER_ENCRYPTION_KEY=<64-char-hex-key> # State files are now encrypted automatically agent-browser --session-name secure open example.com ``` | Variable | Description | | --------------------------------- | -------------------------------------------------- | | `AGENT_BROWSER_SESSION_NAME` | Auto-save/load state persistence name | | `AGENT_BROWSER_ENCRYPTION_KEY` | 64-char hex key for AES-256-GCM encryption | | `AGENT_BROWSER_STATE_EXPIRE_DAYS` | Auto-delete states older than N days (default: 30) | ## Security agent-browser includes security features for safe AI agent deployments. All features are opt-in -- existing workflows are unaffected until you explicitly enable a feature: - **Authentication Vault** -- Store credentials locally (always encrypted), reference by name. The LLM never sees passwords. `auth login` navigates with `load` and then waits for login form selectors to appear (SPA-friendly, timeout follows the default action timeout). A key is auto-generated at `~/.agent-browser/.encryption-key` if `AGENT_BROWSER_ENCRYPTION_KEY` is not set: `echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin` then `agent-browser auth login github` - **Content Boundary Markers** -- Wrap page output in delimiters so LLMs can distinguish tool output from untrusted content: `--content-boundaries` - **Domain Allowlist** -- Restrict navigation to trusted domains (wildcards like `*.example.com` also match the bare domain): `--allowed-domains "example.com,*.example.com"`. Sub-resource requests (scripts, images, fetch) and WebSocket/EventSource connections to non-allowed domains are also blocked. Include any CDN domains your target pages depend on (e.g., `*.cdn.example.com`). - **Action Policy** -- Gate destructive actions with a static policy file: `--action-policy ./policy.json` - **Action Confirmation** -- Require explicit approval for sensitive action categories: `--confirm-actions eval,download` - **Output Length Limits** -- Prevent context flooding: `--max-output 50000` | Variable | Description | | ----------------------------------- | ---------------------------------------- | | `AGENT_BROWSER_CONTENT_BOUNDARIES` | Wrap page output in boundary markers | | `AGENT_BROWSER_MAX_OUTPUT` | Max characters for page output | | `AGENT_BROWSER_ALLOWED_DOMAINS` | Comma-separated allowed domain patterns | | `AGENT_BROWSER_ACTION_POLICY` | Path to action policy JSON file | | `AGENT_BROWSER_CONFIRM_ACTIONS` | Action categories requiring confirmation | | `AGENT_BROWSER_CONFIRM_INTERACTIVE` | Enable interactive confirmation prompts | See [Security documentation](https://agent-browser.dev/security) for details. ## Snapshot Options The `snapshot` command supports filtering to reduce output size: ```bash agent-browser snapshot # Full accessibility tree agent-browser snapshot -i # Interactive elements only (buttons, inputs, links) agent-browser snapshot -i --urls # Interactive elements with link URLs agent-browser snapshot -c # Compact (remove empty structural elements) agent-browser snapshot -d 3 # Limit depth to 3 levels agent-browser snapshot -s "#main" # Scope to CSS selector agent-browser snapshot -i -c -d 5 # Combine options ``` | Option | Description | | ---------------------- | ----------------------------------------------------------------------- | | `-i, --interactive` | Only show interactive elements (buttons, links, inputs) | | `-u, --urls` | Include href URLs for link elements | | `-c, --compact` | Remove empty structural elements | | `-d, --depth <n>` | Limit tree depth | | `-s, --selector <sel>` | Scope to CSS selector | ## Annotated Screenshots The `--annotate` flag overlays numbered labels on interactive elements in the screenshot. Each label `[N]` corresponds to ref `@eN`, so the same refs work for both visual and text-based workflows. Annotated screenshots are supported on the CDP-backed browser path (Chrome/Lightpanda). The Safari/WebDriver backend does not yet support `--annotate`. ```bash agent-browser screenshot --annotate # -> Screenshot saved to /tmp/screenshot-2026-02-17T12-00-00-abc123.png # [1] @e1 button "Submit" # [2] @e2 link "Home" # [3] @e3 textbox "Email" ``` After an annotated screenshot, refs are cached so you can immediately interact with elements: ```bash agent-browser screenshot --annotate ./page.png agent-browser click @e2 # Click the "Home" link labeled [2] ``` This is useful for multimodal AI models that can reason about visual layout, unlabeled icon buttons, canvas elements, or visual state that the text accessibility tree cannot capture. ## Options | Option | Description | |--------|-------------| | `--session <name>` | Use isolated session (or `AGENT_BROWSER_SESSION` env) | | `--session-name <name>` | Auto-save/restore session state (or `AGENT_BROWSER_SESSION_NAME` env) | | `--profile <name\|path>` | Chrome profile name or persistent directory path (or `AGENT_BROWSER_PROFILE` env) | | `--state <path>` | Load storage state from JSON file (or `AGENT_BROWSER_STATE` env) | | `--headers <json>` | Set HTTP headers scoped to the URL's origin | | `--executable-path <path>` | Custom browser executable (or `AGENT_BROWSER_EXECUTABLE_PATH` env) | | `--extension <path>` | Load browser extension (repeatable; or `AGENT_BROWSER_EXTENSIONS` env) | | `--init-script <path>` | Register a page init script before the first navigation (repeatable; or `AGENT_BROWSER_INIT_SCRIPTS` env) | | `--enable <feature>` | Built-in init scripts: `react-devtools` (repeatable or comma-list; or `AGENT_BROWSER_ENABLE` env) | | `--args <args>` | Browser launch args, comma or newline separated (or `AGENT_BROWSER_ARGS` env) | | `--user-agent <ua>` | Custom User-Agent string (or `AGENT_BROWSER_USER_AGENT` env) | | `--proxy <url>` | Proxy server URL with optional auth (or `AGENT_BROWSER_PROXY` env) | | `--proxy-bypass <hosts>` | Hosts to bypass proxy (or `AGENT_BROWSER_PROXY_BYPASS` env) | | `--ignore-https-errors` | Ignore HTTPS certificate errors (useful for self-signed certs) | | `--allow-file-access` | Allow file:// URLs to access local files (Chromium only) | | `-p, --provider <name>` | Cloud browser provider (or `AGENT_BROWSER_PROVIDER` env) | | `--device <name>` | iOS device name, e.g. "iPhone 15 Pro" (or `AGENT_BROWSER_IOS_DEVICE` env) | | `--json` | JSON output (for agents) | | `--annotate` | Annotated screenshot with numbered element labels (or `AGENT_BROWSER_ANNOTATE` env) | | `--screenshot-dir <path>` | Default screenshot output directory (or `AGENT_BROWSER_SCREENSHOT_DIR` env) | | `--screenshot-quality <n>` | JPEG quality 0-100 (or `AGENT_BROWSER_SCREENSHOT_QUALITY` env) | | `--screenshot-format <fmt>` | Screenshot format: `png`, `jpeg` (or `AGENT_BROWSER_SCREENSHOT_FORMAT` env) | | `--headed` | Show browser window (not headless) (or `AGENT_BROWSER_HEADED` env) | | `--cdp <port\|url>` | Connect via Chrome DevTools Protocol (port or WebSocket URL) | | `--auto-connect` | Auto-discover and connect to running Chrome (or `AGENT_BROWSER_AUTO_CONNECT` env) | | `--color-scheme <scheme>` | Color scheme: `dark`, `light`, `no-preference` (or `AGENT_BROWSER_COLOR_SCHEME` env) | | `--download-path <path>` | Default download directory (or `AGENT_BROWSER_DOWNLOAD_PATH` env) | | `--content-boundaries` | Wrap page output in boundary markers for LLM safety (or `AGENT_BROWSER_CONTENT_BOUNDARIES` env) | | `--max-output <chars>` | Truncate page output to N characters (or `AGENT_BROWSER_MAX_OUTPUT` env) | | `--allowed-domains <list>` | Comma-separated allowed domain patterns (or `AGENT_BROWSER_ALLOWED_DOMAINS` env) | | `--action-policy <path>` | Path to action policy JSON file (or `AGENT_BROWSER_ACTION_POLICY` env) | | `--confirm-actions <list>` | Action categories requiring confirmation (or `AGENT_BROWSER_CONFIRM_ACTIONS` env) | | `--confirm-interactive` | Interactive confirmation prompts; auto-denies if stdin is not a TTY (or `AGENT_BROWSER_CONFIRM_INTERACTIVE` env) | | `--engine <name>` | Browser engine: `chrome` (default), `lightpanda` (or `AGENT_BROWSER_ENGINE` env) | | `--no-auto-dialog` | Disable automatic dismissal of `alert`/`beforeunload` dialogs (or `AGENT_BROWSER_NO_AUTO_DIALOG` env) | | `--model <name>` | AI model for chat command (or `AI_GATEWAY_MODEL` env) | | `-v`, `--verbose` | Show tool commands and their raw output (chat) | | `-q`, `--quiet` | Show only AI text responses, hide tool calls (chat) | | `--config <path>` | Use a custom config file (or `AGENT_BROWSER_CONFIG` env) | | `--debug` | Debug output | ## Observability Dashboard Monitor agent-browser sessions in real time with a local web dashboard showing a live viewport and command activity feed. ```bash # Start the dashboard server (runs in background on port 4848) agent-browser dashboard start agent-browser dashboard start --port 8080 # Custom port # All sessions are automatically visible in the dashboard agent-browser open example.com # Stop the dashboard agent-browser dashboard stop ``` The dashboard runs as a standalone background process on port 4848, independent of browser sessions. It stays available even when no sessions are running. All sessions automatically stream to the dashboard. The dashboard displays: - **Live viewport** -- real-time JPEG frames from the browser - **Activity feed** -- chronological command/result stream with timing and expandable details - **Console output** -- browser console messages (log, warn, error) - **Session creation** -- create new sessions from the UI with local engines (Chrome, Lightpanda) or cloud providers (AgentCore, Browserbase, Browserless, Browser Use, Kernel) - **AI Chat** -- chat with an AI assistant directly in the dashboard (requires Vercel AI Gateway configuration) ### AI Chat The dashboard includes an optional AI chat panel powered by the Vercel AI Gateway. The same functionality is available directly from the CLI via the `chat` command. Set these environment variables to enable AI chat: ```bash export AI_GATEWAY_API_KEY=gw_your_key_here export AI_GATEWAY_MODEL=anthropic/claude-sonnet-4.6 # optional, this is the default export AI_GATEWAY_URL=https://ai-gateway.vercel.sh # optional, this is the default ``` **CLI usage:** ```bash agent-browser chat "open google.com and search for cats" # Single-shot agent-browser chat # Interactive REPL agent-browser -q chat "summarize this page" # Quiet mode (text only) agent-browser -v chat "fill in the login form" # Verbose (show command output) agent-browser --model openai/gpt-4o chat "take a screenshot" # Override model ``` The `chat` command translates natural language instructions into agent-browser commands, executes them, and streams the AI response. In interactive mode, type `quit` to exit. Use `--json` for structured output suitable for agent consumption. **Dashboard usage:** The Chat tab is always visible in the dashboard. When `AI_GATEWAY_API_KEY` is set, the Rust server proxies requests to the gateway and streams responses back using the Vercel AI SDK's UI Message Stream protocol. Without the key, sending a message shows an error inline. ## Configuration Create an `agent-browser.json` file to set persistent defaults instead of repeating flags on every command. **Locations (lowest to highest priority):** 1. `~/.agent-browser/config.json` -- user-level defaults 2. `./agent-browser.json` -- project-level overrides (in working directory) 3. `AGENT_BROWSER_*` environment variables override config file values 4. CLI flags override everything **Example `agent-browser.json`:** ```json { "headed": true, "proxy": "http://localhost:8080", "profile": "./browser-data", "userAgent": "my-agent/1.0", "ignoreHttpsErrors": true } ``` Use `--config <path>` or `AGENT_BROWSER_CONFIG` to load a specific config file instead of the defaults: ```bash agent-browser --config ./ci-config.json open example.com AGENT_BROWSER_CONFIG=./ci-config.json agent-browser open example.com ``` All options from the table above can be set in the config file using camelCase keys (e.g., `--executable-path` becomes `"executablePath"`, `--proxy-bypass` becomes `"proxyBypass"`). Unknown keys are ignored for forward compatibility. A [JSON Schema](agent-browser.schema.json) is available for IDE autocomplete and validation. Add a `$schema` key to your config file to enable it: ```json { "$schema": "https://agent-browser.dev/schema.json", "headed": true } ``` Boolean flags accept an optional `true`/`false` value to override config settings. For example, `--headed false` disables `"headed": true` from config. A bare `--headed` is equivalent to `--headed true`. Auto-discovered config files that are missing are silently ignored. If `--config <path>` points to a missing or invalid file, agent-browser exits with an error. Extensions from user and project configs are merged (concatenated), not replaced. > **Tip:** If your project-level `agent-browser.json` contains environment-specific values (paths, proxies), consider adding it to `.gitignore`. ## Default Timeout The default timeout for standard operations (clicks, waits, fills, etc.) is 25 seconds. This is intentionally below the CLI's 30-second IPC read timeout so that the daemon returns a proper error instead of the CLI timing out with EAGAIN. Override the default timeout via environment variable: ```bash # Set a longer timeout for slow pages (in milliseconds) export AGENT_BROWSER_DEFAULT_TIMEOUT=45000 ``` > **Note:** Setting this above 30000 (30s) may cause EAGAIN errors on slow operations because the CLI's read timeout will expire before the daemon responds. The CLI retries transient errors automatically, but response times will increase. | Variable | Description | | ------------------------------- | ---------------------------------------- | | `AGENT_BROWSER_DEFAULT_TIMEOUT` | Default operation timeout in ms (default: 25000) | ## Selectors ### Refs (Recommended for AI) Refs provide deterministic element selection from snapshots: ```bash # 1. Get snapshot with refs agent-browser snapshot # Output: # - heading "Example Domain" [ref=e1] [level=1] # - button "Submit" [ref=e2] # - textbox "Email" [ref=e3] # - link "Learn more" [ref=e4] # 2. Use refs to interact agent-browser click @e2 # Click the button agent-browser fill @e3 "test@example.com" # Fill the textbox agent-browser get text @e1 # Get heading text agent-browser hover @e4 # Hover the link ``` **Why use refs?** - **Deterministic**: Ref points to exact element from snapshot - **Fast**: No DOM re-query needed - **AI-friendly**: Snapshot + ref workflow is optimal for LLMs ### CSS Selectors ```bash agent-browser click "#id" agent-browser click ".class" agent-browser click "div > button" ``` ### Text & XPath ```bash agent-browser click "text=Submit" agent-browser click "xpath=//button" ``` ### Semantic Locators ```bash agent-browser find role button click --name "Submit" agent-browser find label "Email" fill "test@test.com" ``` ## Agent Mode Use `--json` for machine-readable output: ```bash agent-browser snapshot --json # Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}} agent-browser get text @e1 --json agent-browser is visible @e2 --json ``` ### Optimal AI Workflow ```bash # 1. Navigate and get snapshot agent-browser open example.com agent-browser snapshot -i --json # AI parses tree and refs # 2. AI identifies target refs from snapshot # 3. Execute actions using refs agent-browser click @e2 agent-browser fill @e3 "input text" # 4. Get new snapshot if page changed agent-browser snapshot -i --json ``` ### Command Chaining Commands can be chained with `&&` in a single shell invocation. The browser persists via a background daemon, so chaining is safe and more efficient: ```bash # Open, wait for load, and snapshot in one call agent-browser open example.com && agent-browser wait --load networkidle && agent-browser snapshot -i # Chain multiple interactions agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "pass" && agent-browser click @e3 # Navigate and screenshot agent-browser open example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png ``` Use `&&` when you don't need intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs before interacting). ## Headed Mode Show the browser window for debugging: ```bash agent-browser open example.com --headed ``` This opens a visible browser window instead of running headless. > **Note:** Browser extensions work in both headed and headless mode (Chrome's `--headless=new`). ## Authenticated Sessions Use `--headers` to set HTTP headers for a specific origin, enabling authentication without login flows: ```bash # Headers are scoped to api.example.com only agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}' # Requests to api.example.com include the auth header agent-browser snapshot -i --json agent-browser click @e2 # Navigate to another domain - headers are NOT sent (safe!) agent-browser open other-site.com ``` This is useful for: - **Skipping login flows** - Authenticate via headers instead of UI - **Switching users** - Start new sessions with different auth tokens - **API testing** - Access protected endpoints directly - **Security** - Headers are scoped to the origin, not leaked to other domains To set headers for multiple origins, use `--headers` with each `open` command: ```bash agent-browser open api.example.com --headers '{"Authorization": "Bearer token1"}' agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}' ``` For global headers (all domains), use `set headers`: ```bash agent-browser set headers '{"X-Custom-Header": "value"}' ``` ## Custom Browser Executable Use a custom browser executable instead of the bundled Chromium. This is useful for: - **Serverless deployment**: Use lightweight Chromium builds like `@sparticuz/chromium` (~50MB vs ~684MB) - **System browsers**: Use an existing Chrome/Chromium installation - **Custom builds**: Use modified browser builds ### CLI Usage ```bash # Via flag agent-browser --executable-path /path/to/chromium open example.com # Via environment variable AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com ``` ### Serverless (Vercel) Run agent-browser + Chrome in an ephemeral Vercel Sandbox microVM. No external server needed: ```typescript import { Sandbox } from "@vercel/sandbox"; const sandbox = await Sandbox.create({ runtime: "node24" }); await sandbox.runCommand("agent-browser", ["open", "https://example.com"]); const result = await sandbox.runCommand("agent-browser", ["screenshot", "--json"]); await sandbox.stop(); ``` See the [environments example](examples/environments/) for a working demo with a UI and deploy-to-Vercel button. ### Serverless (AWS Lambda) ```typescript import chromium from '@sparticuz/chromium'; import { execSync } from 'child_process'; export async function handler() { const executablePath = await chromium.executablePath(); const result = execSync( `AGENT_BROWSER_EXECUTABLE_PATH=${executablePath} agent-browser open https://example.com && agent-browser snapshot -i --json`, { encoding: 'utf-8' } ); return JSON.parse(result); } ``` ## Local Files Open and interact with local files (PDFs, HTML, etc.) using `file://` URLs: ```bash # Enable file access (required for JavaScript to access local files) agent-browser --allow-file-access open file:///path/to/document.pdf agent-browser --allow-file-access open file:///path/to/page.html # Take screenshot of a local PDF agent-browser --allow-file-access open file:///Users/me/report.pdf agent-browser screenshot report.png ``` The `--allow-file-access` flag adds Chromium flags (`--allow-file-access-from-files`, `--allow-file-access`) that allow `file://` URLs to: - Load and render local files - Access other local files via JavaScript (XHR, fetch) - Load local resources (images, scripts, stylesheets) **Note:** This flag only works with Chromium. For security, it's disabled by default. ## CDP Mode Connect to an existing browser via Chrome DevTools Protocol: ```bash # Start Chrome with: google-chrome --remote-debugging-port=9222 # Connect once, then run commands without --cdp agent-browser connect 9222 agent-browser snapshot agent-browser tab agent-browser close # Or pass --cdp on each command agent-browser --cdp 9222 snapshot # Connect to remote browser via WebSocket URL agent-browser --cdp "wss://your-browser-service.com/cdp?token=..." snapshot ``` The `--cdp` flag accepts either: - A port number (e.g., `9222`) for local connections via `http://localhost:{port}` - A full WebSocket URL (e.g., `wss://...` or `ws://...`) for remote browser services This enables control of: - Electron apps - Chrome/Chromium instances with remote debugging - WebView2 applications - Any browser exposing a CDP endpoint ### Auto-Connect Use `--auto-connect` to automatically discover and connect to a running Chrome instance without specifying a port: ```bash # Auto-discover running Chrome with remote debugging agent-browser --auto-connect open example.com agent-browser --auto-connect snapshot # Or via environment variable AGENT_BROWSER_AUTO_CONNECT=1 agent-browser snapshot ``` Auto-connect discovers Chrome by: 1. Reading Chrome's `DevToolsActivePort` file from the default user data directory 2. Falling back to probing common debugging ports (9222, 9229) 3. If HTTP-based discovery (`/json/version`, `/json/list`) fails, falling back to a direct WebSocket connection This is useful when: - Chrome 144+ has remote debugging enabled via `chrome://inspect/#remote-debugging` (which uses a dynamic port) - You want a zero-configuration connection to your existing browser - You don't want to track which port Chrome is using ## Streaming (Browser Preview) Stream the browser viewport via WebSocket for live preview or "pair browsing" where a human can watch and interact alongside an AI agent. ### Streaming Every session automatically starts a WebSocket stream server on an OS-assigned port. Use `stream status` to see the bound port and connection state: ```bash agent-browser stream status ``` To bind to a specific port, set `AGENT_BROWSER_STREAM_PORT`: ```bash AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com ``` You can also manage streaming at runtime with `stream enable`, `stream disable`, and `stream status`: ```bash agent-browser stream enable --port 9223 # Re-enable on a specific port agent-browser stream disable # Stop streaming for the session ``` The WebSocket server streams the browser viewport and accepts input events. ### WebSocket Protocol Connect to `ws://localhost:9223` to receive frames and send input: **Receive frames:** ```json { "type": "frame", "data": "<base64-encoded-jpeg>", "metadata": { "deviceWidth": 1280, "deviceHeight": 720, "pageScaleFactor": 1, "offsetTop": 0, "scrollOffsetX": 0, "scrollOffsetY": 0 } } ``` **Send mouse events:** ```json { "type": "input_mouse", "eventType": "mousePressed", "x": 100, "y": 200, "button": "left", "clickCount": 1 } ``` **Send keyboard events:** ```json { "type": "input_keyboard", "eventType": "keyDown", "key": "Enter", "code": "Enter" } ``` **Send touch events:** ```json { "type": "input_touch", "eventType": "touchStart", "touchPoints": [{ "x": 100, "y": 200 }] } ``` ## Architecture agent-browser uses a client-daemon architecture: 1. **Rust CLI** - Parses commands, communicates with daemon 2. **Rust Daemon** - Pure Rust daemon using direct CDP, no Node.js required The daemon starts automatically on first command and persists between commands for fast subsequent operations. To auto-shutdown the daemon after a period of inactivity, set `AGENT_BROWSER_IDLE_TIMEOUT_MS` (value in milliseconds). When set, the daemon closes the browser and exits after receiving no commands for the specified duration. **Browser Engine:** Uses Chrome (from Chrome for Testing) by default. The `--engine` flag selects between `chrome` and `lightpanda`. Supported browsers: Chromium/Chrome (via CDP) and Safari (via WebDriver for iOS). ## Platforms | Platform | Binary | | ----------- | ----------- | | macOS ARM64 | Native Rust | | macOS x64 | Native Rust | | Linux ARM64 | Native Rust | | Linux x64 | Native Rust | | Windows x64 | Native Rust | ## Usage with AI Agents ### Just ask the agent The simplest approach -- just tell your agent to use it: ``` Use agent-browser to test the login flow. Run agent-browser --help to see available commands. ``` The `--help` output is comprehensive and most agents can figure it out from there. ### AI Coding Assistants (recommended) Add the skill to your AI coding assistant for richer context: ```bash npx skills add vercel-labs/agent-browser ``` This works with Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, Goose, OpenCode, and Windsurf. The skill is fetched from the repository, so it stays up to date automatically -- do not copy `SKILL.md` from `node_modules` as it will become stale. ### Claude Code Install as a Claude Code skill: ```bash npx skills add vercel-labs/agent-browser ``` This adds a thin discovery stub at `.claude/skills/agent-browser/SKILL.md`. The stub is intentionally minimal — it points Claude Code at `agent-browser skills get core` to load the actual workflow content at runtime. This way the instructions always match the installed CLI version instead of going stale between releases. ### AGENTS.md / CLAUDE.md For more consistent results, add to your project or global instructions file: ```markdown ## Browser Automation Use `agent-browser` for web automation. Run `agent-browser --help` for all commands. Core workflow: 1. `agent-browser open <url>` - Navigate to page 2. `agent-browser snapshot -i` - Get interactive elements with refs (@e1, @e2) 3. `agent-browser click @e1` / `fill @e2 "text"` - Interact using refs 4. Re-snapshot after page changes ``` ## Integrations ### iOS Simulator Control real Mobile Safari in the iOS Simulator for authentic mobile web testing. Requires macOS with Xcode. **Setup:** ```bash # Install Appium and XCUITest driver npm install -g appium appium driver install xcuitest ``` **Usage:** ```bash # List available iOS simulators agent-browser device list # Launch Safari on a specific device agent-browser -p ios --device "iPhone 16 Pro" open https://example.com # Same commands as desktop agent-browser -p ios snapshot -i agent-browser -p ios tap @e1 agent-browser -p ios fill @e2 "text" agent-browser -p ios screenshot mobile.png # Mobile-specific commands agent-browser -p ios swipe up agent-browser -p ios swipe down 500 # Close session agent-browser -p ios close ``` Or use environment variables: ```bash export AGENT_BROWSER_PROVIDER=ios export AGENT_BROWSER_IOS_DEVICE="iPhone 16 Pro" agent-browser open https://example.com ``` | Variable | Description | | -------------------------- | ----------------------------------------------- | | `AGENT_BROWSER_PROVIDER` | Set to `ios` to enable iOS mode | | `AGENT_BROWSER_IOS_DEVICE` | Device name (e.g., "iPhone 16 Pro", "iPad Pro") | | `AGENT_BROWSER_IOS_UDID` | Device UDID (alternative to device name) | **Supported devices:** All iOS Simulators available in Xcode (iPhones, iPads), plus real iOS devices. **Note:** The iOS provider boots the simulator, starts Appium, and controls Safari. First launch takes ~30-60 seconds; subsequent commands are fast. #### Real Device Support Appium also supports real iOS devices connected via USB. This requires additional one-time setup: **1. Get your device UDID:** ```bash xcrun xctrace list devices # or system_profiler SPUSBDataType | grep -A 5 "iPhone\|iPad" ``` **2. Sign WebDriverAgent (one-time):** ```bash # Open the WebDriverAgent Xcode project cd ~/.appium/node_modules/appium-xcuitest-driver/node_modules/appium-webdriveragent open WebDriverAgent.xcodeproj ``` In Xcode: - Select the `WebDriverAgentRunner` target - Go to Signing & Capabilities - Select your Team (requires Apple Developer account, free tier works) - Let Xcode manage signing automatically **3. Use with agent-browser:** ```bash # Connect device via USB, then: agent-browser -p ios --device "<DEVICE_UDID>" open https://example.com # Or use the device name if unique agent-browser -p ios --device "John's iPhone" open https://example.com ``` **Real device notes:** - First run installs WebDriverAgent to the device (may require Trust prompt) - Device must be unlocked and connected via USB - Slightly slower initial connection than simulator - Tests against real Safari performance and behavior ### Browserless [Browserless](https://browserless.io) provides cloud browser infrastructure with a Sessions API. Use it when running agent-browser in environments where a local browser isn't available. To enable Browserless, use the `-p` flag: ```bash export BROWSERLESS_API_KEY="your-api-token" agent-browser -p browserless open https://example.com ``` Or use environment variables for CI/scripts: ```bash export AGENT_BROWSER_PROVIDER=browserless export BROWSERLESS_API_KEY="your-api-token" agent-browser open https://example.com ``` Optional configuration via environment variables: | Variable | Description | Default | | -------------------------- | ------------------------------------------------ | --------------------------------------- | | `BROWSERLESS_API_URL` | Base API URL (for custom regions or self-hosted) | `https://production-sfo.browserless.io` | | `BROWSERLESS_BROWSER_TYPE` | Type of browser to use (chromium or chrome) | chromium | | `BROWSERLESS_TTL` | Session TTL in milliseconds | `300000` | | `BROWSERLESS_STEALTH` | Enable stealth mode (`true`/`false`) | `true` | When enabled, agent-browser connects to a Browserless cloud session instead of launching a local browser. All commands work identically. Get your API token from the [Browserless Dashboard](https://browserless.io). ### Browserbase [Browserbase](https://browserbase.com) provides remote browser infrastructure to make deployment of agentic browsing agents easy. Use it when running the agent-browser CLI in an environment where a local browser isn't feasible. To enable Browserbase, use the `-p` flag: ```bash export BROWSERBASE_API_KEY="your-api-key" agent-browser -p browserbase open https://example.com ``` Or use environment variables for CI/scripts: ```bash export AGENT_BROWSER_PROVIDER=browserbase export BROWSERBASE_API_KEY="your-api-key" agent-browser open https://example.com ``` When enabled, agent-browser connects to a Browserbase session instead of launching a local browser. All commands work identically. Get your API key from the [Browserbase Dashboard](https://browserbase.com/overview). ### Browser Use [Browser Use](https://browser-use.com) provides cloud browser infrastructure for AI agents. Use it when running agent-browser in environments where a local browser isn't available (serverless, CI/CD, etc.). To enable Browser Use, use the `-p` flag: ```bash export BROWSER_USE_API_KEY="your-api-key" agent-browser -p browseruse open https://example.com ``` Or use environment variables for CI/scripts: ```bash export AGENT_BROWSER_PROVIDER=browseruse export BROWSER_USE_API_KEY="your-api-key" agent-browser open https://example.com ``` When enabled, agent-browser connects to a Browser Use cloud session instead of launching a local browser. All commands work identically. Get your API key from the [Browser Use Cloud Dashboard](https://cloud.browser-use.com/settings?tab=api-keys). Free credits are available to get started, with pay-as-you-go pricing after. ### Kernel [Kernel](https://www.kernel.sh) provides cloud browser infrastructure for AI agents with features like stealth mode and persistent profiles. To enable Kernel, use the `-p` flag: ```bash export KERNEL_API_KEY="your-api-key" agent-browser -p kernel open https://example.com ``` Or use environment variables for CI/scripts: ```bash export AGENT_BROWSER_PROVIDER=kernel export KERNEL_API_KEY="your-api-key" agent-browser open https://example.com ``` Optional configuration via environment variables: | Variable | Description | Default | | ------------------------ | -------------------------------------------------------------------------------- | ------- | | `KERNEL_HEADLESS` | Run browser in headless mode (`true`/`false`) | `false` | | `KERNEL_STEALTH` | Enable stealth mode to avoid bot detection (`true`/`false`) | `true` | | `KERNEL_TIMEOUT_SECONDS` | Session timeout in seconds | `300` | | `KERNEL_PROFILE_NAME` | Browser profile name for persistent cookies/logins (created if it doesn't exist) | (none) | When enabled, agent-browser connects to a Kernel cloud session instead of launching a local browser. All commands work identically. **Profile Persistence:** When `KERNEL_PROFILE_NAME` is set, the profile will be created if it doesn't already exist. Cookies, logins, and session data are automatically saved back to the profile when the browser session ends, making them available for future sessions. Get your API key from the [Kernel Dashboard](https://dashboard.onkernel.com). ### AgentCore [AWS Bedrock AgentCore](https://aws.amazon.com/bedrock/agentcore/) provides cloud browser sessions with SigV4 authentication. To enable AgentCore, use the `-p` flag: ```bash agent-browser -p agentcore open https://example.com ``` Or use environment variables for CI/scripts: ```bash export AGENT_BROWSER_PROVIDER=agentcore agent-browser open https://example.com ``` Credentials are automatically resolved from environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`) or the AWS CLI (`aws configure export-credentials`), which supports SSO, profiles, and IAM roles. Optional configuration via environment variables: | Variable | Description | Default | | -------------------------- | -------------------------------------------------------------------- | ---------------- | | `AGENTCORE_REGION` | AWS region for the AgentCore endpoint | `us-east-1` | | `AGENTCORE_BROWSER_ID` | Browser identifier | `aws.browser.v1` | | `AGENTCORE_PROFILE_ID` | Browser profile for persistent state (cookies, localStorage) | (none) | | `AGENTCORE_SESSION_TIMEOUT`| Session timeout in seconds | `3600` | | `AWS_PROFILE` | AWS CLI profile for credential resolution | `default` | **Browser profiles:** When `AGENTCORE_PROFILE_ID` is set, browser state (cookies, localStorage) is persisted across sessions automatically. When enabled, agent-browser connects to an AgentCore cloud browser session instead of launching a local browser. All commands work identically. ## License Apache-2.0
github-upstream
======================================================================== PRIVATE FORK WORKFLOW — Reactive-Resume (or any public repo) Goal: Keep YOUR code private, but still pull updates from the original ======================================================================== WHY THIS APPROACH ----------------- GitHub forks are always public if the source is public. To keep your copy private, you create a private repo and link it to the original manually using Git remotes. You will end up with: - origin -> YOUR private repo (where your code lives) - upstream -> the ORIGINAL repo (where you pull updates from) You only have ONE folder on your computer. "upstream" is just a bookmark, not a separate clone. ------------------------------------------------------------------------ STEP 1 — CREATE YOUR PRIVATE REPO ------------------------------------------------------------------------ 1. Go to: https://github.com/new 2. Name it (example: Reactive-Resume) 3. Set visibility: PRIVATE 4. Do NOT add README, .gitignore, or license (keep it empty) 5. Click "Create repository" ------------------------------------------------------------------------ STEP 2 — CLONE YOUR PRIVATE REPO ------------------------------------------------------------------------ git clone https://github.com/YOUR_USERNAME/Reactive-Resume.git cd Reactive-Resume ------------------------------------------------------------------------ STEP 3 — LINK THE ORIGINAL REPO AS "UPSTREAM" ------------------------------------------------------------------------ This adds a bookmark. Nothing is downloaded yet. git remote add upstream https://github.com/AmruthPillai/Reactive-Resume.git Verify both remotes exist: git remote -v Expected output: origin https://github.com/YOUR_USERNAME/Reactive-Resume.git (fetch) origin https://github.com/YOUR_USERNAME/Reactive-Resume.git (push) upstream https://github.com/AmruthPillai/Reactive-Resume.git (fetch) upstream https://github.com/AmruthPillai/Reactive-Resume.git (push) ------------------------------------------------------------------------ STEP 4 — PULL THE ORIGINAL CODE INTO YOUR REPO (ONE TIME) ------------------------------------------------------------------------ git fetch upstream git merge upstream/main --allow-unrelated-histories The --allow-unrelated-histories flag is needed only on the first merge because your private repo and the original have no shared history yet. ------------------------------------------------------------------------ STEP 4a — IF YOU SEE A README.md CONFLICT ------------------------------------------------------------------------ This means both repos have a README and Git doesn't know which to keep. To keep the ORIGINAL repo's README: git checkout --theirs README.md git add README.md git commit -m "Initial merge from upstream" To keep YOUR README instead, use --ours instead of --theirs. ------------------------------------------------------------------------ STEP 5 — PUSH EVERYTHING TO YOUR PRIVATE REPO ------------------------------------------------------------------------ git push origin main Your private repo now contains all the original code. ======================================================================== YOUR DAILY WORKFLOW (after setup) ======================================================================== WORKING ON YOUR OWN CODE ------------------------ git add . git commit -m "your message" git push --> goes to YOUR private repo git pull --> pulls from YOUR private repo Nothing touches the original repo. You won't sync by accident. PULLING UPDATES FROM THE ORIGINAL (when you want them) ----------------------------------------------------- git fetch upstream git merge upstream/main git push origin main If you and the original author changed the same files, Git will pause and ask you to resolve conflicts. Fix them, then: git add . git commit git push origin main ======================================================================== REMOTE CHEAT SHEET ======================================================================== origin = YOUR private repo - default for git push / git pull - holds all your custom changes upstream = the ORIGINAL repo - only used when you explicitly fetch / merge - read-only in practice (don't push to it) ======================================================================== TIPS ======================================================================== - Keep your customizations on a dedicated branch (e.g. "custom"). Merging upstream into main, then merging main into custom, keeps history cleaner than mixing everything on main. - Prefer syncing at tagged RELEASES instead of every commit: git fetch upstream --tags git merge v1.2.3 - Use rebase ONLY if you have not pushed yet: git rebase upstream/main Rebasing rewrites history. Don't rebase commits already on origin. - To see what changed upstream before merging: git fetch upstream git log HEAD..upstream/main --oneline - If a merge gets messy and you want to bail out: git merge --abort ========================================================================
obscura-command-reference-cookbook
# Obscura Command Reference & Cookbook > A practical, descriptive guide to the Obscura headless browser CLI, its CDP API, and recipes for the most common use cases: scraping, QA / UI test automation, browser automation, and network interception. **Document version:** 1.1 **Source of truth:** `https://github.com/h4ckf0r0day/obscura` (README), the binary's own `--help` output, and the Chrome DevTools Protocol specification. **Honest caveats up front** — read these before using this document: - Obscura is a very new project (initial release April 2026, single maintainer). The README's CDP methods table lists what's *advertised as implemented*. Some methods may behave differently from headless Chrome, and several whole CDP domains common in Chrome (Performance, Tracing, ServiceWorker, Audits, etc.) are not listed at all. **Probe before you depend.** - The README header says **Apache-2.0**; the "License" section at the bottom says **MIT**. The repository's `LICENSE` file is authoritative — check it before redistributing. - Benchmark numbers in the README (30 MB, 85 ms) are project-provided; treat them as marketing until you measure on your own targets. - Anywhere this document goes beyond what is in the README, it draws on the published Chrome DevTools Protocol — which is a public, stable spec. Those sections are clearly marked **[CDP, verify support]**. --- ## Table of Contents 1. [Installation & Verification](#1-installation--verification) 2. [The Three CLI Commands](#2-the-three-cli-commands) 3. [CLI Reference (Every Documented Flag)](#3-cli-reference-every-documented-flag) 4. [The CDP API — Documented Domains](#4-the-cdp-api--documented-domains) 5. [Driving Obscura Without Puppeteer (Raw CDP)](#5-driving-obscura-without-puppeteer-raw-cdp) 6. [Cookbook — Web Scraping](#6-cookbook--web-scraping) 7. [Cookbook — QA / UI Test Automation](#7-cookbook--qa--ui-test-automation) 8. [Cookbook — General Browser Automation](#8-cookbook--general-browser-automation) 9. [Cookbook — Network Interception & Capture](#9-cookbook--network-interception--capture) 10. [Stealth Mode in Practice](#10-stealth-mode-in-practice) 11. [Session, Process & Port Management](#11-session-process--port-management) 12. [Troubleshooting & Coverage Probing](#12-troubleshooting--coverage-probing) 13. [What Obscura is **not** good for](#13-what-obscura-is-not-good-for) --- ## 1. Installation & Verification ### macOS Apple Silicon (your case) ```bash curl -LO https://github.com/h4ckf0r0day/obscura/releases/latest/download/obscura-aarch64-macos.tar.gz tar xzf obscura-aarch64-macos.tar.gz cd obscura-aarch64-macos chmod +x obscura obscura-worker xattr -dr com.apple.quarantine . ./obscura --help ``` `obscura-worker` is a per-page subprocess that `obscura` spawns internally — **never invoke it directly**. ### Verifying the install ```bash # Should print version and usage ./obscura --help # Should fetch a page and return its title ./obscura fetch https://example.com --eval "document.title" # Should bring up a CDP server on port 9222 ./obscura serve --port 9222 # Then in another terminal: curl http://localhost:9222/json/version # Expect a JSON blob with "Browser": "Obscura/0.1.x" ``` If any of the three above fail, stop and triage before reading further. ### Build from source ```bash git clone https://github.com/h4ckf0r0day/obscura.git cd obscura cargo build --release # plain cargo build --release --features stealth # with stealth + tracker blocking ``` Requires Rust 1.75+. First build pulls and compiles V8 (~5 minutes); subsequent builds are cached. --- ## 2. The Three CLI Commands Obscura's CLI is intentionally small. There are exactly three top-level commands. Everything else is a flag on one of them. | Command | Purpose | When to use | |---|---|---| | `obscura fetch <URL>` | Single-shot page operation. Loads one page, optionally evaluates JS, dumps output, exits. | Quick checks, one-off scripts, smoke tests, CI assertions. | | `obscura serve` | Long-running CDP WebSocket server on a port. Other tools (Puppeteer / Playwright / raw WebSocket) drive it. | Real automation. Anything multi-step or interactive. The main mode. | | `obscura scrape <URL...>` | Parallel batch scraping with worker processes. One eval expression applied across many URLs. | Bulk extraction over a known URL list. | Mental model: **`fetch` is curl-with-JavaScript. `serve` is Chrome with `--remote-debugging-port`. `scrape` is xargs + fetch with concurrency.** --- ## 3. CLI Reference (Every Documented Flag) Authoritative source: the README's CLI Reference section. Always cross-check with `./obscura <command> --help`, which is generated from the binary itself. ### `obscura serve` — start a CDP WebSocket server | Flag | Default | What it does | |---|---|---| | `--port` | `9222` | TCP port the WebSocket listens on. Pick anything free. | | `--proxy` | — | Route all browser traffic through an HTTP or SOCKS5 proxy. Format: `http://host:port`, `http://user:pass@host:port`, `socks5://host:port`. | | `--stealth` | off | Enable anti-fingerprinting + tracker blocking. See [Stealth Mode](#10-stealth-mode-in-practice). | | `--workers` | `1` | Number of `obscura-worker` subprocesses to keep ready. Each handles one page. Increase for concurrent sessions. | | `--obey-robots` | off | Respect `robots.txt`. **Off by default.** Turn on for any scraping that should be polite. | **HTTP endpoints exposed by `serve`** (standard CDP-over-HTTP discovery): ``` GET /json/version # browser metadata, websocket URL GET /json/list # array of open pages with their per-page WS URLs GET /json/protocol # CDP protocol descriptor (may be partial in Obscura) PUT /json/new?<URL> # open a new page at URL GET /json/close/<targetId> # close a page ``` ### `obscura fetch <URL>` — one-shot page | Flag | Default | What it does | |---|---|---| | `--dump` | `html` | What to print to stdout after the page loads. `html` (full DOM serialization), `text` (visible text), `links` (extracted `<a href>` list). | | `--eval` | — | A JavaScript expression evaluated in the page after load. Result is printed. Mutually useful with `--dump`. | | `--wait-until` | `load` | When the page is considered "ready" before evaluating. `load` (window.load fires), `domcontentloaded` (DOM parsed), `networkidle0` (no in-flight requests for a quiet period). | | `--selector` | — | Wait until a CSS selector matches before continuing. Use for client-rendered content that appears asynchronously. | | `--stealth` | off | Per-fetch stealth mode. Same effect as on `serve`. | | `--quiet` | off | Suppress the startup banner / non-essential logs. Useful when piping output to another program. | **Not in the README but standard for tools like this — verify before relying:** - `--timeout <seconds>` — was shown in third-party reviews; confirm with `--help`. - `--user-agent <string>`, `--header <K:V>`, `--cookie <K=V>` — common, may or may not be implemented yet. ### `obscura scrape <URL...>` — parallel batch | Flag | Default | What it does | |---|---|---| | `--concurrency` | `10` | Number of parallel worker processes. Each handles one URL at a time. Tune to your CPU and target site's tolerance. | | `--eval` | — | JS expression run on each page. Result is collected per URL into the output. | | `--format` | `json` | Output shape. `json` produces an array of `{url, result, error}` objects; `text` produces line-delimited results. | URLs can be passed as args (`obscura scrape url1 url2 url3`) or piped (`cat urls.txt | xargs obscura scrape`). --- ## 4. The CDP API — Documented Domains Obscura advertises these CDP domains in the README. **Anything not listed here is not guaranteed to exist** — call it via raw CDP and check for `Method not found` errors before building on it. ### `Target` Manages "targets" (pages, browser contexts). | Method | Purpose | |---|---| | `Target.createTarget` | Open a new page/tab at a URL. Returns a `targetId`. | | `Target.closeTarget` | Close a page by `targetId`. | | `Target.attachToTarget` | Attach the current session to a target so you can issue page-scoped commands. | | `Target.createBrowserContext` | Create an isolated context (think incognito profile). Cookies/storage isolated. | | `Target.disposeBrowserContext` | Destroy a context and its state. | ### `Page` Page-level lifecycle and navigation. | Method | Purpose | |---|---| | `Page.navigate` | Go to a URL. Most commonly used method. | | `Page.getFrameTree` | Inspect the iframe hierarchy of the current page. | | `Page.addScriptToEvaluateOnNewDocument` | Inject JS that runs **before any page script** on every new navigation. Used for fingerprint patching, polyfills, mocks. | | `Page.lifecycleEvents` | Toggle lifecycle event reporting (`load`, `DOMContentLoaded`, `networkIdle`, etc.). | **Note:** `Page.captureScreenshot` and `Page.printToPDF` are not in the README's table. They're standard CDP — try them, but don't assume they exist. ### `Runtime` Execute JavaScript in the page. | Method | Purpose | |---|---| | `Runtime.evaluate` | Run an expression and return the result. The workhorse for reading state. | | `Runtime.callFunctionOn` | Call a function with a specific `this` binding. Useful for invoking methods on already-resolved objects. | | `Runtime.getProperties` | Enumerate properties of a remote object. The mechanism behind devtools' object inspector. | | `Runtime.addBinding` | Expose a host function the page can call (`window.myBinding(arg)` → fires an event in your driver). Powerful for two-way communication. | ### `DOM` Direct DOM access without going through `Runtime.evaluate`. | Method | Purpose | |---|---| | `DOM.getDocument` | Get the root document node and its tree. | | `DOM.querySelector` | Find one node by CSS selector, returns a `nodeId`. | | `DOM.querySelectorAll` | Find all nodes matching a selector. | | `DOM.getOuterHTML` | Serialize a node back to HTML. | | `DOM.resolveNode` | Convert a `nodeId` into a JavaScript handle for `Runtime.callFunctionOn`. | ### `Network` HTTP-level visibility and configuration. | Method | Purpose | |---|---| | `Network.enable` | **Required** before any other `Network` method does anything. | | `Network.setCookies` | Set one or more cookies in bulk. | | `Network.getCookies` | Read cookies (filtered by URL or all). | | `Network.setExtraHTTPHeaders` | Add headers to every outgoing request. | | `Network.setUserAgentOverride` | Override the User-Agent string. | **Events emitted (when `Network.enable` is on, per CDP spec — verify):** - `Network.requestWillBeSent` - `Network.responseReceived` - `Network.loadingFinished` / `Network.loadingFailed` ### `Fetch` Live request interception. More powerful than `Network` for modifying or blocking traffic. | Method | Purpose | |---|---| | `Fetch.enable` | Start intercepting. You declare which patterns and stages (request / response) to pause on. | | `Fetch.continueRequest` | Let a paused request through, optionally with a modified URL/method/headers/body. | | `Fetch.fulfillRequest` | Respond to a paused request **without** ever sending it (mock responses). | | `Fetch.failRequest` | Abort a paused request (block trackers, simulate offline). | ### `Storage` | Method | Purpose | |---|---| | `Storage.getCookies` | Read cookies. | | `Storage.setCookies` | Write cookies. | | `Storage.deleteCookies` | Delete cookies. | Effectively duplicates the `Network` cookie methods. Use whichever your client library prefers. ### `Input` Synthetic user input. | Method | Purpose | |---|---| | `Input.dispatchMouseEvent` | Synthesize a mouse event (move, down, up, click) at coordinates. | | `Input.dispatchKeyEvent` | Synthesize keyboard events. | **Notable absence**: `Input.dispatchTouchEvent`. Mobile-style touch automation may not work. ### `LP` (Obscura-specific) | Method | Purpose | |---|---| | `LP.getMarkdown` | Convert the current rendered DOM to Markdown. Useful for piping page content to LLMs without writing your own HTML→MD converter. **Not standard CDP** — only Obscura. | --- ## 5. Driving Obscura Without Puppeteer (Raw CDP) Useful when you want to understand what's happening, debug an issue, or build a custom client. ### Step 1 — discover the WebSocket URL ```bash curl -s http://localhost:9222/json/version | jq .webSocketDebuggerUrl # → "ws://127.0.0.1:9222/devtools/browser" curl -s http://localhost:9222/json/list | jq '.[0].webSocketDebuggerUrl' # → "ws://127.0.0.1:9222/devtools/page/page-1" ``` The browser-level URL accepts target-management commands. The page-level URL accepts page-scoped commands directly. ### Step 2 — talk to it with `wscat` ```bash npm install -g wscat wscat -c ws://127.0.0.1:9222/devtools/page/page-1 ``` Then paste CDP JSON, one message per line: ```json {"id":1,"method":"Page.navigate","params":{"url":"https://example.com"}} {"id":2,"method":"Runtime.evaluate","params":{"expression":"document.title"}} {"id":3,"method":"Runtime.evaluate","params":{"expression":"document.querySelectorAll('a').length"}} ``` Every response has either a `result` (success) or `error` (e.g. `{"code":-32601,"message":"'Foo.bar' wasn't found"}` — meaning the method isn't implemented). ### Step 3 — wrap it from any language The protocol is JSON-over-WebSocket. Any language with a WebSocket client can drive Obscura. Below is a minimal Python example: ```python import json, asyncio, websockets async def run(): async with websockets.connect("ws://127.0.0.1:9222/devtools/page/page-1") as ws: await ws.send(json.dumps({"id": 1, "method": "Page.navigate", "params": {"url": "https://example.com"}})) print(await ws.recv()) await ws.send(json.dumps({"id": 2, "method": "Runtime.evaluate", "params": {"expression": "document.title"}})) print(await ws.recv()) asyncio.run(run()) ``` --- ## 6. Cookbook — Web Scraping ### Recipe 6.1 — Quick one-shot scrape (CLI only) When you just want a value out of one page: ```bash ./obscura fetch https://news.ycombinator.com \ --wait-until networkidle0 \ --eval "Array.from(document.querySelectorAll('.titleline > a')).map(a => ({title: a.textContent, url: a.href})).slice(0, 30)" ``` ### Recipe 6.2 — Bulk scrape with `scrape` ```bash ./obscura scrape \ https://example.com/page/1 \ https://example.com/page/2 \ https://example.com/page/3 \ --concurrency 10 \ --eval "({title: document.title, h1: document.querySelector('h1')?.textContent})" \ --format json > out.json ``` For larger lists, pipe from a file: ```bash xargs -a urls.txt ./obscura scrape --concurrency 25 --eval "document.title" --format json > titles.json ``` ### Recipe 6.3 — Pagination with Puppeteer When the URL list isn't known upfront and you have to discover next-page links: ```javascript import puppeteer from 'puppeteer-core'; const browser = await puppeteer.connect({ browserWSEndpoint: 'ws://127.0.0.1:9222/devtools/browser', }); const page = await browser.newPage(); const results = []; let url = 'https://example.com/list?page=1'; while (url) { await page.goto(url, { waitUntil: 'networkidle0' }); const rows = await page.$$eval('.row', els => els.map(el => ({ title: el.querySelector('.title')?.textContent.trim(), price: el.querySelector('.price')?.textContent.trim(), })) ); results.push(...rows); const next = await page.$eval('a.next', a => a.href).catch(() => null); url = next; } console.log(JSON.stringify(results, null, 2)); await browser.disconnect(); ``` ### Recipe 6.4 — Login, then scrape behind auth ```javascript const page = await browser.newPage(); await page.goto('https://quotes.toscrape.com/login'); await page.evaluate(() => { document.querySelector('#username').value = 'admin'; document.querySelector('#password').value = 'admin'; document.querySelector('form').submit(); }); await page.waitForNavigation({ waitUntil: 'networkidle0' }); // Now authenticated; cookies persist across navigations on this page await page.goto('https://quotes.toscrape.com/'); const quotes = await page.$$eval('.quote', els => els.map(el => el.querySelector('.text').textContent) ); console.log(quotes); ``` ### Recipe 6.5 — DOM → Markdown for LLM pipelines Obscura-specific feature, useful for feeding pages to language models without writing your own converter. Raw CDP: ```json {"id":1,"method":"LP.getMarkdown"} ``` The response's `result.markdown` field contains the page rendered as Markdown. ### Recipe 6.6 — Avoid hammering the server ```bash # Sequential, with a small delay for url in $(cat urls.txt); do ./obscura fetch "$url" --eval "document.title" --quiet sleep 1 done # Parallel but capped — respects target throughput ./obscura scrape $(cat urls.txt) --concurrency 5 ``` For polite scraping, **also** add `--obey-robots` to `serve` and check the target's `robots.txt`. --- ## 7. Cookbook — QA / UI Test Automation > **Read this first:** Obscura is a *web* headless browser. It does not test native iOS / Android / macOS apps. For Qt mobile regression (or any native UI) use Squish, Appium, or XCTest. Obscura is for **web admin panels, web dashboards, and end-to-end browser tests of web apps**. ### Recipe 7.1 — Smoke-test a deployed web app from a shell script ```bash #!/bin/bash set -e URL="https://staging.example.com" # 1. Reachable? title=$(./obscura fetch "$URL" --eval "document.title" --quiet) [[ -n "$title" ]] || { echo "FAIL: empty title"; exit 1; } # 2. No JS errors at load? errors=$(./obscura fetch "$URL" --eval "window.__errors || 0" --quiet) [[ "$errors" == "0" ]] || { echo "FAIL: $errors JS errors"; exit 1; } # 3. Critical element present? ok=$(./obscura fetch "$URL" --eval "!!document.querySelector('#login-button')" --quiet) [[ "$ok" == "true" ]] || { echo "FAIL: login button missing"; exit 1; } echo "PASS" ``` This is enough for a CI gate that catches "is the site even rendering." ### Recipe 7.2 — End-to-end login flow (Playwright) ```javascript import { chromium } from 'playwright-core'; import { test, expect } from '@playwright/test'; test('user can log in', async () => { const browser = await chromium.connectOverCDP({ endpointURL: 'ws://127.0.0.1:9222', }); const ctx = await browser.newContext(); const page = await ctx.newPage(); await page.goto('https://app.example.com/login'); await page.fill('#email', 'qa@example.com'); await page.fill('#password', 'test1234'); await page.click('button[type="submit"]'); await expect(page).toHaveURL(/\/dashboard/); await expect(page.locator('h1')).toHaveText('Welcome back'); await browser.close(); }); ``` Run with `npx playwright test` against an Obscura serve instance. ### Recipe 7.3 — Visual regression (screenshot diff) **[CDP, verify support]** `Page.captureScreenshot` is standard CDP but not in Obscura's documented method list. Probe it before betting on this recipe. ```javascript const page = await browser.newPage(); await page.goto('https://app.example.com/dashboard'); await page.waitForSelector('[data-testid="loaded"]'); const buf = await page.screenshot({ fullPage: true }); fs.writeFileSync('dashboard-current.png', buf); // Compare with a baseline using pixelmatch / odiff / your tool of choice ``` ### Recipe 7.4 — Console-error and uncaught-error catchers Catches client-side regressions a happy-path click test would miss. ```javascript const errors = []; page.on('pageerror', err => errors.push({ type: 'uncaught', message: err.message })); page.on('console', msg => { if (msg.type() === 'error') errors.push({ type: 'console', message: msg.text() }); }); await page.goto('https://app.example.com'); await page.waitForLoadState('networkidle'); if (errors.length) { console.error(errors); process.exit(1); } ``` ### Recipe 7.5 — Accessibility smoke checks with axe-core ```javascript await page.goto('https://app.example.com'); await page.addScriptTag({ url: 'https://cdn.jsdelivr.net/npm/axe-core@4/axe.min.js' }); const results = await page.evaluate(async () => await axe.run()); const violations = results.violations.filter(v => ['critical', 'serious'].includes(v.impact)); if (violations.length) { console.log(JSON.stringify(violations, null, 2)); process.exit(1); } ``` ### Recipe 7.6 — Deterministic dates in tests ```javascript // Freeze Date before any page script runs await page.evaluateOnNewDocument(() => { const RealDate = Date; // @ts-ignore globalThis.Date = class extends RealDate { constructor(...args) { if (args.length === 0) return new RealDate('2026-01-01T00:00:00Z'); // @ts-ignore return new RealDate(...args); } }; globalThis.Date.now = () => new RealDate('2026-01-01T00:00:00Z').getTime(); }); await page.goto('https://app.example.com'); ``` --- ## 8. Cookbook — General Browser Automation ### Recipe 8.1 — Multiple isolated sessions in one Obscura instance Use browser contexts for parallel users / accounts that must not share cookies. ```javascript const browser = await puppeteer.connect({ browserWSEndpoint: 'ws://127.0.0.1:9222/devtools/browser', }); // Each context is an isolated cookie/storage jar const userA = await browser.createIncognitoBrowserContext(); const userB = await browser.createIncognitoBrowserContext(); const pageA = await userA.newPage(); const pageB = await userB.newPage(); await Promise.all([ pageA.goto('https://app.example.com'), pageB.goto('https://app.example.com'), ]); ``` In Obscura terms: this maps to `Target.createBrowserContext` and `Target.disposeBrowserContext`. ### Recipe 8.2 — Programmatic cookies before navigating ```javascript const cookies = [ { name: 'session', value: 'abc123', domain: 'example.com', path: '/' }, { name: 'theme', value: 'dark', domain: 'example.com', path: '/' }, ]; await page.setCookie(...cookies); await page.goto('https://example.com/dashboard'); ``` Raw CDP equivalent: ```json {"id":1,"method":"Network.setCookies","params":{"cookies":[{"name":"session","value":"abc123","domain":"example.com","path":"/"}]}} ``` ### Recipe 8.3 — File downloads **[CDP, verify support]** Download handling typically uses `Browser.setDownloadBehavior` or `Page.setDownloadBehavior`, neither of which is in Obscura's documented list. Probe first. A workaround for binary downloads: do the request inside the page and read the bytes back: ```javascript const data = await page.evaluate(async (url) => { const r = await fetch(url); const b = await r.arrayBuffer(); return Array.from(new Uint8Array(b)); }, 'https://example.com/file.pdf'); fs.writeFileSync('file.pdf', Buffer.from(data)); ``` ### Recipe 8.4 — Click, type, scroll Driven through the high-level Puppeteer / Playwright API (which under the hood uses `Input.dispatchMouseEvent` and `Input.dispatchKeyEvent`): ```javascript await page.click('button#start'); await page.type('input[name="search"]', 'hello world'); await page.keyboard.press('Enter'); await page.evaluate(() => window.scrollBy(0, 1000)); ``` ### Recipe 8.5 — Wait strategies ```javascript await page.goto(url, { waitUntil: 'load' }); // default await page.goto(url, { waitUntil: 'domcontentloaded' }); await page.goto(url, { waitUntil: 'networkidle0' }); // 0 in-flight reqs for 500ms await page.waitForSelector('[data-testid="loaded"]'); await page.waitForFunction(() => window.__appReady === true); await page.waitForResponse(r => r.url().endsWith('/api/me') && r.status() === 200); ``` ### Recipe 8.6 — Inject script that runs before page scripts ```javascript await page.evaluateOnNewDocument(() => { // Patches applied before any site code Object.defineProperty(navigator, 'language', { get: () => 'fr-FR' }); }); await page.goto('https://example.com'); ``` Maps to `Page.addScriptToEvaluateOnNewDocument`. --- ## 9. Cookbook — Network Interception & Capture ### Recipe 9.1 — Log every request (HAR-lite) ```javascript const log = []; page.on('request', req => log.push({ phase: 'req', url: req.url(), method: req.method(), at: Date.now() })); page.on('response', res => log.push({ phase: 'res', url: res.url(), status: res.status(), at: Date.now() })); await page.goto('https://example.com', { waitUntil: 'networkidle0' }); fs.writeFileSync('network.json', JSON.stringify(log, null, 2)); ``` ### Recipe 9.2 — Block trackers / ads / images ```javascript await page.setRequestInterception(true); page.on('request', req => { const blockedTypes = ['image', 'media', 'font']; const blockedHosts = ['google-analytics.com', 'doubleclick.net']; if (blockedTypes.includes(req.resourceType()) || blockedHosts.some(h => req.url().includes(h))) { req.abort(); } else { req.continue(); } }); ``` Raw CDP: `Fetch.enable` with a pattern, then `Fetch.failRequest` for blocked URLs and `Fetch.continueRequest` for the rest. ### Recipe 9.3 — Mock an API response Useful for QA: test how the UI behaves when the backend returns specific data without modifying the backend. ```javascript await page.setRequestInterception(true); page.on('request', req => { if (req.url().endsWith('/api/me')) { req.respond({ status: 200, contentType: 'application/json', body: JSON.stringify({ id: 1, name: 'Test User', plan: 'pro' }), }); } else { req.continue(); } }); ``` Maps to `Fetch.fulfillRequest`. ### Recipe 9.4 — Modify outgoing headers / inject auth tokens ```javascript await page.setExtraHTTPHeaders({ 'Authorization': 'Bearer ' + process.env.TOKEN, 'X-Test-Run': 'ci-' + process.env.CI_BUILD, }); await page.goto('https://app.example.com'); ``` Maps to `Network.setExtraHTTPHeaders`. ### Recipe 9.5 — Capture API response bodies ```javascript page.on('response', async res => { if (res.url().includes('/api/') && res.headers()['content-type']?.includes('json')) { try { const body = await res.json(); console.log(res.url(), body); } catch (_) { /* not all responses are JSON */ } } }); await page.goto('https://app.example.com'); ``` ### Recipe 9.6 — Force offline / slow network **[Verify support]** Network throttling typically uses `Network.emulateNetworkConditions`, not in the README's list. ```javascript // Standard CDP — try, fall back if not supported const client = await page.target().createCDPSession(); await client.send('Network.emulateNetworkConditions', { offline: false, latency: 400, downloadThroughput: 50 * 1024, // 50 KB/s uploadThroughput: 20 * 1024, }); ``` If `Method not found`, simulate at the OS level instead (e.g., `tc` on Linux, Network Link Conditioner on macOS). --- ## 10. Stealth Mode in Practice `--stealth` is intended for scraping sites that gate by bot signals. Per the README it does: - **Anti-fingerprinting:** randomizes GPU, screen, canvas, audio, battery per session; spoofs `navigator.userAgentData` to look like Chrome 145; `navigator.webdriver = undefined`; masks native function strings; hides internal properties from `Object.keys(window)`; sets `event.isTrusted = true` for synthetic events. - **Tracker blocking:** 3,520-domain blocklist for analytics, ads, telemetry, fingerprinting scripts. Loaded automatically with `--stealth`. ### When to use it - Scraping public sites that aggressively bot-detect. - AI agent runs where you want to look like a normal browser session. - Reducing noise in network captures (trackers off = cleaner HAR). ### When **not** to use it - Internal dashboards / your own apps. Stealth introduces variability — your app may behave differently because canvas fingerprints differ each session. - QA test runs where determinism matters more than blending in. - Anything where bypassing bot detection violates the target's Terms of Service. Just because you *can* doesn't mean you should. ### Verifying stealth is on ```javascript await page.goto('https://bot.sannysoft.com/'); await page.screenshot({ path: 'stealth-check.png', fullPage: true }); ``` A red row in the report = a signal the page can use to detect automation. --- ## 11. Session, Process & Port Management This section covers what most CLI guides skip: how to **stop** what you started. You will hit all of these within your first hour of use. ### 11.1 — Mental model: three things you might want to kill Obscura runs as a small process tree. To clean up properly, know what level you're acting on: | Level | What it is | How to stop it | |---|---|---| | **Page / target** | A single tab inside a running `obscura serve`. State: cookies, DOM, JS context. | `Target.closeTarget` via CDP, or `page.close()` in Puppeteer/Playwright. The `serve` process keeps running. | | **Browser context** | An isolated cookie/storage jar containing one or more pages (think incognito profile). | `Target.disposeBrowserContext` via CDP, or `context.close()` in Puppeteer/Playwright. | | **Server process** | The `obscura` binary itself, listening on the port. Owns all contexts and pages, plus `obscura-worker` subprocesses. | OS-level: Ctrl-C if foregrounded; `kill <pid>` otherwise. | If you only close pages and contexts, the server keeps running and the port stays bound. If you only kill the server, any client (Puppeteer script) still pointing at it will throw a connection error on its next call. ### 11.2 — Closing pages and contexts (in-session) **From Puppeteer:** ```javascript await page.close(); // close one page await context.close(); // close a context and all its pages await browser.disconnect(); // detach client; server keeps running ``` **From Playwright:** ```javascript await page.close(); await context.close(); await browser.close(); // closes all contexts opened by this connection ``` **From raw CDP (via `wscat`):** ```json {"id":1,"method":"Target.closeTarget","params":{"targetId":"page-1"}} {"id":2,"method":"Target.disposeBrowserContext","params":{"browserContextId":"<id>"}} ``` Get the `targetId` from `GET /json/list` or from the response of `Target.createTarget`. Get the `browserContextId` from the response of `Target.createBrowserContext`. **From the HTTP discovery endpoints** (handy for shell scripts): ```bash # Close a specific page curl http://localhost:9222/json/close/page-1 # Close every page (loop) curl -s http://localhost:9222/json/list \ | jq -r '.[].id' \ | xargs -I{} curl -s http://localhost:9222/json/close/{} ``` After this, the server process is still running but has no open pages. ### 11.3 — Stopping the `obscura serve` process **If you started it in the foreground:** Just press **Ctrl-C** in that terminal. Obscura should clean up its `obscura-worker` children on SIGINT. **If it's running in the background or you've lost the terminal:** Find the PID: ```bash # By process name pgrep -f "obscura serve" # Or with more detail ps aux | grep -E "obscura( serve|-worker)" | grep -v grep ``` Kill it gracefully: ```bash pkill -INT -f "obscura serve" # SIGINT — same as Ctrl-C ``` Force-kill if it doesn't exit (give it 2–3 seconds first): ```bash pkill -KILL -f "obscura serve" pkill -KILL -f "obscura-worker" # in case workers are orphaned ``` **Verify it's gone:** ```bash pgrep -f obscura || echo "no obscura processes running" ``` > **Note on graceful shutdown:** Obscura's README does not document a CDP-level `Browser.close` for shutting down the server. Standard CDP defines `Browser.close`, so it may work — try it before resorting to `kill`: > ```json > {"id":1,"method":"Browser.close"} > ``` > If you get `Method not found`, fall back to OS signals. ### 11.4 — Freeing the port If something is still listening on `9222` after you thought you killed everything: **Find what's holding the port (macOS / Linux):** ```bash lsof -nP -iTCP:9222 -sTCP:LISTEN ``` Output shows PID and process name. Then: ```bash kill -9 <PID> ``` **One-liner that does both:** ```bash lsof -tiTCP:9222 -sTCP:LISTEN | xargs -r kill -9 ``` **Confirm the port is free:** ```bash lsof -nP -iTCP:9222 -sTCP:LISTEN || echo "port free" ``` This matters because if you start a second `obscura serve --port 9222` while the first is still bound, the second will fail with `Address already in use` and you'll spend ten minutes wondering why. ### 11.5 — Orphaned `obscura-worker` processes `obscura-worker` is a subprocess `obscura` spawns per page. In normal shutdown, the parent reaps them. In abnormal shutdown (`kill -9`, crash, OOM-kill) workers can get orphaned and keep using memory. **Detect:** ```bash pgrep -fa obscura-worker ``` **Clean up:** ```bash pkill -KILL -f obscura-worker ``` If you find this happening repeatedly without an obvious crash, it's worth filing an issue on the repo — it's a real bug. ### 11.6 — Run-and-clean-up patterns **Pattern: trap on script exit (bash)** ```bash #!/bin/bash ./obscura serve --port 9222 --stealth & SERVER_PID=$! trap 'kill -INT $SERVER_PID 2>/dev/null; wait $SERVER_PID 2>/dev/null' EXIT INT TERM # ... your work here ... node my-scraper.js # Server is killed automatically when this script exits, even on error ``` **Pattern: spawn from Node and tear down on exit** ```javascript import { spawn } from 'node:child_process'; const obscura = spawn('./obscura', ['serve', '--port', '9222'], { stdio: 'inherit', }); const cleanup = () => { obscura.kill('SIGINT'); }; process.on('exit', cleanup); process.on('SIGINT', () => { cleanup(); process.exit(); }); process.on('SIGTERM', () => { cleanup(); process.exit(); }); // ... wait for server-up, run automation, etc. ``` **Pattern: ephemeral port to avoid conflicts in CI** ```bash # Pick a free port, store it, use it, kill it PORT=$(python3 -c 'import socket; s=socket.socket(); s.bind(("",0)); print(s.getsockname()[1]); s.close()') ./obscura serve --port "$PORT" & SERVER_PID=$! trap 'kill -INT $SERVER_PID 2>/dev/null' EXIT echo "Obscura on port $PORT (pid $SERVER_PID)" ``` ### 11.7 — Sanity checklist after a "stuck" session When something feels off and you want a clean slate: ```bash # 1. Kill all Obscura processes pkill -INT -f obscura ; sleep 2 ; pkill -KILL -f obscura # 2. Free the port lsof -tiTCP:9222 -sTCP:LISTEN | xargs -r kill -9 # 3. Confirm pgrep -f obscura || echo "no obscura processes" lsof -nP -iTCP:9222 -sTCP:LISTEN || echo "port free" # 4. Restart fresh ./obscura serve --port 9222 --stealth ``` Bookmark this. You will run it more often than you'd like. --- ## 12. Troubleshooting & Coverage Probing ### "Is method X supported?" — 60-second probe ```bash wscat -c ws://127.0.0.1:9222/devtools/page/page-1 ``` Then for each method you care about: ```json {"id":1,"method":"Page.captureScreenshot"} {"id":2,"method":"Network.emulateNetworkConditions","params":{"offline":false,"latency":0,"downloadThroughput":-1,"uploadThroughput":-1}} {"id":3,"method":"Browser.setDownloadBehavior","params":{"behavior":"allow","downloadPath":"/tmp/dl"}} ``` A response with `"result"` = supported. A response with `"error":{"code":-32601}` = not implemented. ### Common warnings you can ignore - `Script error ... TypeError: Cannot read properties of undefined (reading 'getAttribute')` from minified site bundles. Means a polyfill probe failed in Obscura's V8 environment. The page usually still works. - Empty `devtoolsFrontendUrl` in `/json/list`. Obscura doesn't host the DevTools UI. You can still attach Chrome DevTools manually: ``` devtools://devtools/bundled/inspector.html?ws=127.0.0.1:9222/devtools/page/page-1 ``` ### Common real failures - **CDP method not found.** The README's documented surface is what's safe. For everything else, probe. - **Page hangs on `--wait-until networkidle0`.** Some sites keep long-poll connections open forever. Use `--wait-until load` + `--selector` instead. - **Stealth makes page render differently.** Disable stealth, retest. Some sites detect spoofed fingerprints and serve a different layout. - **Form submit doesn't follow redirect.** Check that the form actually went through; some SPAs don't use real form submits — emulate a click on the submit button instead of `form.submit()`. ### Logging everything Obscura sees ```bash RUST_LOG=debug ./obscura serve --port 9222 ``` Standard Rust `tracing` env var. Verbose, but the right tool when something silently doesn't work. --- ## 13. What Obscura is **not** good for Be honest about the gaps so you don't fight the tool: - **Native mobile / desktop UI testing.** Use Appium, Squish, XCTest, Espresso. - **Visual debugging with full Chrome DevTools.** Many panels (Performance, Memory, Coverage, Lighthouse) rely on CDP domains Obscura hasn't implemented. - **Tests that require real Chromium-only behavior** (extension APIs, WebUSB, WebSerial, FedCM, complex media DRM). - **Production-critical pipelines today.** It's a 2-day-old, 1-star, 1-maintainer project. Ship it on hobbyist or experimental workloads first; revisit for production once it has a track record. - **Sites that detect Rust V8 fingerprints.** Stealth helps but isn't a guarantee. - **Teams without anyone who can read Rust.** When something breaks, you'll need to read source. --- ## Appendix — Quick Reference Card ``` # Health check curl http://localhost:9222/json/version curl http://localhost:9222/json/list # Open a page curl -X PUT "http://localhost:9222/json/new?https://example.com" # CLI essentials obscura fetch <URL> --eval "<JS>" # one-shot eval obscura fetch <URL> --dump html|text|links # dump page content obscura fetch <URL> --wait-until networkidle0 # wait for SPA obscura fetch <URL> --selector ".loaded" # wait for element obscura serve --port 9222 --stealth # CDP server obscura scrape u1 u2 ... --concurrency N --eval "<JS>" --format json # Connect from Puppeteer puppeteer.connect({ browserWSEndpoint: 'ws://127.0.0.1:9222/devtools/browser' }) # Connect from Playwright chromium.connectOverCDP({ endpointURL: 'ws://127.0.0.1:9222' }) # Raw CDP wscat -c ws://127.0.0.1:9222/devtools/page/page-1 # Cleanup — close pages, kill server, free port curl http://localhost:9222/json/close/page-1 # close one page pkill -INT -f "obscura serve" # graceful stop pkill -KILL -f obscura # force stop lsof -tiTCP:9222 -sTCP:LISTEN | xargs -r kill -9 # free port pgrep -f obscura || echo "all clean" # confirm ``` --- *End of document. If a recipe doesn't work, the most likely cause is that Obscura hasn't implemented the underlying CDP method yet. Use the probe in Section 11 to confirm before assuming it's a bug.*
knowledge-webapp-testing
--- title: Webapp Testing description: Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs. tags: [skill, testing, playwright, browser] --- # Web Application Testing To test local web applications, write native Python Playwright scripts. **Helper Scripts Available**: - `scripts/with_server.py` - Manages server lifecycle (supports multiple servers) **Always run scripts with `--help` first** to see usage. DO NOT read the source until you try running the script first and find that a customized solution is abslutely necessary. These scripts can be very large and thus pollute your context window. They exist to be called directly as black-box scripts rather than ingested into your context window. ## Decision Tree: Choosing Your Approach ``` User task → Is it static HTML? ├─ Yes → Read HTML file directly to identify selectors │ ├─ Success → Write Playwright script using selectors │ └─ Fails/Incomplete → Treat as dynamic (below) │ └─ No (dynamic webapp) → Is the server already running? ├─ No → Run: python scripts/with_server.py --help │ Then use the helper + write simplified Playwright script │ └─ Yes → Reconnaissance-then-action: 1. Navigate and wait for networkidle 2. Take screenshot or inspect DOM 3. Identify selectors from rendered state 4. Execute actions with discovered selectors ``` ## Example: Using with_server.py To start a server, run `--help` first, then use the helper: **Single server:** ```bash python scripts/with_server.py --server "npm run dev" --port 5173 -- python your_automation.py ``` **Multiple servers (e.g., backend + frontend):** ```bash python scripts/with_server.py \ --server "cd backend && python server.py" --port 3000 \ --server "cd frontend && npm run dev" --port 5173 \ -- python your_automation.py ``` To create an automation script, include only Playwright logic (servers are managed automatically): ```python from playwright.sync_api import sync_playwright with sync_playwright() as p: browser = p.chromium.launch(headless=True) # Always launch chromium in headless mode page = browser.new_page() page.goto('http://localhost:5173') # Server already running and ready page.wait_for_load_state('networkidle') # CRITICAL: Wait for JS to execute # ... your automation logic browser.close() ``` ## Reconnaissance-Then-Action Pattern 1. **Inspect rendered DOM**: ```python page.screenshot(path='/tmp/inspect.png', full_page=True) content = page.content() page.locator('button').all() ``` 2. **Identify selectors** from inspection results 3. **Execute actions** using discovered selectors ## Common Pitfall **Don't** inspect the DOM before waiting for `networkidle` on dynamic apps **Do** wait for `page.wait_for_load_state('networkidle')` before inspection ## Best Practices - **Use bundled scripts as black boxes** - To accomplish a task, consider whether one of the scripts available in `scripts/` can help. These scripts handle common, complex workflows reliably without cluttering the context window. Use `--help` to see usage, then invoke directly. - Use `sync_playwright()` for synchronous scripts - Always close the browser when done - Use descriptive selectors: `text=`, `role=`, CSS selectors, or IDs - Add appropriate waits: `page.wait_for_selector()` or `page.wait_for_timeout()` ## Helper Script: with_server.py Starts one or more servers, waits for them to be ready, runs a command, then cleans up. ```python #!/usr/bin/env python3 import subprocess import socket import time import sys import argparse def is_server_ready(port, timeout=30): """Wait for server to be ready by polling the port.""" start_time = time.time() while time.time() - start_time < timeout: try: with socket.create_connection(('localhost', port), timeout=1): return True except (socket.error, ConnectionRefusedError): time.sleep(0.5) return False def main(): parser = argparse.ArgumentParser(description='Run command with one or more servers') parser.add_argument('--server', action='append', dest='servers', required=True, help='Server command (can be repeated)') parser.add_argument('--port', action='append', dest='ports', type=int, required=True, help='Port for each server (must match --server count)') parser.add_argument('--timeout', type=int, default=30, help='Timeout in seconds per server (default: 30)') parser.add_argument('command', nargs=argparse.REMAINDER, help='Command to run after server(s) ready') args = parser.parse_args() # Remove the '--' separator if present if args.command and args.command[0] == '--': args.command = args.command[1:] if not args.command: print("Error: No command specified to run") sys.exit(1) # Parse server configurations if len(args.servers) != len(args.ports): print("Error: Number of --server and --port arguments must match") sys.exit(1) servers = [] for cmd, port in zip(args.servers, args.ports): servers.append({'cmd': cmd, 'port': port}) server_processes = [] try: # Start all servers for i, server in enumerate(servers): print(f"Starting server {i+1}/{len(servers)}: {server['cmd']}") # Use shell=True to support commands with cd and && process = subprocess.Popen( server['cmd'], shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE ) server_processes.append(process) # Wait for this server to be ready print(f"Waiting for server on port {server['port']}...") if not is_server_ready(server['port'], timeout=args.timeout): raise RuntimeError(f"Server failed to start on port {server['port']} within {args.timeout}s") print(f"Server ready on port {server['port']}") print(f"\nAll {len(servers)} server(s) ready") # Run the command print(f"Running: {' '.join(args.command)}\n") result = subprocess.run(args.command) sys.exit(result.returncode) finally: # Clean up all servers print(f"\nStopping {len(server_processes)} server(s)...") for i, process in enumerate(server_processes): try: process.terminate() process.wait(timeout=5) except subprocess.TimeoutExpired: process.kill() process.wait() print(f"Server {i+1} stopped") print("All servers stopped") if __name__ == '__main__': main() ``` ## Example: Element Discovery Discovering buttons, links, and inputs on a page: ```python from playwright.sync_api import sync_playwright with sync_playwright() as p: browser = p.chromium.launch(headless=True) page = browser.new_page() # Navigate to page and wait for it to fully load page.goto('http://localhost:5173') page.wait_for_load_state('networkidle') # Discover all buttons on the page buttons = page.locator('button').all() print(f"Found {len(buttons)} buttons:") for i, button in enumerate(buttons): text = button.inner_text() if button.is_visible() else "[hidden]" print(f" [{i}] {text}") # Discover links links = page.locator('a[href]').all() print(f"\nFound {len(links)} links:") for link in links[:5]: # Show first 5 text = link.inner_text().strip() href = link.get_attribute('href') print(f" - {text} -> {href}") # Discover input fields inputs = page.locator('input, textarea, select').all() print(f"\nFound {len(inputs)} input fields:") for input_elem in inputs: name = input_elem.get_attribute('name') or input_elem.get_attribute('id') or "[unnamed]" input_type = input_elem.get_attribute('type') or 'text' print(f" - {name} ({input_type})") # Take screenshot for visual reference page.screenshot(path='/tmp/page_discovery.png', full_page=True) print("\nScreenshot saved to /tmp/page_discovery.png") browser.close() ``` ## Example: Static HTML Automation Using file:// URLs for local HTML files: ```python from playwright.sync_api import sync_playwright import os html_file_path = os.path.abspath('path/to/your/file.html') file_url = f'file://{html_file_path}' with sync_playwright() as p: browser = p.chromium.launch(headless=True) page = browser.new_page(viewport={'width': 1920, 'height': 1080}) # Navigate to local HTML file page.goto(file_url) # Take screenshot page.screenshot(path='/tmp/static_page.png', full_page=True) # Interact with elements page.click('text=Click Me') page.fill('#name', 'John Doe') page.fill('#email', 'john@example.com') # Submit form page.click('button[type="submit"]') page.wait_for_timeout(500) # Take final screenshot page.screenshot(path='/tmp/after_submit.png', full_page=True) browser.close() ``` ## Example: Console Log Capture Capturing console logs during browser automation: ```python from playwright.sync_api import sync_playwright url = 'http://localhost:5173' # Replace with your URL console_logs = [] with sync_playwright() as p: browser = p.chromium.launch(headless=True) page = browser.new_page(viewport={'width': 1920, 'height': 1080}) # Set up console log capture def handle_console_message(msg): console_logs.append(f"[{msg.type}] {msg.text}") print(f"Console: [{msg.type}] {msg.text}") page.on("console", handle_console_message) # Navigate to page page.goto(url) page.wait_for_load_state('networkidle') # Interact with the page (triggers console logs) page.click('text=Dashboard') page.wait_for_timeout(1000) browser.close() # Save console logs to file with open('/tmp/console.log', 'w') as f: f.write('\n'.join(console_logs)) print(f"\nCaptured {len(console_logs)} console messages") ```
knowledge-vision
--- title: Vision description: Accept images alongside text for description, classification, and visual Q&A. tags: [ollama, api, vision, multimodal] --- Vision models accept images alongside text so the model can describe, classify, and answer questions about what it sees. ## Quick start ```shell ollama run gemma3 ./image.png whats in this image? ``` ## Usage with Ollama's API Provide an `images` array. SDKs accept file paths, URLs or raw bytes while the REST API expects base64-encoded image data. ### cURL ```shell # 1. Download a sample image curl -L -o test.jpg "https://upload.wikimedia.org/wikipedia/commons/3/3a/Cat03.jpg" # 2. Encode the image IMG=$(base64 < test.jpg | tr -d '\n') # 3. Send it to Ollama curl -X POST http://localhost:11434/api/chat \ -H "Content-Type: application/json" \ -d '{ "model": "gemma3", "messages": [{ "role": "user", "content": "What is in this image?", "images": ["'"$IMG"'"] }], "stream": false }' ``` ### Python ```python from ollama import chat # from pathlib import Path # Pass in the path to the image path = input('Please enter the path to the image: ') # You can also pass in base64 encoded image data # img = base64.b64encode(Path(path).read_bytes()).decode() # or the raw bytes # img = Path(path).read_bytes() response = chat( model='gemma3', messages=[ { 'role': 'user', 'content': 'What is in this image? Be concise.', 'images': [path], } ], ) print(response.message.content) ``` ### JavaScript ```javascript import ollama from 'ollama' const imagePath = '/absolute/path/to/image.jpg' const response = await ollama.chat({ model: 'gemma3', messages: [ { role: 'user', content: 'What is in this image?', images: [imagePath] } ], stream: false, }) console.log(response.message.content) ```