Powered by Gemini 3.1 Pro · Set-of-Mark Vision

The web agent that sees, thinks, and clicks for you

Opticlick Engine is a Chrome Extension that autonomously navigates any website. Describe your goal in plain English — it handles every click.

Add to Chrome, it's free · View on GitHub
opticlick-engine · agent log
00:00.124 [THINK] Goal received: "Book the cheapest flight from NYC to London next week"
00:00.381 [ACT] Annotating 47 interactable elements, piercing 3 Shadow DOM roots
00:00.692 [OBSERVE] Screenshot captured (2560×1600). Sending to Gemini 3.1 Pro…
00:01.204 [THINK] Model identified target #12: "Search flights" button (confidence 0.97)
00:01.209 [ACT] CDP Input.dispatchMouseEvent → {x: 724, y: 432} scaled for devicePixelRatio 2×
00:01.480 [OBSERVE] DOM idle confirmed. 12 results rendered. Proceeding to Step 2…

Think. Act. Observe.
Repeat until done.

Opticlick runs a continuous perception–action loop, powered by multimodal AI, until your goal is complete.

01
🗣️

You describe the goal

Type a plain-English instruction in the extension popup. No selectors, no scripting required.

02
🎯

Page is annotated

A canvas overlay numbers every clickable element on the page, piercing Shadow DOM and iframes.

03
🧠

AI picks the target

A screenshot is sent to Gemini 3.1 Pro with your prompt. The model returns the ID of the element to interact with.

04
🖱️

Hardware-level click

The Chrome DevTools Protocol dispatches a real mouse event sequence that the page receives as trusted input, so React, Vue, and Angular handlers that ignore script-generated events still fire.

05
🔁

Loop until complete

After each action the agent observes the new page state and decides the next move, autonomously.
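The five steps above can be sketched as a single loop. This is a minimal illustration, not Opticlick's actual source: `AgentDeps`, `Decision`, `runAgent`, and the `maxSteps` cap are hypothetical names, and each dependency stands in for one step (annotate, screenshot, ask the model, click, wait for the page to settle).

```typescript
// Hypothetical shapes for illustration; none of these names come from Opticlick itself.
type Decision =
  | { kind: "click"; targetId: number } // the model picked a numbered element
  | { kind: "done" };                   // the model considers the goal complete

interface AgentDeps {
  annotate(): Promise<number>;                             // draw numbered marks, return how many
  screenshot(): Promise<string>;                           // e.g. a data URL of the visible tab
  decide(goal: string, image: string): Promise<Decision>;  // ask the multimodal model
  click(targetId: number): Promise<void>;                  // dispatch the click
  waitForIdle(): Promise<void>;                            // wait for the page to settle
}

// Runs think/act/observe until the model reports completion.
// Returns the number of clicks performed; throws if the step cap is hit.
async function runAgent(goal: string, deps: AgentDeps, maxSteps = 20): Promise<number> {
  for (let step = 1; step <= maxSteps; step++) {
    await deps.annotate();
    const image = await deps.screenshot();
    const decision = await deps.decide(goal, image);
    if (decision.kind === "done") return step - 1;
    await deps.click(decision.targetId);
    await deps.waitForIdle();
  }
  throw new Error(`Goal not reached within ${maxSteps} steps`);
}
```

Passing the steps in as dependencies keeps the loop itself free of browser APIs, and the step cap is one simple way to stop an agent that never reports completion.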

Built for the modern web

Designed to handle dynamic SPAs, cross-origin iframes, and high-DPI displays out of the box.

👁️

Set-of-Mark Vision

Numbered bounding boxes rendered on a unified canvas give the LLM a precise, unambiguous spatial map of every interactable element.

🌐

Cross-origin iframe support

Content scripts are injected into all frames, including sandboxed third-party iframes, so embedded widgets stay within reach.

🔮

Shadow DOM traversal

Recursively pierces open Shadow DOM roots to discover elements hidden inside Web Components and design-system libraries.
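A rough sketch of what recursive traversal through open shadow roots can look like. The `NodeLike` type, the `CLICKABLE_TAGS` set, and `collectClickable` are assumptions made for this example. A real annotator would likely use richer clickability checks (roles, event handlers, visibility).

```typescript
// Minimal structural view of a DOM element, so the sketch also runs outside a browser.
interface ParentLike {
  children: ArrayLike<NodeLike>;
}
interface NodeLike extends ParentLike {
  tagName: string;
  shadowRoot?: ParentLike | null; // non-null only for open shadow roots
}

// Hypothetical, deliberately small set of tags treated as clickable.
const CLICKABLE_TAGS = new Set(["A", "BUTTON", "INPUT", "SELECT", "TEXTAREA"]);

// Depth-first walk that descends into both light-DOM children and open shadow roots.
function collectClickable(root: ParentLike, out: NodeLike[] = []): NodeLike[] {
  for (const child of Array.from(root.children)) {
    if (CLICKABLE_TAGS.has(child.tagName)) out.push(child);
    if (child.shadowRoot) collectClickable(child.shadowRoot, out);
    collectClickable(child, out);
  }
  return out;
}
```

In a content script this could be called as `collectClickable(document)`. Closed shadow roots expose `shadowRoot` as `null`, so a walk like this only reaches open ones, which matches the description above.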

CDP hardware simulation

Uses the chrome.debugger API to dispatch real mouseMoved → mousePressed → mouseReleased events, which the page treats like physical input.
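The event sequence described here maps onto three `Input.dispatchMouseEvent` commands. The sketch below assumes a hypothetical `send` function that forwards a CDP method and its parameters to the attached tab; `cdpClick` is likewise an illustrative name.

```typescript
// `send` is a hypothetical transport; in an extension it could wrap chrome.debugger.sendCommand.
type Send = (method: string, params: Record<string, unknown>) => Promise<unknown>;

// Dispatches move, press, and release at the same point (CSS pixels, viewport-relative).
async function cdpClick(send: Send, x: number, y: number): Promise<void> {
  await send("Input.dispatchMouseEvent", { type: "mouseMoved", x, y });
  await send("Input.dispatchMouseEvent", {
    type: "mousePressed", x, y, button: "left", clickCount: 1,
  });
  await send("Input.dispatchMouseEvent", {
    type: "mouseReleased", x, y, button: "left", clickCount: 1,
  });
}
```

In an extension, `send` could call `chrome.debugger.sendCommand({ tabId }, method, params)` once `chrome.debugger.attach({ tabId }, "1.3")` has succeeded, where `tabId` is whichever tab the agent is driving.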

📐

Retina-safe coordinates

Click coordinates measured in screenshot pixels are automatically divided by devicePixelRatio before dispatch, so they arrive in the CSS pixels that CDP expects and land on the intended element on high-DPI displays.
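As a worked example of that division: the sample log reports a click at {x: 724, y: 432} with devicePixelRatio 2×. Assuming the target was located at (1448, 864) in screenshot pixels (a hypothetical input, chosen to match the log), the conversion could look like this. `Point` and `toCssPixels` are illustrative names.

```typescript
interface Point {
  x: number;
  y: number;
}

// Converts a point from device (screenshot) pixels to CSS pixels.
function toCssPixels(p: Point, devicePixelRatio: number): Point {
  if (!(devicePixelRatio > 0)) {
    throw new RangeError("devicePixelRatio must be a positive number");
  }
  return { x: p.x / devicePixelRatio, y: p.y / devicePixelRatio };
}

// Hypothetical screenshot-space target on a 2× display.
const cssPoint = toCssPixels({ x: 1448, y: 864 }, 2);
console.log(cssPoint); // { x: 724, y: 432 }
```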

💾

Persistent state across restarts

MV3 service workers are ephemeral, so Opticlick keeps agent state in chrome.storage.session and IndexedDB and can resume where it left off after the worker is restarted.
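One way to checkpoint agent state so a restarted service worker can pick it up again. This is a sketch under assumptions: the `AgentState` shape, the `agentState` key, and the `StorageAreaLike` interface are invented for the example, with the interface modeled on the promise-based `get` and `set` methods of `chrome.storage.session`.

```typescript
// Hypothetical state shape and storage key, for illustration only.
interface AgentState {
  goal: string;
  step: number;
}
const STATE_KEY = "agentState";

// The subset of a storage area used here; chrome.storage.session offers get/set like these.
interface StorageAreaLike {
  get(key: string): Promise<Record<string, unknown>>;
  set(items: Record<string, unknown>): Promise<void>;
}

function isAgentState(value: unknown): value is AgentState {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record.goal === "string" && typeof record.step === "number";
}

// Call after every completed step so a worker restart loses at most one step.
async function saveState(area: StorageAreaLike, state: AgentState): Promise<void> {
  await area.set({ [STATE_KEY]: state });
}

// Call when the service worker starts; returns null when there is nothing to resume.
async function loadState(area: StorageAreaLike): Promise<AgentState | null> {
  const items = await area.get(STATE_KEY);
  const value = items[STATE_KEY];
  return isAgentState(value) ? value : null;
}
```

Validating the stored value on load guards against resuming from a malformed or outdated record.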

🛡️

Input blocking

Capturing event listeners prevent accidental user interference while the agent is mid-task, with a clear visual indicator when active.

DOM idle detection

A MutationObserver-based idle gate ensures annotations are only drawn once network and DOM activity have settled.
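A minimal sketch of the DOM half of such an idle gate: resolve once no change has been reported for a quiet window, with a hard timeout as a fallback. `Subscribe`, `waitForQuiet`, and the default durations are assumptions for this example, and network tracking is left out.

```typescript
// Registers a change callback and returns a function that stops listening.
type Subscribe = (onChange: () => void) => () => void;

// Resolves "idle" after `quietMs` without changes, or "timeout" after `timeoutMs` overall.
function waitForQuiet(
  subscribe: Subscribe,
  quietMs = 500,
  timeoutMs = 10_000,
): Promise<"idle" | "timeout"> {
  return new Promise((resolve) => {
    let quietTimer: ReturnType<typeof setTimeout> | undefined;
    let hardTimer: ReturnType<typeof setTimeout> | undefined;
    let unsubscribe: () => void = () => {};
    let settled = false;

    const finish = (result: "idle" | "timeout") => {
      if (settled) return;
      settled = true;
      clearTimeout(quietTimer);
      clearTimeout(hardTimer);
      unsubscribe();
      resolve(result);
    };

    // Every reported change pushes the "idle" deadline back by a full quiet window.
    const restartQuietTimer = () => {
      clearTimeout(quietTimer);
      quietTimer = setTimeout(() => finish("idle"), quietMs);
    };

    unsubscribe = subscribe(restartQuietTimer);
    hardTimer = setTimeout(() => finish("timeout"), timeoutMs);
    restartQuietTimer();
  });
}
```

In a content script, `subscribe` would typically create a `MutationObserver` with the change callback, call `observe` on `document.documentElement` with `childList`, `subtree`, and `attributes` enabled, and return a function that calls `disconnect()`. The hard timeout keeps pages with perpetual animations or tickers from blocking the agent indefinitely.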

🔒

Minimal permissions

Requests only activeTab, scripting, debugger, and storage. No broad host permissions. Your API key stays local.
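Expressed as a manifest fragment, the permission set above might look like the following. Only the four permission names come from the text; the object name and everything else are illustrative. In a WXT project, a fragment like this would typically sit under the `manifest` key of `wxt.config.ts`.

```typescript
// Hypothetical manifest fragment; only the four permission names come from the description above.
const manifestFragment = {
  manifest_version: 3,
  permissions: ["activeTab", "scripting", "debugger", "storage"],
  // Deliberately no `host_permissions` entry and no "<all_urls>" match pattern.
} as const;
```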

Engineered on open standards

No third-party automation libraries. Everything runs directly on the browser's own extension and debugging APIs.

Gemini 3.1 Pro · Chrome Extensions MV3 · Chrome DevTools Protocol · Set-of-Mark Prompting · HTML5 Canvas · IndexedDB · chrome.storage.session · Shadow DOM API · MutationObserver · TypeScript · WXT Framework · React 19 · Tailwind CSS v4

Ready to automate your browsing?

Add Opticlick Engine to Chrome and run your first autonomous task in under two minutes.

Add to Chrome, it's free