A fully local Chrome extension running pre-embedded Gemma 4 models. Forked from https://github.com/kessler/gemma-gem

TypeScript 45.6%
JavaScript 33.5%
Jinja 20.8%
HTML 0.1%

Latest commit: pablophg d540c920d5, "Initial clean import with LFS", 2026-04-24 02:44:44 +02:00

Every entry below was last changed by that commit:

.claude
background
content
entrypoints
models
offscreen
public
scripts
shared
.gitattributes
.gitignore
CHANGELOG.md
CLAUDE.md
LICENSE
package.json
pnpm-lock.yaml
README.md
screenshot.png
screenshot2.jpg
tsconfig.json
wxt.config.ts

README.md

Chrome Clippy

Your personal AI assistant living right inside the browser. Chrome Clippy runs Google's Gemma 4 model entirely on-device via WebGPU — no API keys, no cloud, no data leaving your machine. It can read pages, click buttons, fill forms, run JavaScript, and answer questions about any site you visit.

Requirements

Chrome with WebGPU support
Local model files under models/
A local HTTP server for those model files
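
A minimal sketch of how the first requirement could be checked before any model is loaded. `navigator.gpu.requestAdapter()` is the standard WebGPU entry point; the helper name `hasWebGPU` and the locally declared `GpuCapableNavigator` type are illustrative and not taken from this repository.

```typescript
// Minimal structural type for the part of `navigator` this check needs.
// Declared locally (an assumption) so the sketch compiles without WebGPU typings.
interface GpuCapableNavigator {
  gpu?: { requestAdapter(): Promise<unknown> };
}

// Resolves to true only if WebGPU is exposed and an adapter is actually returned.
async function hasWebGPU(nav: GpuCapableNavigator): Promise<boolean> {
  if (!nav.gpu) return false; // API not exposed at all
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    return false; // treat a failed adapter request as "no WebGPU"
  }
}
```

In a browser context this would be called as `hasWebGPU(navigator as GpuCapableNavigator)`.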

Setup

Install dependencies:

pnpm install

Place the required q4f16 model files into:

models/gemma-4-e2b
models/gemma-4-e4b

Each directory mirrors an upstream Hugging Face repository:

models/gemma-4-e2b ← onnx-community/gemma-4-E2B-it-ONNX
models/gemma-4-e4b ← onnx-community/gemma-4-E4B-it-ONNX

Start the local model server:

pnpm serve:models

Build the extension:

pnpm build

At runtime, the extension fetches models from http://127.0.0.1:8765/ using the same URL shape as the original Hugging Face downloads, so Chrome can cache them through the normal browser path.
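
To illustrate the "same URL shape", here is a small sketch of how a model file URL could be assembled from the `{model}/resolve/{revision}/` pattern. transformers.js exposes `env.remoteHost` and `env.remotePathTemplate` for this kind of redirection, but whether this extension sets them exactly this way is an assumption. The helper `modelFileUrl`, the default revision `main`, and the example file name are hypothetical.

```typescript
// Host from the README; the template mirrors the Hugging Face download layout.
const MODEL_HOST = "http://127.0.0.1:8765/";
const PATH_TEMPLATE = "{model}/resolve/{revision}/";

// Build the URL for one file of one model. "main" as the default revision
// is an assumption for illustration.
function modelFileUrl(model: string, file: string, revision = "main"): string {
  const path = PATH_TEMPLATE
    .replace("{model}", () => model)
    .replace("{revision}", () => encodeURIComponent(revision));
  return MODEL_HOST + path + file;
}
```

With these assumptions, `modelFileUrl("onnx-community/gemma-4-E2B-it-ONNX", "config.json")` yields `http://127.0.0.1:8765/onnx-community/gemma-4-E2B-it-ONNX/resolve/main/config.json`.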

Load the extension: open chrome://extensions, enable Developer mode, choose "Load unpacked", and select .output/chrome-mv3-dev/.

Usage

Navigate to any page
Click the Chrome Clippy icon (bottom-right corner) to open the chat
Wait for the model to initialize (progress is shown on the icon and in the chat)
Ask questions about the page or request actions

Architecture

Offscreen Document          Service Worker           Content Script
(Gemma 4 + Agent Loop)  <-> (Message Router)    <-> (Chat UI + DOM Tools)
       |                         |
  WebGPU inference          Screenshot capture
  Token streaming           JS execution

Offscreen document: Hosts the model via @huggingface/transformers + WebGPU. Runs the agent loop.
Service worker: Routes messages between content scripts and offscreen document. Handles take_screenshot and run_javascript.
Content script: Injects the Chrome Clippy icon + shadow DOM chat overlay. Executes DOM tools (read_page_content, click_element, type_text, scroll_page).
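
The routing role of the service worker can be sketched as a pure function that decides where a message goes next. The message shapes and the name `nextHop` are hypothetical, since the README does not document the real message format. Only the split of tools between service worker and content script comes from the description above.

```typescript
// Hypothetical message shapes, for illustration only.
type Context = "content" | "service-worker" | "offscreen";

type Message =
  | { kind: "user_prompt"; tabId: number; text: string }
  | { kind: "token"; tabId: number; text: string }
  | { kind: "tool_call"; tabId: number; tool: string; args: Record<string, unknown> };

// Tools the README says the service worker handles itself.
const SERVICE_WORKER_TOOLS = new Set(["take_screenshot", "run_javascript"]);

// Decide which context should process a message that reaches the service worker.
function nextHop(message: Message): Context {
  switch (message.kind) {
    case "user_prompt":
      return "offscreen"; // prompts go to the agent loop
    case "token":
      return "content"; // streamed tokens go back to the chat UI
    case "tool_call":
      // Screenshot and JS execution stay in the service worker;
      // DOM tools are forwarded to the content script of the tab.
      return SERVICE_WORKER_TOOLS.has(message.tool) ? "service-worker" : "content";
  }
}
```

Keeping the decision in a pure function like this would make the router testable without any `chrome.*` APIs.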

Tools

| Tool | Description | Runs in |
| --- | --- | --- |
| `read_page_content` | Read text/HTML of the page or a CSS selector | Content script |
| `take_screenshot` | Capture visible page as PNG | Service worker |
| `click_element` | Click an element by CSS selector | Content script |
| `type_text` | Type into an input by CSS selector | Content script |
| `scroll_page` | Scroll up/down by pixel amount | Content script |
| `run_javascript` | Execute JS in the page context with full DOM access | Service worker |
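
As a rough sketch, the tools above could be described in a small registry that validates a model-issued call before it is executed. The tool names and descriptions come from the table. The parameter names (`selector`, `text`, `direction`, `amount`, `code`) and the helper `validateToolCall` are hypothetical and not taken from the source.

```typescript
// Parameter names are assumptions; only names and descriptions are from the README.
interface ToolSpec {
  description: string;
  required: string[];
}

const TOOLS = new Map<string, ToolSpec>([
  ["read_page_content", { description: "Read text/HTML of the page or a CSS selector", required: [] }],
  ["take_screenshot", { description: "Capture visible page as PNG", required: [] }],
  ["click_element", { description: "Click an element by CSS selector", required: ["selector"] }],
  ["type_text", { description: "Type into an input by CSS selector", required: ["selector", "text"] }],
  ["scroll_page", { description: "Scroll up/down by pixel amount", required: ["direction", "amount"] }],
  ["run_javascript", { description: "Execute JS in the page context", required: ["code"] }],
]);

// Returns null when the call looks valid, otherwise an error message that an
// agent loop could feed back to the model so it can retry.
function validateToolCall(name: string, args: Record<string, unknown>): string | null {
  const spec = TOOLS.get(name);
  if (!spec) return `Unknown tool: ${name}`;
  const missing = spec.required.filter((key) => !(key in args));
  return missing.length > 0 ? `Missing arguments for ${name}: ${missing.join(", ")}` : null;
}
```

Returning a message instead of throwing is a design choice here: small on-device models sometimes emit malformed calls, and a readable error gives the loop something to recover with.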

Settings

Click the gear icon in the chat header:

Model: Switch between Gemma 4 E2B and E4B served from the local model server. Selection persists across sessions.
Thinking: Toggle Gemma 4's native thinking mode
Max iterations: Maximum number of tool-call iterations the agent may run per request
Clear context: Reset conversation history for the current page
Disable on this site: Turn the extension off for the current hostname (persisted)
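
The settings above could be modeled roughly as follows. All field names and default values are assumptions, as the README only names the options. Persistence is left out. In an extension, `chrome.storage.local` would be one plausible place to store the object, but the source does not say which mechanism is used.

```typescript
// Field names and defaults are hypothetical.
interface Settings {
  model: "gemma-4-e2b" | "gemma-4-e4b";
  thinking: boolean;
  maxIterations: number;
  disabledHosts: string[];
}

const DEFAULT_SETTINGS: Settings = {
  model: "gemma-4-e2b",
  thinking: false,
  maxIterations: 10,
  disabledHosts: [],
};

// Merge whatever was persisted over the defaults so missing keys stay valid.
function loadSettings(stored: Partial<Settings> | undefined): Settings {
  return { ...DEFAULT_SETTINGS, ...stored };
}

function isDisabledOn(settings: Settings, hostname: string): boolean {
  return settings.disabledHosts.includes(hostname);
}

// Returns a new settings object with the hostname toggled; the input is not mutated.
function toggleSite(settings: Settings, hostname: string): Settings {
  const disabledHosts = isDisabledOn(settings, hostname)
    ? settings.disabledHosts.filter((host) => host !== hostname)
    : [...settings.disabledHosts, hostname];
  return { ...settings, disabledHosts };
}
```

A content script could call `isDisabledOn(settings, location.hostname)` before injecting the chat overlay.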

Development

pnpm build              # Development build (with logging, source maps)
pnpm build:prod         # Production build (logging silenced, minified)

Tech Stack

WXT — Chrome extension framework (Vite-based)
@huggingface/transformers — Browser ML inference
marked — Markdown rendering in chat
Gemma 4 E2B / E4B (onnx-community/gemma-4-E2B-it-ONNX, onnx-community/gemma-4-E4B-it-ONNX) — q4f16 quantization, 128K context

Localhost Model Server

The extension points transformers.js at http://127.0.0.1:8765/ and keeps the original {model}/resolve/{revision}/... URL structure.
pnpm serve:models serves the required q4f16 files from models/gemma-4-e2b and models/gemma-4-e4b with CORS and range request support.
Chrome can then cache those responses through the same browser cache path the original extension used.
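
The source of `pnpm serve:models` is not shown here, so the following is only a sketch of the two pieces such a server needs for "CORS and range request support": parsing a single-range `Range` header and producing the headers for a `206 Partial Content` response. The function names and the wildcard CORS origin are assumptions.

```typescript
interface ByteRange {
  start: number; // inclusive
  end: number; // inclusive
}

// Parse a single-range header such as "bytes=0-1023", "bytes=500-" or "bytes=-200".
// Returns null for absent, malformed, multi-range or unsatisfiable headers.
function parseRange(header: string | undefined, size: number): ByteRange | null {
  const match = /^bytes=(\d*)-(\d*)$/.exec(header ?? "");
  if (!match || size <= 0) return null;
  const [, first, last] = match;
  if (first === "" && last === "") return null;
  if (first === "") {
    // Suffix form: the final N bytes of the file.
    const length = Math.min(Number(last), size);
    return length > 0 ? { start: size - length, end: size - 1 } : null;
  }
  const start = Number(first);
  const end = last === "" ? size - 1 : Math.min(Number(last), size - 1);
  return start <= end ? { start, end } : null;
}

// Headers for a 206 response. Allowing any origin is an assumption that is
// only reasonable because the server listens on localhost.
function rangeResponseHeaders(range: ByteRange, size: number): Record<string, string> {
  return {
    "Access-Control-Allow-Origin": "*",
    "Accept-Ranges": "bytes",
    "Content-Range": `bytes ${range.start}-${range.end}/${size}`,
    "Content-Length": String(range.end - range.start + 1),
  };
}
```

A request handler would respond with status 206 and these headers when `parseRange` returns a range, and fall back to a normal 200 response when the header is absent.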

Debugging

All logs are prefixed with [Chrome Clippy]. In development builds, info/debug/warn logs are active. Production builds only log errors.

Service worker logs: chrome://extensions → Chrome Clippy → "Inspect views: service worker"
Offscreen document logs: chrome://extensions → Chrome Clippy → "Inspect views: offscreen.html"
Content script logs: Open DevTools on any page → Console
All extension pages: chrome://inspect#other lists all inspectable extension contexts (service worker, offscreen document, etc.)

The offscreen document logs are the most useful — they show model loading, prompt construction, token counts, raw model output, and tool execution.
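
The logging behavior described in this section (a fixed prefix, all levels in development, errors only in production) can be sketched as below. The factory `createLogger` and its `sink` parameter are hypothetical. The repository's actual logger may be structured differently.

```typescript
type Level = "debug" | "info" | "warn" | "error";
type Sink = (level: Level, line: string) => void;

const PREFIX = "[Chrome Clippy]";

// In production only "error" passes; in development every level does.
function createLogger(production: boolean, sink: Sink) {
  const enabled = (level: Level): boolean => !production || level === "error";
  const emit = (level: Level) => (message: string): void => {
    if (enabled(level)) sink(level, `${PREFIX} ${message}`);
  };
  return {
    debug: emit("debug"),
    info: emit("info"),
    warn: emit("warn"),
    error: emit("error"),
  };
}
```

In a browser context the sink could simply be `(level, line) => console[level](line)`.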

Notes

The agent/ directory has zero dependencies. It defines interfaces (ModelBackend, ToolExecutor) and can be extracted to a standalone library.
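
To make the shape of such a dependency-free agent loop concrete, here is a minimal sketch. `ModelBackend` and `ToolExecutor` are the interface names mentioned above, but their method signatures, the `ModelTurn` type, and `runAgent` itself are hypothetical. The `maxIterations` parameter corresponds to the "Max iterations" setting.

```typescript
// Hypothetical signatures; only the two interface names come from the README.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

type ModelTurn =
  | { type: "answer"; text: string }
  | { type: "tool_call"; call: ToolCall };

interface ModelBackend {
  generate(history: string[]): Promise<ModelTurn>;
}

interface ToolExecutor {
  execute(call: ToolCall): Promise<string>;
}

// Ask the model, run any tool it requests, feed the result back, and repeat
// until it answers or the iteration cap is reached.
async function runAgent(
  model: ModelBackend,
  tools: ToolExecutor,
  prompt: string,
  maxIterations: number,
): Promise<string> {
  const history = [`user: ${prompt}`];
  for (let i = 0; i < maxIterations; i++) {
    const turn = await model.generate(history);
    if (turn.type === "answer") return turn.text;
    const result = await tools.execute(turn.call);
    history.push(`tool_call: ${turn.call.name}`, `tool_result: ${result}`);
  }
  return "Stopped: reached the maximum number of tool iterations.";
}
```

Because the loop depends only on the two interfaces, it can be exercised with fake implementations and no browser APIs.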