Gemma 4 · Transformers.js v4.0.1 — Teaching Demo v32

This page runs a large language model entirely in your browser. It uses Gemma 4 E2B-IT (ONNX) via Transformers.js v4; the CDN versions are listed at @huggingface/transformers.

⚙ Generation Settings

System Prompt:

Max output tokens:
Higher = longer responses. Lower = cut off sooner.

Temperature:
0.1 = most deterministic, 2.0 = most creative
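
As an illustration of why low temperatures are more deterministic, here is a minimal sketch of temperature-scaled softmax, the standard way a temperature value reshapes a model's next-token probabilities. It is not taken from this demo's source; the function name and the example scores are hypothetical.

```typescript
// Turn raw model scores (logits) into sampling probabilities at a given temperature.
// Dividing by a small temperature exaggerates differences between scores;
// dividing by a large one flattens them.
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  if (temperature <= 0) {
    throw new Error("temperature must be positive");
  }
  const scaled = logits.map((x) => x / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Hypothetical scores for three candidate tokens.
const exampleLogits = [2.0, 1.0, 0.0];
const sharp = softmaxWithTemperature(exampleLogits, 0.1); // almost all mass on the top token
const flat = softmaxWithTemperature(exampleLogits, 2.0);  // mass spread across all three
console.log(sharp.map((p) => p.toFixed(3)), flat.map((p) => p.toFixed(3)));
```

At 0.1 the top-scoring token is chosen almost every time; at 2.0 the lower-scoring tokens get a realistic chance of being sampled, which is what makes the output feel more varied.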

📖 What is a token?

A token is ~¾ of a word. Gemma 4 E2B context: 128,000 tokens.
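
The ¾-word rule of thumb above can be turned into a rough estimate of how much of the context a prompt uses. The sketch below assumes whitespace-separated words and is only an approximation; the real count comes from the model's tokenizer, and the function names here are hypothetical.

```typescript
// Rule of thumb from the text: one token is roughly 3/4 of a word.
const WORDS_PER_TOKEN = 0.75;
// Context size quoted in the text for Gemma 4 E2B.
const CONTEXT_TOKENS = 128000;

// Estimate the token count of a text by counting whitespace-separated words.
function estimateTokens(text: string): number {
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0).length;
  return Math.ceil(words / WORDS_PER_TOKEN);
}

// Check whether a prompt plus the requested output budget fits in the context.
function fitsInContext(prompt: string, maxOutputTokens: number): boolean {
  return estimateTokens(prompt) + maxOutputTokens <= CONTEXT_TOKENS;
}

console.log(estimateTokens("Explain what a token is in one sentence."));
console.log(fitsInContext("Explain what a token is in one sentence.", 512));
```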

Step 1 — Adjust generation settings above if desired.
Step 2 — Choose model & device below, then click Load Model.
First download is ~900 MB (q4). After that it is cached and reloads in seconds.

Model ID:
For a larger model, try 'E4B'; it needs a powerful computer.

Device:

Quantization (dtype):
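
For the Device setting above, a page like this typically has to decide whether the browser can use the GPU. The sketch below shows one way that choice could be made, assuming the two options are WebGPU and a WebAssembly (CPU) fallback; the function name is hypothetical and this is not the demo's actual code. It relies only on the fact that browsers with WebGPU support expose `navigator.gpu`.

```typescript
// Minimal shape of the part of `navigator` this sketch inspects.
interface NavigatorLike {
  gpu?: unknown;
}

// Prefer WebGPU when the browser exposes it; otherwise fall back to WebAssembly.
function pickDevice(nav: NavigatorLike | undefined): "webgpu" | "wasm" {
  const hasWebGpu = nav !== undefined && nav.gpu !== undefined && nav.gpu !== null;
  return hasWebGpu ? "webgpu" : "wasm";
}

// Read `navigator` through globalThis so the sketch also runs outside a browser.
const currentNavigator = (globalThis as { navigator?: NavigatorLike }).navigator;
console.log(pickDevice(currentNavigator));
```

Passing the navigator object in as a parameter keeps the decision easy to check without a real browser.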

🧠 Memory (LocalStorage)

Conversation:

Save to memory:
▶ Prompts (your messages)
◀ Replies (Gemma's messages)
Unchecked = excluded from saved memory & context retrieval
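
To make the two checkboxes concrete, here is a minimal sketch of how a conversation might be filtered and written to the browser's `localStorage`. The message shape, option names, storage key, and helper functions are all assumptions for illustration, not the demo's actual code. Only `getItem` and `setItem` are used, which are part of the standard Web Storage API.

```typescript
type Role = "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

// Hypothetical mirror of the two "Save to memory" checkboxes.
interface SaveOptions {
  savePrompts: boolean; // keep the user's messages
  saveReplies: boolean; // keep Gemma's messages
}

// The subset of the Web Storage API this sketch needs; window.localStorage satisfies it.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Drop the message kinds whose checkbox is unchecked.
function filterForMemory(messages: Message[], opts: SaveOptions): Message[] {
  return messages.filter((m) => (m.role === "user" ? opts.savePrompts : opts.saveReplies));
}

// Serialize the filtered conversation as JSON under the given key.
function saveConversation(storage: StorageLike, key: string, messages: Message[], opts: SaveOptions): void {
  storage.setItem(key, JSON.stringify(filterForMemory(messages, opts)));
}

// Read a conversation back; a missing key yields an empty conversation.
function loadConversation(storage: StorageLike, key: string): Message[] {
  const raw = storage.getItem(key);
  return raw === null ? [] : (JSON.parse(raw) as Message[]);
}

// In-memory stand-in so the sketch runs outside a browser.
class MemoryStorage implements StorageLike {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.has(key) ? (this.data.get(key) as string) : null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

const demoStorage = new MemoryStorage();
const demoMessages: Message[] = [
  { role: "user", content: "What is a token?" },
  { role: "assistant", content: "Roughly three quarters of a word." },
];
saveConversation(demoStorage, "demo-conversation", demoMessages, { savePrompts: true, saveReplies: false });
console.log(loadConversation(demoStorage, "demo-conversation"));
```

In a browser, `window.localStorage` could be passed in place of `MemoryStorage`. Because the stored value is plain JSON, the same shape could also serve for a JSON upload, though the file format this page expects is not specified here.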

⬆ Upload JSON

By Jeremy Ellis · GitHub Profile · LinkedIn jeremy-ellis-4237a9bb · This GitHub: my-examples-of-gemma4
Use at your own risk