Chrome Built-In AI: Multimodal Image and Sound Input

GitHub repository for these GitHub Pages: https://github.com/hpssjellis/my-examples-of-web-llm
Main demo index: https://hpssjellis.github.io/my-examples-of-web-llm/public/index.html
How to set up Chrome flags (if needed)

This page demonstrates the core features of the Gemini Nano Prompt API (the `LanguageModel` API) available in Chrome 138+. Ensure you have enabled the necessary flags: open chrome://flags in your Chrome address bar.
First, set #optimization-guide-on-device-model to "Enabled BypassPerfRequirement".
Then search for each of the following flags and set them all to Enabled:

  1. #prompt-api-for-gemini-nano (set to Enabled)
  2. #prompt-api-for-gemini-nano-multimodal-input (set to Enabled)
  3. #summarization-api-for-gemini-nano (set to Enabled)
  4. #writer-api-for-gemini-nano (set to Enabled)
  5. #rewriter-api-for-gemini-nano (set to Enabled)
  6. #proofreader-api-for-gemini-nano (set to Enabled)
The Gemini Nano model downloads the first time you use it: roughly a 4.0 GB download, requiring about 20 GB of free disk space for the installed model files.
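Once the flags above are set, you can check whether the model is ready and trigger the one-time download from a page script. The following is a minimal sketch using the Prompt API's published `availability()`/`create()` shape; the message wording and logging are my own additions, not part of the API.

```javascript
// Pure helper: map an availability state to a user-facing status message.
// The four state strings follow the Built-in AI availability() contract.
function availabilityMessage(state) {
  const messages = {
    unavailable: 'Gemini Nano is not available on this device.',
    downloadable: 'Model not yet downloaded (~4 GB). Creating a session starts the download.',
    downloading: 'Model download is in progress...',
    available: 'Gemini Nano is ready.'
  };
  return messages[state] || 'Unknown state: ' + state;
}

async function initModel() {
  // Feature-detect so the page degrades gracefully in other browsers.
  if (typeof LanguageModel === 'undefined') {
    console.log('Prompt API not found. Check the chrome://flags settings above.');
    return null;
  }
  const state = await LanguageModel.availability();
  console.log(availabilityMessage(state));
  if (state === 'unavailable') return null;
  // Creating a session triggers the one-time model download if needed.
  return await LanguageModel.create({
    monitor(m) {
      m.addEventListener('downloadprogress', (e) => {
        console.log('Downloaded ' + Math.round(e.loaded * 100) + '%');
      });
    }
  });
}
```

Calling `initModel()` from a button click (rather than on page load) keeps the large download behind an explicit user action.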

Note: press Ctrl-Shift-I to open the DevTools console and view the page's log output.

Select an image or audio file, or use your webcam/mic to capture media, to describe it using the Language Model API.
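The describe-media flow above can be sketched as follows. The message shape (a `role` plus typed content parts) follows the multimodal Prompt API; the file-handling helper and the wording of the question are hypothetical.

```javascript
// Pure helper: build the message array for an image or audio prompt.
// kind is 'image' or 'audio'; mediaValue is a Blob/File/ImageBitmap.
function buildMediaPrompt(kind, mediaValue, question) {
  return [{
    role: 'user',
    content: [
      { type: kind, value: mediaValue },
      { type: 'text', value: question }
    ]
  }];
}

async function describeFile(file) {
  if (typeof LanguageModel === 'undefined') return null;
  const kind = file.type.startsWith('audio') ? 'audio' : 'image';
  const session = await LanguageModel.create({
    // Declare multimodal input up front so the session accepts it.
    expectedInputs: [{ type: kind }]
  });
  const reply = await session.prompt(
    buildMediaPrompt(kind, file, 'Describe this ' + kind + ' in detail.')
  );
  console.log(reply);
  return reply;
}
```

Webcam or microphone captures can be converted to a Blob (e.g. via `canvas.toBlob` or `MediaRecorder`) and passed through the same `describeFile`-style path.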

Image Input

Audio Input


Status here

Description:




Chrome AI Translator

Source: → Target:


Status:
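The translator section above can be driven by Chrome's built-in Translator API. This is a minimal sketch assuming the published `Translator.availability()`/`Translator.create()` shape; the specific language codes you pass are whatever the Source/Target selectors provide.

```javascript
// Pure helper: package a source/target pair as Translator options.
function translationOptions(sourceLanguage, targetLanguage) {
  return { sourceLanguage, targetLanguage };
}

async function translateText(text, src, tgt) {
  // Feature-detect so non-Chrome browsers fail gracefully.
  if (typeof Translator === 'undefined') {
    console.log('Translator API not found in this browser.');
    return null;
  }
  const opts = translationOptions(src, tgt);
  const state = await Translator.availability(opts);
  if (state === 'unavailable') return null; // language pair unsupported
  // create() may download a language pack for the pair on first use.
  const translator = await Translator.create(opts);
  return await translator.translate(text);
}
```

For example, `translateText('Hello', 'en', 'fr')` would resolve to the French translation once the en→fr language pack is available.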