Select an image or audio file, or capture media with your webcam or mic, to have the Language Model API describe it.