Taking aim at Amazon and OpenAI, Google (GOOGL.US) officially rolls out a string of new AI tools: the multimodal model Gemini Embedding 2 is now live.
Google (GOOGL.US) announced on Tuesday the release of its first multimodal AI model, Gemini Embedding 2. This latest model from the tech giant can map text, images, videos, audio, and documents into a unified embedding space.
In a blog post, Google stated: “Gemini Embedding 2 maps text, images, videos, audio, and documents into a single embedding space and can capture semantic intent in over 100 languages. This simplifies complex processing workflows and enhances various multimodal downstream tasks, from retrieval-augmented generation (RAG) and semantic search to sentiment analysis and data clustering.”
As the newest member of the Gemini AI model series, the model supports up to 8,192 text input tokens. It can process up to six images per request (PNG and JPEG formats), handle videos up to 120 seconds long (MP4 and MOV formats), directly ingest and embed audio data without transcription, and embed PDF documents up to six pages long.
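The “single embedding space” idea can be sketched with a toy retrieval example. The vectors below are random stand-ins, not output from Gemini Embedding 2 or any real API; the cosine-similarity ranking is simply the standard technique behind the semantic-search and RAG use cases the article mentions, applied to items of different modalities that share one vector space.

```python
# Sketch of cross-modal retrieval in a unified embedding space.
# All embeddings here are made-up random vectors standing in for what a
# multimodal embedding model would produce from text, images, or audio.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "corpus": items of different modalities in one shared vector space.
rng = np.random.default_rng(0)
corpus = {
    "caption: a golden retriever in the park": rng.normal(size=8),
    "image: dog_photo.jpg":                    rng.normal(size=8),
    "audio: barking_clip.mp3":                 rng.normal(size=8),
}

# A query embedding (also a stand-in). Because every modality lives in
# the same space, one query can rank text, images, and audio together.
query = rng.normal(size=8)

ranked = sorted(corpus.items(),
                key=lambda kv: cosine_similarity(query, kv[1]),
                reverse=True)
for name, vec in ranked:
    print(f"{cosine_similarity(query, vec):+.3f}  {name}")
```

In a real pipeline the random vectors would be replaced by model-generated embeddings, and the sort would typically be served by an approximate-nearest-neighbor index rather than a full scan.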
Google added: “Gemini Embedding 2 is more than just an improvement over traditional models.” Comparing it against models from Amazon (AMZN.US), Voyage, and its own earlier models, Google said: “It sets a new performance standard for multimodal deep learning, introduces powerful speech capabilities, and surpasses leading models in text, image, and video tasks. This measurable performance boost and unique multimodal coverage give developers access to all the tools needed to meet their diverse embedding requirements.”