gemini-embedding-001

Model Description

The Gemini Embedding model achieves state-of-the-art performance across many key dimensions, including code, multilingual, and retrieval tasks.

Gemini Embedding is a state-of-the-art model that leverages the Gemini architecture to produce highly generalizable embeddings for text and code across numerous languages, designed for tasks like retrieval, classification, and clustering.

An Introduction to the Gemini Embedding Model

Gemini Embedding is a state-of-the-art embedding model designed to leverage the capabilities of Google’s Gemini large language model. It produces highly generalizable, dense vector representations for text spanning over 100 languages and various textual modalities, including code. These embeddings can be precomputed and applied to a wide range of downstream tasks such as classification, semantic similarity, clustering, ranking, and information retrieval.
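As an illustration of one such downstream use, the short sketch below ranks candidate documents against a query by cosine similarity over precomputed embedding vectors. The vectors here are random placeholders standing in for real Gemini Embedding outputs, and the dimension of 768 is illustrative only.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder vectors standing in for precomputed Gemini embeddings
# (real embeddings would come from the API; dimension 768 is illustrative).
rng = np.random.default_rng(0)
query_vec = rng.normal(size=768)
doc_vecs = {f"doc_{i}": rng.normal(size=768) for i in range(3)}

# Rank documents by similarity to the query.
ranked = sorted(doc_vecs.items(),
                key=lambda kv: cosine_similarity(query_vec, kv[1]),
                reverse=True)
for name, vec in ranked:
    print(name, round(cosine_similarity(query_vec, vec), 4))
```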

Model Architecture

The model’s architecture is designed to create holistic representations of inputs. The process begins by initializing the embedding model from a pre-existing Gemini model, which allows it to build upon the vast knowledge already contained within Gemini’s parameters.

The technical process involves three main steps:

  1. An input text sequence is processed by a transformer with bidirectional attention, which produces a sequence of token-level embeddings.
  2. A mean pooling strategy is then applied. This involves averaging the token embeddings along the sequence axis to generate a single, fixed-size embedding that represents the entire input.
  3. Finally, a randomly initialized linear projection layer maps this pooled embedding to the desired final output dimension (a minimal sketch of the pooling and projection steps follows this list).
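The pooling and projection steps can be illustrated with a minimal NumPy sketch. The sizes used for sequence length, model width, and output dimension are illustrative placeholders, not the model's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, d_model, d_out = 16, 1024, 768   # illustrative sizes only

# Step 1 (stand-in): token-level embeddings from the bidirectional transformer.
token_embeddings = rng.normal(size=(seq_len, d_model))

# Step 2: mean pooling along the sequence axis -> one fixed-size vector.
pooled = token_embeddings.mean(axis=0)             # shape: (d_model,)

# Step 3: randomly initialized linear projection to the output dimension.
W = rng.normal(scale=d_model ** -0.5, size=(d_model, d_out))
embedding = pooled @ W                              # shape: (d_out,)

print(embedding.shape)  # (768,)
```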

Training

The Gemini Embedding model was refined using a training objective based on a noise-contrastive estimation (NCE) loss function with in-batch negatives.
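To make the objective concrete, here is a minimal sketch of a contrastive loss with in-batch negatives: each query is scored against every passage in the batch, the matching passage serves as the positive, and the remaining passages act as negatives. The batch size, embedding dimension, and temperature are illustrative assumptions, not the published training configuration.

```python
import numpy as np

def in_batch_contrastive_loss(q: np.ndarray, p: np.ndarray, temperature: float = 0.05) -> float:
    """Contrastive loss with in-batch negatives.

    q: (batch, dim) query embeddings; p: (batch, dim) passage embeddings.
    Row i of p is the positive for row i of q; all other rows are negatives.
    """
    # L2-normalize so scores are cosine similarities.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)

    logits = (q @ p.T) / temperature          # (batch, batch) similarity matrix
    # Log-softmax over each row; the diagonal entries are the positive pairs.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
queries = rng.normal(size=(8, 768))    # illustrative batch of 8, dimension 768
passages = rng.normal(size=(8, 768))
print(in_batch_contrastive_loss(queries, passages))
```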

Performance and Capabilities

When evaluated on the Massive Multilingual Text Embedding Benchmark (MMTEB), which includes over one hundred tasks across more than 250 languages, Gemini Embedding has been shown to substantially outperform previous state-of-the-art models. It established a new state-of-the-art on the public leaderboard, achieving a mean score of 68.32, a significant improvement over the next-best model.

The model demonstrates exceptional performance not only in high-resource languages like English but also in numerous low-resource languages, such as Macedonian. It has also set new records on specific benchmarks like XOR-Retrieve for cross-lingual retrieval. This unified model shows strong capabilities across a broad selection of tasks, surpassing even specialized, domain-specific models in English, multilingual, and code benchmarks.

🔔 How to Use

```mermaid
graph LR
    A("Purchase Now") --> B["Start Chat on Homepage"]
    A --> D["Read API Documentation"]
    B --> C["Register / Login"]
    C --> E["Enter Key"]
    D --> F["Enter Endpoint & Key"]
    E --> G("Start Using")
    F --> G
    style A fill:#f9f9f9,stroke:#333,stroke-width:1px
    style B fill:#f9f9f9,stroke:#333,stroke-width:1px
    style C fill:#f9f9f9,stroke:#333,stroke-width:1px
    style D fill:#f9f9f9,stroke:#333,stroke-width:1px
    style E fill:#f9f9f9,stroke:#333,stroke-width:1px
    style F fill:#f9f9f9,stroke:#333,stroke-width:1px
    style G fill:#f9f9f9,stroke:#333,stroke-width:1px
```
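Once you have an endpoint and key from the steps above, a request can typically be made with any OpenAI-compatible client. The sketch below uses the openai Python package; the base URL, API key, and model name are placeholders that assume an OpenAI-compatible /embeddings endpoint, so check the API documentation for the exact values to use.

```python
from openai import OpenAI

# Placeholder endpoint and key: substitute the values shown in the API
# documentation after registering (assumes an OpenAI-compatible endpoint).
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://your-endpoint.example.com/v1",
)

response = client.embeddings.create(
    model="gemini-embedding-001",
    input=["What is the capital of France?", "Paris is the capital of France."],
)

vectors = [item.embedding for item in response.data]
print(len(vectors), "embeddings of dimension", len(vectors[0]))
```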



Recommended Models

DeepGemini-2.5-pro

DeepSeek-R1 + gemini-2.5-pro-preview-03-25. The Deep series combines the DeepSeek-R1 (671B) model with the chain-of-thought reasoning of other models, making full use of DeepSeek's reasoning capabilities and supplementing them with stronger models to enhance overall performance.

claude-opus-4-20250514-thinking

A comprehensive introduction to Anthropic's newly released Claude 4 models, Opus 4 and Sonnet 4, highlighting their features, performance benchmarks, application scenarios, pricing, and availability. The report summarizes key differences between the models and discusses their integration with major platforms such as GitHub Copilot, emphasizing their advantages in coding, advanced reasoning, and ethical AI responses.

QwQ-32B

QwQ-32B is a 32.5B-parameter reasoning model in the Qwen series, featuring an advanced architecture and a 131K-token context length, designed to outperform state-of-the-art reasoning models such as DeepSeek-R1 on complex tasks.