The Voyage 4 model family: shared embedding space with MoE architecture

TL;DR – We’re excited to introduce the Voyage 4 series, a new generation of text embedding models featuring industry-first shared embedding spaces. The series includes voyage-4-large, voyage-4, voyage-4-lite, and the open-weight voyage-4-nano. All models produce compatible embeddings, allowing customers to mix and match models for query and document embedding based on their specific accuracy, latency, and cost requirements. Furthermore, voyage-4-large leverages a mixture-of-experts (MoE) architecture to deliver state-of-the-art retrieval accuracy while keeping serving costs 40% lower than comparable dense models.

Today, we’re excited to announce the Voyage 4 model family. These models serve two key audiences: existing customers seeking more accurate retrieval, and developers building context-engineered agents that require high retrieval accuracy with low latency and cost for high-volume reads (e.g., from shared memory). The family includes four models:

  • voyage-4-large. Our new flagship embedding model, leveraging a mixture-of-experts (MoE) architecture to establish a new state of the art while keeping serving costs 40% lower than comparable dense models. This is the first production-grade embedding model to use an MoE architecture.
  • voyage-4. Approaches the retrieval quality of voyage-3-large while maintaining the efficiency of a mid-sized model.
  • voyage-4-lite. Approaches the retrieval accuracy of voyage-3.5 with far fewer parameters, delivering high-quality embeddings at significantly reduced computational cost.
  • voyage-4-nano. Our first open-weight model, freely available on Hugging Face under the Apache 2.0 license. voyage-4-nano is ideal for local development and prototyping with an easy path to production.

A single shared embedding space. The Voyage 4 series introduces an industry-first capability: shared embedding spaces. All four models produce compatible embeddings, meaning embeddings generated by different models can be used interchangeably. For example, query embeddings generated with voyage-4-lite can be used to search document embeddings generated with voyage-4-large; we refer to this practice of vectorizing queries and documents with different models as asymmetric retrieval.

Asymmetric retrieval is most effective when the upfront cost of vectorizing your document corpus is small relative to the cumulative cost of vectorizing queries over time. This is typically the case in production systems: documents are embedded once (or infrequently updated), while queries are embedded continuously at serving time. By vectorizing documents with voyage-4-large and queries with a smaller model such as voyage-4-nano, voyage-4-lite, or voyage-4, you get the retrieval accuracy benefits of the larger model’s document representations while keeping per-query latency and cost low.
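To make this concrete, here is a minimal sketch of asymmetric retrieval with the voyageai Python client. The embed() call and its input_type parameter are the client’s existing interface; we assume the Voyage 4 model names above are passed the same way:

```python
# Sketch: embed documents with the large model, queries with a smaller one.
import numpy as np
import voyageai

vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

documents = [
    "MoE layers route each token to a small subset of expert networks.",
    "Matryoshka embeddings can be truncated to lower dimensions.",
]

# One-time (or infrequent) cost: embed the corpus with the large model.
doc_emb = np.array(
    vo.embed(documents, model="voyage-4-large", input_type="document").embeddings
)

# Ongoing serving cost: embed each query with a smaller model. The shared
# embedding space makes the two sets of vectors directly comparable.
query = "How does a mixture-of-experts layer reduce compute?"
q_emb = np.array(
    vo.embed([query], model="voyage-4-lite", input_type="query").embeddings[0]
)

# Cosine similarity (normalizing defensively), then take the best match.
scores = doc_emb @ q_emb / (np.linalg.norm(doc_emb, axis=1) * np.linalg.norm(q_emb))
print(documents[int(np.argmax(scores))])
```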

For users with high query traffic, we recommend the following approach:

  • Vectorize your document corpus once with voyage-4-large for maximum retrieval accuracy.
  • Start with voyage-4-lite for query embeddings during development and early production – this minimizes serving costs while still benefiting from voyage-4-large document embeddings.
  • Upgrade to voyage-4 or voyage-4-large for query embeddings as your accuracy requirements evolve, without re-vectorizing documents.

This flexibility allows you to tune query and document embeddings independently: optimize document embeddings for accuracy (a one-time or infrequent cost) and query embeddings for latency (and ongoing serving cost).

Mixture-of-experts architecture. voyage-4-large is the first production-grade embedding model that utilizes a mixture-of-experts architecture. This enables frontier-level retrieval accuracy with serving costs 40% lower than comparable dense models – a new accuracy-cost frontier that allows voyage-4-large to attain better accuracy than voyage-3-large at a lower price.
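For intuition, here is an illustrative top-2 MoE feed-forward layer in PyTorch. We have not published voyage-4-large’s architecture, so the layer sizes, expert count, and routing below are invented for illustration only; the point is that each token activates just top_k of n_experts expert networks, so per-token compute is a fraction of a dense model with the same total parameter count:

```python
# Illustrative sketch only; not voyage-4-large's actual design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # learned token-to-expert routing
        self.top_k = top_k

    def forward(self, x):  # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # mixing weights over the chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = idx[:, k] == e  # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

layer = MoEFeedForward()
print(layer(torch.randn(16, 512)).shape)  # torch.Size([16, 512])
```

With top_k=2 of 8 experts, each token pays for 2 expert forward passes instead of one monolithic feed-forward 4x as large, which is where the serving-cost savings come from.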

Matryoshka learning and quantization. All models in the Voyage 4 series support 2048-, 1024-, 512-, and 256-dimensional embeddings, enabled by Matryoshka representation learning (“MRL”), as well as multiple quantization options – 32-bit floating point, signed and unsigned 8-bit integer, and binary precision – with minimal quality loss. Combining Matryoshka embeddings and quantization can significantly reduce downstream vector database costs while still maintaining a high level of retrieval accuracy. For more information on MRL and quantization, check out our voyage-code-3 blog.
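As a sketch, requesting truncated, quantized embeddings might look like the following. The output_dimension and output_dtype parameters are documented for earlier models such as voyage-code-3; we assume here that they carry over to the Voyage 4 series:

```python
# Sketch: Matryoshka truncation plus int8 quantization via the API.
import voyageai

vo = voyageai.Client()
result = vo.embed(
    ["vector database cost example"],
    model="voyage-4-lite",
    input_type="document",
    output_dimension=256,   # Matryoshka truncation: 2048 -> 256 dims
    output_dtype="int8",    # signed 8-bit integers instead of float32
)
# 256 int8 dims use 1/32 the storage of 2048 float32 dims.
print(len(result.embeddings[0]))  # 256
```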

Evaluation Details

Datasets. We evaluate general-purpose retrieval quality using all 29 datasets in the comprehensive Retrieval Embedding Benchmark (RTEB). We also evaluate the asymmetric retrieval capabilities of the models on a set of datasets spanning eight domains: medical, code, web, finance, technical documentation, long documents, conversations, and law. Each dataset consists of a corpus (e.g., technical documentation, court opinions) and queries (e.g., questions, summaries).

Models. We evaluate voyage-4-large, voyage-4, and voyage-4-lite alongside Gemini Embedding 001, Cohere Embed v4, and OpenAI v3 Large.

Metrics. Given a query, we retrieve the top 10 documents based on cosine similarity and report the normalized discounted cumulative gain (NDCG@10), a standard rank-sensitive metric for retrieval quality.
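For reference, here is a minimal NDCG@10 implementation. Benchmark harnesses normalize by the ideal ranking over all judged documents; this sketch simplifies by normalizing over the retrieved list only:

```python
# Sketch: NDCG@10 -- gains discounted by log2 of rank position.
import numpy as np

def ndcg_at_k(relevances, k=10):
    """relevances: graded relevance of retrieved docs, in ranked order."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))  # rank 1 -> 1/log2(2), ...
    dcg = float(rel @ discounts)
    ideal = np.sort(rel)[::-1]  # best possible ordering of the same docs
    idcg = float(ideal @ discounts)
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_at_k([1, 0, 1, 1, 0, 0, 0, 0, 0, 0]))  # ~0.91
```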

Results

General-purpose retrieval. The bar chart below compares the average retrieval quality of the Voyage 4 series with that of Gemini Embedding 001, Cohere Embed v4, and OpenAI v3 Large. Overall, voyage-4-large is the top-performing model, surpassing voyage-4, voyage-4-lite, Gemini Embedding 001, Cohere Embed v4, and OpenAI v3 Large by an average of 1.87%, 4.80%, 3.87%, 8.20%, and 14.05%, respectively.

Asymmetric retrieval. The bar charts below show asymmetric retrieval quality (denoted in the figure with a *), i.e., the accuracy when a smaller model – voyage-4-nano, voyage-4-lite, or voyage-4 – embeds the queries used to search documents embedded with voyage-4-large. Asymmetric retrieval improves quality across the board relative to using the smaller model for both queries and documents. For reference, we also include results for voyage-3.5-lite and for standard (non-asymmetric) retrieval.

All the evaluation results are available in this spreadsheet.

Try the Voyage 4 series today!

voyage-4-large, voyage-4, and voyage-4-lite are available today via the Voyage API, and are also available to MongoDB Atlas customers through the Atlas Embedding and Reranking API. The first 200 million tokens are free. To get started, visit our docs.

voyage-4-nano is available on Hugging Face for local development, and we welcome contributions.
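A local quickstart might look like the sketch below. The Hugging Face model id and sentence-transformers compatibility are assumptions on our part; check the model card for the exact loading code:

```python
# Sketch: running voyage-4-nano locally. The model id below is hypothetical.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("voyageai/voyage-4-nano")  # hypothetical HF model id
doc_emb = model.encode(["Court opinions from 2023 ..."])
query_emb = model.encode(["recent court opinions"])
print(model.similarity(query_emb, doc_emb))  # cosine similarity matrix
```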

Follow us on X (Twitter) and LinkedIn to stay up-to-date with our latest releases.
