TL;DR – Voyage AI’s latest general-purpose text embedding model, voyage-large-2-instruct, now tops the overall MTEB leaderboard, outperforming OpenAI v3 large and Cohere English v3 on key tasks such as retrieval, classification, clustering, and reranking.
The Massive Text Embedding Benchmark (MTEB), hosted by HuggingFace, is the de facto community benchmark for measuring the quality of text embedding models. As world-class experts and providers of embedding models, we have submitted several models over the past year. Our recently released legal embedding model, voyage-law-2, tops the retrieval leaderboard for law. Our voyage-lite-02-instruct, which was previously ranked #3 overall, has 6x fewer parameters and 4x smaller embedding dimensions than the other models in the top five.
Now, we are thrilled to announce that our latest general-purpose text embedding model, voyage-large-2-instruct, ranks #1 on the overall MTEB leaderboard. With a 16K context window, voyage-large-2-instruct incorporates instruction tuning and all our learnings from developing our other second-generation models. As shown in the following table, a simplified version of the overall MTEB leaderboard, voyage-large-2-instruct outperforms all other competing commercial models in five of the seven benchmarked task categories (retrieval, classification, clustering, pair classification, and reranking).
| | voyage-large-2-instruct | Google Gecko | OpenAI v3 Large | Cohere English v3 |
|---|---|---|---|---|
| Embedding Dimension | 1024 | 768 | 3072 | 1024 |
| MTEB Rank | 1 | 7 | 14 | 16 |
| Classification | 81.48 | 81.17 | 75.45 | 76.49 |
| Clustering | 53.35 | 47.48 | 49.01 | 47.43 |
| Pair Classification | 89.24 | 87.61 | 85.72 | 85.84 |
| Reranking | 60.08 | 58.90 | 59.16 | 58.01 |
| Retrieval | 58.28 | 55.70 | 55.44 | 55.00 |
| STS | 84.58 | 85.07 | 81.73 | 82.62 |
| Summarization | 30.84 | 32.63 | 29.92 | 30.18 |
| Average | 68.28 | 66.31 | 64.59 | 64.47 |
Voyage AI provides a portfolio of cutting-edge models to tackle your use case. While we recommend voyage-large-2-instruct as the default for general-purpose embedding, if your application is in a domain addressed by one of our domain-specific embedding models, we recommend using that model (e.g., voyage-law-2, voyage-code-2). Stay tuned for more domain-specific embedding models from us. If you want to tune for the best possible retrieval quality with your data and use cases, voyage-large-2 also performs well for retrieval tasks and should be evaluated alongside voyage-large-2-instruct. Finally, voyage-2 is recommended for other generalist tasks with higher throughput demands; the model was optimized for a balance between cost, latency, and retrieval quality.
voyage-large-2-instruct, as the name suggests, is trained to be responsive to additional instructions that are prepended to the input text. For all retrieval/search tasks (e.g., in RAG), we make this convenient with the input_type parameter, which specifies whether the input text is a query or document (see here for details). For classification, clustering, or other MTEB subtasks, please use the instructions here.
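To make the query/document distinction concrete, here is a minimal Python sketch of how an embedding request payload and a task-instruction prefix might be assembled. The field names mirror the parameters described above (model, input_type), but the helper functions and the instruction wording are illustrative assumptions, not the official client; check the docs for the exact interface.

```python
# Hedged sketch (not the official Voyage AI client): assemble the pieces of
# an embedding request as described in the post.

def build_embed_payload(texts, model="voyage-large-2-instruct", input_type=None):
    """Assemble a request payload for an embeddings call.

    input_type: "query" or "document" for retrieval/search tasks (e.g., RAG);
    leave as None for tasks where you prepend an explicit instruction instead.
    """
    payload = {"model": model, "input": texts}
    if input_type is not None:
        payload["input_type"] = input_type
    return payload

def prepend_instruction(instruction, texts):
    """For classification/clustering-style tasks, prepend a task instruction
    to each input text. The instruction string here is a placeholder; use the
    instructions published for each MTEB subtask."""
    return [f"{instruction} {text}" for text in texts]

# Retrieval: mark the query side so it is embedded as a query.
query_payload = build_embed_payload(
    ["Which model ranks #1 on MTEB?"], input_type="query"
)

# Clustering: an illustrative instruction prefix.
clustering_inputs = prepend_instruction(
    "Cluster the following text:",
    ["voyage-large-2-instruct tops the MTEB leaderboard."],
)
```

The key point is that retrieval asymmetry is handled for you via input_type, while other tasks expose the model's instruction-following directly through a prepended prompt.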
Voyage AI trains best-in-class embedding models and continually tops the MTEB leaderboard, which validates our innovation and leadership position in embedding models. Head over to our docs to learn more and give our models a try. And if you’re interested in early access to upcoming models or in finetuning embeddings, we’d love to hear from you: please email [email protected]. Follow us on X (Twitter) and/or LinkedIn for more updates!