TL;DR – Rerankers are neural nets that enhance the quality of search results in applications such as Retrieval-Augmented Generation (RAG). They score the relevance of initial, coarse-grained search outcomes and re-rank them based on the scores.
We’re thrilled to introduce Voyage’s inaugural general-purpose reranker, rerank-lite-1. Our comprehensive evaluation, encompassing 27 datasets across diverse topics—ranging from technical documentation and code to law, finance, web reviews, long documents, medicine, and conversations—shows that Voyage rerank-lite-1 is the state-of-the-art reranker, consistently outperforming bge-reranker-large and Cohere’s rerank-english-v2.0.
Retrieval-augmented generation (RAG) is the predominant approach for enterprise generative AI, where relevant proprietary knowledge is retrieved to enhance the capability of LLMs. The retrieval quality — the relevancy of the retrieved documents — significantly impacts the quality of the final responses. Voyage AI now provides a reranker API endpoint that can seamlessly integrate into your RAG stack and turbocharge your retrieval quality and end-to-end response quality.
How does a reranker work?
A reranker is often used as a refinement step in a two-stage retrieval system. In the first stage, embedding-based methods (or lexical search algorithms such as BM25 and TF-IDF) produce a broad set of initial search results. Following this, a reranker computes relevance scores between the query and the candidate documents and selects the most pertinent subset of candidates based on those scores. Figure 1 illustrates how it works.

The blessing of cross-encoding. Technically, rerankers are “cross-encoder” neural networks that excel by processing query-document pairs together, capturing their nuanced and complex interactions. As shown on the right of the figure below, the query and document are concatenated and fed into the transformer as a single input. The transformer includes multiple attention layers that span both the query and the document, facilitating a comprehensive understanding of their relationship.
In contrast, with the embedding-based approach (also referred to as bi-encoders), transformers generate embeddings separately for queries and documents, with interactions limited to comparing the cosine similarity of their embeddings. Consequently, rerankers typically produce more precise relevance scores than embedding-based methods, due to their detailed analysis of query-document interactions.
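To make the contrast concrete, here is a minimal sketch of the two scoring approaches using the open-source sentence-transformers library and small off-the-shelf models. It is purely illustrative; these are not Voyage’s models or internals.

```python
# Bi-encoder vs. cross-encoder scoring, sketched with open-source models.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

query = "How do I rotate a PDF page?"
documents = [
    "Use the rotate() method on a page object before saving the file.",
    "Our refund policy allows returns within 30 days of purchase.",
]

# Bi-encoder: query and documents are embedded independently; their only
# interaction is the cosine similarity between the resulting vectors.
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
query_emb = bi_encoder.encode(query, convert_to_tensor=True)
doc_embs = bi_encoder.encode(documents, convert_to_tensor=True)
bi_scores = util.cos_sim(query_emb, doc_embs)[0]

# Cross-encoder: each (query, document) pair is processed jointly, so the
# attention layers can model fine-grained interactions between the two texts.
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
cross_scores = cross_encoder.predict([(query, doc) for doc in documents])

print("bi-encoder scores:   ", bi_scores.tolist())
print("cross-encoder scores:", cross_scores.tolist())
```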

Tradeoffs in a hierarchical search. The two-stage retrieval system is designed to leverage the tradeoffs between the embedding-based search and rerankers. Rerankers can offer superior quality through cross-encoding; but the computational cost scales linearly in the number of candidate documents. In contrast, the embeddings for the corpus can be pre-computed, and vector-based search scales logarithmically in the corpus size.
Therefore, rerankers are most effective for refining results when dealing with a relatively small set of coarse-grained candidate documents (e.g., 100 or fewer), ensuring that the process remains time-efficient.
Modularity. A reranker is compatible with any first-stage search method, whether vector-based or lexical. This versatility facilitates its seamless integration as an additional step to boost any existing retrieval system.
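As an illustration of this modularity, the sketch below pairs a lexical first stage (BM25 via the rank_bm25 package) with an open-source cross-encoder as a stand-in reranker. Any reranker, including rerank-lite-1, could be swapped into the second step.

```python
# Two-stage retrieval sketch: lexical first stage + cross-encoder reranker.
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

corpus = [
    "The rotate() method turns a PDF page by a multiple of 90 degrees.",
    "BM25 is a bag-of-words ranking function based on term frequencies.",
    "Court opinions are published decisions written by judges.",
]
query = "How can I rotate a page in a PDF?"

# Stage 1: lexical search retrieves a broad candidate set (here, top 2 of 3).
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
candidates = bm25.get_top_n(query.lower().split(), corpus, n=2)

# Stage 2: the reranker scores each (query, candidate) pair and re-orders them.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, doc) for doc in candidates])
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])
```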
Introducing Voyage rerank-lite-1
We are excited to introduce rerank-lite-1, the first general-purpose Voyage reranker:
- State-of-the-Art Performance: Achieves unparalleled results on 27 datasets, outshining competitors such as bge-reranker-large and Cohere’s rerank-english-v2.0 across technical documentation, code, law, finance, web reviews, long documents, medicine, and conversations.
- Extended Context Length: Offers a 4K context length, nearly 8x greater than Cohere’s rerank-english-v2.0, accommodating longer and more complex documents and queries.
- Flexible Pricing: Priced by tokens rather than searches, providing cost savings for use cases with fewer and shorter documents.
Like our other models, Voyage rerank-lite-1 is offered as an API endpoint and as an Amazon Marketplace Model Package. Please email us at [email protected] for custom, self-hosted, or finetuned models (or any questions and feedback)!
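For a quick feel of the API, here is a sketch of reranking a handful of documents with the voyageai Python client. The method and field names below reflect our understanding of the client at the time of writing; the quickstart and API docs remain the authoritative reference.

```python
# Reranking a few documents with rerank-lite-1 via the voyageai Python client.
import voyageai

vo = voyageai.Client()  # reads the VOYAGE_API_KEY environment variable

query = "When did Apple announce the Vision Pro?"
documents = [
    "Apple announced the Vision Pro headset at WWDC in June 2023.",
    "The iPhone 15 lineup was unveiled in September 2023.",
]

# Ask for the single most relevant document.
reranking = vo.rerank(query, documents, model="rerank-lite-1", top_k=1)
for result in reranking.results:
    print(result.index, result.relevance_score, result.document)
```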
Quantitative Evaluation
Datasets. We evaluate on 27 retrieval datasets, spanning various topics and corpora, including technical documentation, code, law, finance, web reviews, long documents, medicine, and conversations. Each dataset consists of a corpus to be retrieved from and a set of queries. The corpus typically encompasses documents in a particular domain, such as answers in StackExchange, court opinions, technical documentation, etc., and the queries can be questions, summaries of long documents, or simply other documents.
The following table organizes the evaluation datasets into eight categories, facilitating an easier interpretation of the results.
| Category | Descriptions | Datasets |
|---|---|---|
| TECH | technical documentation | OneSignal, PyTorch, Verizon 5G, Cohere |
| CODE | code snippets, docstrings | LeetCode-python, DS1000, codechef-cpp_5doc |
| LAW | cases, court opinions, statutes, patents | Legalbench-Contracts, Law_Stackexchange, LegalQuad |
| FINANCE | SEC filings, Finance QA | FinanceBench, ConvFinQA, Fiqa Personal Finance |
| WEB | reviews, forum posts, policy pages | Doordash, Health4CA, Movie Summary, Kijiji.ca |
| LONG-CONTEXT | long documents on assorted topics: government reports, academic papers, and dialogues | QMSum, Government Report, Qasper |
| MEDICAL | medical documents and QA | Mental Health Consulting, Covid QA, ChatDoctor, Medical Instruction |
| CONVERSATION | meeting transcripts, dialogues | Dialog Sum, QA Conv, MeetingBank-transcript |
Method and Metrics. We evaluate the retrieval quality of various rerankers on top of several first-stage search methods in the setup illustrated in Figure 1. Given a query, we first retrieve 100 candidate documents from the corpus using embedding models (voyage-large-2 or OpenAI v3) or lexical search (BM25). Then, we use a reranker (voyage/rerank-lite-1, bge-reranker-large, or cohere/rerank-english-v2.0) to select the top-k relevant documents among the candidates. We measure recall@k — the rate at which the gold-standard document(s) appear among the top-k documents returned by this two-stage retrieval system. Specifically, we choose k=5, aligning with common practice in downstream applications.
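For reference, the sketch below shows how recall@k is computed in this setup; the variable names are illustrative and this is not our actual evaluation code.

```python
# recall@k: the fraction of queries for which at least one gold (relevant)
# document appears among the top-k documents returned by the retrieval system.
def recall_at_k(ranked_ids_per_query, gold_ids_per_query, k=5):
    hits = 0
    for ranked_ids, gold_ids in zip(ranked_ids_per_query, gold_ids_per_query):
        if any(doc_id in gold_ids for doc_id in ranked_ids[:k]):
            hits += 1
    return hits / len(ranked_ids_per_query)

# Example: 2 queries; the gold document is in the top 5 for the first one only.
ranked = [["d3", "d7", "d1", "d9", "d2"], ["d5", "d8", "d4", "d6", "d0"]]
gold = [{"d1"}, {"d2"}]
print(recall_at_k(ranked, gold, k=5))  # 0.5
```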
Results. The radar charts illustrate recall@5 for different combinations of first-stage search methods and rerankers. Voyage rerank-lite-1 emerges as the consistently superior reranker across all domains and first-stage search methods. Moreover, Voyage rerank-lite-1 improves recall over the first-stage search alone in almost all cases; the same cannot be said for the other rerankers. In fact, one of the two cases where Voyage rerank-lite-1 does not improve retrieval quality is in the CODE category using voyage-large-2 embeddings, which are already known to excel on code data.
Detailed numeric results for all 27 datasets and configurations are available in this spreadsheet.
Try Voyage rerankers!
Take your retrieval to the next level with rerank-lite-1 today! Head over to our quickstart to get going. As a modular component, rerank-lite-1 seamlessly integrates with the other parts of your RAG stack. You don’t have to pair it with Voyage embedding models, but as you would expect, Voyage embeddings and rerank-lite-1 work particularly well together.
If you’re interested in early access to our upcoming domain-specific or fine-tuned embedding models, we’d love to hear from you; please email [email protected]. Follow us on X (Twitter) and/or LinkedIn for more updates!