Domain-Specific Embeddings: Finance Edition (voyage-finance-2)

TL;DR – We are thrilled to launch voyage-finance-2, our finance domain-specific embedding model. It outperforms competing models on financial retrieval datasets, with an average gain of 7% over OpenAI (the next best model) and 12% over Cohere. voyage-finance-2 supports a 32K context length, much larger than the other evaluated alternatives. It is the latest addition to our domain-specific embedding model portfolio, which includes voyage-law-2 (legal retrieval) and voyage-code-2 (code retrieval).

Domain-specific customization of embedding models is key to solving challenging retrieval problems. Embedding models typically have no more than 10 billion parameters due to latency constraints. Concentrating that limited parameter capacity on a specific domain is therefore necessary to achieve excellent performance in that domain. Our past domain-specific embedding models significantly improve retrieval accuracy, which in turn boosts response quality in Gen AI applications in expertise-intensive domains such as code and law. Today, we’re excited to add voyage-finance-2, optimized for finance retrieval, to our portfolio of cutting-edge domain-specific embedding models.

Quantitative Evaluation

Datasets. We evaluate voyage-finance-2 on 11 finance retrieval datasets spanning financial news, public filings, finance advice, and financial reports. These datasets are not seen during training. Most notably, we evaluate on TAT-QA, a large-scale question-answering dataset requiring some numerical reasoning over a hybrid of tabular and textual data. The following table provides a summary of the datasets.

Dataset	Description
Trade-the-event	Corporate event news and summary
RAG benchmark (Apple-10K-2022)	Questions about publicly traded companies and relevant public filings
FinanceBench	Questions about publicly traded companies and relevant public filings
TAT-QA	Questions on a hybrid of tabular and textual content in finance
Finance Alpaca	Finance advice question-answering
FIQA Personal Finance	Questions and answers about personal finance
Stock News Sentiments	Corporate event news and summary
ConvFinQA	Question-answer pairs over financial reports
FinQA	Question-answer pairs over financial reports
News stocks	Finance news and summary
HC3 finance	Finance advice question-answering

Models and Metrics. We evaluate voyage-finance-2 against three baselines: Mistral (mistral-embed), OpenAI v3 large (text-embedding-3-large), and Cohere English v3 (embed-english-v3.0). Given a query, we retrieve the top 10 documents by cosine similarity and report the normalized discounted cumulative gain (NDCG@10), a standard metric for retrieval quality. Like recall, it rewards retrieving relevant documents, but it also gives more credit when they appear higher in the ranking.
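To make the evaluation procedure concrete, here is a minimal Python sketch of cosine-similarity retrieval followed by NDCG@10. It is an illustration under assumptions, not the evaluation code used for the results in this post: the function names are hypothetical, vectors are assumed to be non-zero, and the gain is linear in the relevance label (some NDCG variants use an exponential gain instead).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=10):
    """Indices of the k documents most similar to the query, best first."""
    order = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return order[:k]

def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1), ranks from 1."""
    return sum(g / math.log2(position + 2) for position, g in enumerate(gains))

def ndcg_at_k(ranked_doc_ids, relevance, k=10):
    """NDCG@k for one query.

    ranked_doc_ids: document ids in retrieved order.
    relevance: dict mapping document id to a relevance label (missing means 0).
    """
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    ideal = dcg(ideal_gains)
    return dcg(gains) / ideal if ideal > 0 else 0.0

# Toy example with 2-dimensional "embeddings" (made-up numbers).
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_vec = [1.0, 0.1]
ranking = top_k(query_vec, doc_vecs, k=10)
score = ndcg_at_k(ranking, {2: 1}, k=10)  # only document 2 is relevant
```

A dataset-level NDCG@10 is then the mean of the per-query scores. A relevant document ranked first scores 1.0, and the score decays as the document moves down the list.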

Results. The following table lists the NDCG@10 for each dataset.

Dataset	voyage-finance-2	Mistral	OpenAI v3 large	Cohere English v3
Trade-the-event	0.993	0.992	0.988	0.991
RAG benchmark (Apple-10K-2022)	0.948	0.948	0.947	0.941
FinanceBench	0.853	0.776	0.836	0.753
TAT-QA	0.788	0.609	0.701	0.683
Finance Alpaca	0.786	0.734	0.759	0.678
FIQA Personal Finance	0.775	0.774	0.761	0.647
Stock News Sentiments	0.846	0.836	0.833	0.797
ConvFinQA	0.820	0.481	0.550	0.551
FinQA	0.795	0.469	0.537	0.506
News stocks	0.843	0.842	0.810	0.792
HC3 finance	0.690	0.674	0.659	0.508
Average	0.831	0.740	0.762	0.713

voyage-finance-2 achieves the highest NDCG@10 on every evaluation dataset (tying Mistral on the Apple-10K-2022 RAG benchmark at the reported precision), with an average gain of 7% over OpenAI, the next best model (0.831 vs. 0.762), and 12% over Cohere (0.831 vs. 0.713). Also, at 32K, the context length of voyage-finance-2 is much larger than that of the other evaluated models: Mistral and OpenAI v3 large are at 8K, and Cohere English v3 is at 512.
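The averages and gains can be recomputed directly from the results table. The short Python sketch below does so; the helper names are hypothetical. Note one interpretation that the post does not state explicitly: the quoted 7% and 12% line up with absolute differences in average NDCG@10 (percentage points), so the sketch reports both the point gain and the relative gain.

```python
# Per-dataset NDCG@10 values copied from the results table, in table order.
scores = {
    "voyage-finance-2": [0.993, 0.948, 0.853, 0.788, 0.786, 0.775,
                         0.846, 0.820, 0.795, 0.843, 0.690],
    "Mistral": [0.992, 0.948, 0.776, 0.609, 0.734, 0.774,
                0.836, 0.481, 0.469, 0.842, 0.674],
    "OpenAI v3 large": [0.988, 0.947, 0.836, 0.701, 0.759, 0.761,
                        0.833, 0.550, 0.537, 0.810, 0.659],
    "Cohere English v3": [0.991, 0.941, 0.753, 0.683, 0.678, 0.647,
                          0.797, 0.551, 0.506, 0.792, 0.508],
}

# Unweighted mean over the 11 datasets.
averages = {model: sum(vals) / len(vals) for model, vals in scores.items()}

def point_gain(model, baseline):
    """Absolute difference in average NDCG@10, in percentage points."""
    return 100 * (averages[model] - averages[baseline])

def relative_gain(model, baseline):
    """Difference in average NDCG@10 relative to the baseline, in percent."""
    return 100 * (averages[model] - averages[baseline]) / averages[baseline]

for baseline in ("OpenAI v3 large", "Cohere English v3"):
    print(
        f"vs {baseline}: "
        f"{point_gain('voyage-finance-2', baseline):.1f} points, "
        f"{relative_gain('voyage-finance-2', baseline):.1f}% relative"
    )
```

Measured relative to each baseline's own average, the gains come out larger than the point differences, because the baselines' averages are below 1.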

Try voyage-finance-2!

Domain-specific embedding models have been shown to significantly enhance retrieval quality in their domains. Now, with voyage-finance-2, you can turbocharge your Gen AI applications with finance retrieval. If you have used other Voyage embeddings, you just need to specify voyage-finance-2 as the model parameter (for both the corpus and the queries). Head over to our docs to learn more. We can’t wait to see what domain-specific applications you build with these embedding models!
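As a rough sketch of what "specify voyage-finance-2 as the model parameter for both the corpus and queries" can look like in Python: the helper below is hypothetical, and it assumes a client object exposing an embed(texts, model=..., input_type=...) method whose result has an embeddings attribute, as the voyageai Python client does at the time of writing. Check the docs for the current interface before relying on it.

```python
def embed_corpus_and_queries(client, corpus, queries, model="voyage-finance-2"):
    """Embed documents and queries with the same model (hypothetical helper).

    client is assumed to behave like voyageai.Client: embed() returns an
    object whose .embeddings is a list of vectors, one per input text.
    """
    # The same model name is passed for both sides so that document and
    # query vectors live in the same embedding space.
    doc_vecs = client.embed(corpus, model=model, input_type="document").embeddings
    query_vecs = client.embed(queries, model=model, input_type="query").embeddings
    return doc_vecs, query_vecs

# Example usage (requires the voyageai package and an API key, so it is
# left commented out; the texts are made up for illustration):
#
#   import voyageai
#   client = voyageai.Client()  # assumed to read VOYAGE_API_KEY from the environment
#   docs = ["Revenue grew year over year.", "The board approved a share buyback."]
#   doc_vecs, query_vecs = embed_corpus_and_queries(
#       client, docs, ["Did the company repurchase shares?"]
#   )
```

Passing the client in as an argument keeps the helper easy to test with a stand-in object and makes the model name the only thing that changes when switching from another Voyage model.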

If you’re interested in early access to upcoming domain-specific or fine-tuned embedding models, we’d love to hear from you: please email [email protected]. Follow us on X (Twitter) and LinkedIn for more updates!