MTEB download | SourceForge.net

Text embeddings are commonly evaluated on a small set of datasets from a single task, which does not cover their possible applications to other tasks. It is unclear whether state-of-the-art embeddings on semantic textual similarity (STS) can be applied equally well to other tasks such as clustering or reranking. This makes progress in the field difficult to track, as new models are constantly being proposed without proper evaluation. To solve this problem, we introduce the Massive Text Embedding Benchmark (MTEB). MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages. Through the benchmarking of 33 models on MTEB, we establish the most comprehensive benchmark of text embeddings to date. We find that no particular text embedding method dominates across all tasks. This suggests that the field has yet to converge on a universal text embedding method and scale it up sufficiently to provide state-of-the-art results on all embedding tasks.

Features

Dataset selection
Datasets can be selected by providing a list of dataset names
You can also specify which languages to load for multilingual/crosslingual tasks
You can restrict evaluation to the test splits of all tasks
Use a custom model
Evaluate on a custom task
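
As a rough illustration of the "Use a custom model" feature, the sketch below defines a toy model that exposes an `encode` method taking a list of sentences and returning one embedding per sentence. The class name `HashingBagOfWordsModel`, its `dim` parameter, and the hashing scheme are hypothetical and chosen only for illustration; the assumption that the benchmark calls `encode(sentences, batch_size=..., **kwargs)` is based on the project's README at the time of this listing and may differ in newer releases.

```python
import hashlib
import math
from typing import List


class HashingBagOfWordsModel:
    """Hypothetical toy model: hashes tokens into a fixed-size count vector."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def _bucket(self, token: str) -> int:
        # Stable hash so the same token always maps to the same dimension.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % self.dim

    def encode(
        self, sentences: List[str], batch_size: int = 32, **kwargs
    ) -> List[List[float]]:
        """Return one L2-normalized embedding per input sentence.

        `batch_size` and `**kwargs` are accepted only for interface
        compatibility (an assumption); this toy model ignores them.
        """
        embeddings = []
        for sentence in sentences:
            vec = [0.0] * self.dim
            for token in sentence.lower().split():
                vec[self._bucket(token)] += 1.0
            norm = math.sqrt(sum(v * v for v in vec))
            if norm > 0:
                vec = [v / norm for v in vec]
            embeddings.append(vec)
        return embeddings


model = HashingBagOfWordsModel(dim=32)
vectors = model.encode(["MTEB spans 8 embedding tasks", "Clustering and reranking"])
print(len(vectors), len(vectors[0]))

# Assumed usage with the benchmark itself (requires `pip install mteb`;
# call names follow the README of that period and are not verified here):
# from mteb import MTEB
# evaluation = MTEB(tasks=["Banking77Classification"])
# evaluation.run(model, eval_splits=["test"])
```

A real evaluation would likely need embeddings returned as NumPy arrays or tensors rather than plain lists, and a model with actual semantic quality; the point here is only the shape of the interface.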

License

Apache License V2.0

Additional Project Details

Programming Language

Python

Related Categories

Python Neural Search Software

Registered

2023-08-21
