BERTopic v0.17.0

Highlights:

Fixes:

Model2Vec

With Model2Vec, BERTopic now supports a light-weight pipeline for creating embeddings. Combined with the light-weight installation, you can now run BERTopic without PyTorch!

Installation is straightforward:

pip install --no-deps bertopic
pip install --upgrade numpy pandas scikit-learn tqdm plotly pyyaml

This installs BERTopic without UMAP or HDBSCAN, so you are free to use other dimensionality reduction and clustering techniques instead. If UMAP and HDBSCAN are not installed, BERTopic falls back to PCA and scikit-learn's HDBSCAN. To install them anyway, together with Model2Vec, run:

pip install model2vec umap-learn hdbscan
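To check which backends a given environment would end up with, a minimal sketch using only the standard library is shown below. The helper `describe_default_backends` is hypothetical, and it assumes, based on the description above, that UMAP and HDBSCAN are detected independently and each falls back to its scikit-learn counterpart. It only inspects what is installed and does not call BERTopic itself.

```python
import importlib.util


def describe_default_backends() -> tuple[str, str]:
    """Hypothetical helper: report which dimensionality-reduction and
    clustering classes a light-weight BERTopic install would likely use.

    Assumption: UMAP and HDBSCAN are detected independently, and each
    falls back to its scikit-learn counterpart when it is missing.
    """
    # find_spec returns None when no top-level package with that name is installed
    has_umap = importlib.util.find_spec("umap") is not None
    has_hdbscan = importlib.util.find_spec("hdbscan") is not None

    reducer = "umap.UMAP" if has_umap else "sklearn.decomposition.PCA"
    clusterer = "hdbscan.HDBSCAN" if has_hdbscan else "sklearn.cluster.HDBSCAN"
    return reducer, clusterer


if __name__ == "__main__":
    reducer, clusterer = describe_default_backends()
    print(f"Dimensionality reduction: {reducer}")
    print(f"Clustering: {clusterer}")
```

Passing your own `umap_model` or `hdbscan_model` to `BERTopic` bypasses this kind of default selection entirely.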

Then, creating a BERTopic model is as straightforward as you are used to:

:::python
from bertopic import BERTopic
from model2vec import StaticModel

# Model2Vec
embedding_model = StaticModel.from_pretrained("minishlab/potion-base-8M")

# BERTopic
topic_model = BERTopic(embedding_model=embedding_model)
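Part of why no PyTorch is needed is that Model2Vec models are static: each token has a fixed, precomputed vector, and a document embedding is, roughly, the average of its token vectors, so encoding comes down to lookups and arithmetic. The toy sketch below illustrates only that idea. The whitespace tokenizer, the three-dimensional `TOKEN_VECTORS` vocabulary, and the `embed` helper are all hypothetical and are not the model2vec implementation or API.

```python
from statistics import fmean

# Hypothetical toy vocabulary: every token maps to a fixed vector.
# Real Model2Vec models use a subword tokenizer and much larger vectors.
TOKEN_VECTORS: dict[str, list[float]] = {
    "topic": [1.0, 0.0, 0.0],
    "model": [0.0, 1.0, 0.0],
    "python": [0.0, 0.0, 1.0],
}


def embed(document: str) -> list[float]:
    """Embed a document by mean-pooling the static vectors of its known tokens."""
    # Naive lowercase whitespace tokenization; unknown tokens are skipped
    vectors = [TOKEN_VECTORS[tok] for tok in document.lower().split() if tok in TOKEN_VECTORS]
    if not vectors:
        # No known tokens: return a zero vector with the vocabulary's dimensionality
        dim = len(next(iter(TOKEN_VECTORS.values())))
        return [0.0] * dim
    # Average each dimension across the token vectors
    return [fmean(column) for column in zip(*vectors)]


print(embed("Topic model"))   # [0.5, 0.5, 0.0]
print(embed("unseen words"))  # [0.0, 0.0, 0.0]
```

Because nothing here involves a neural forward pass, this style of encoder runs quickly on a CPU; the trade-off is that a token's vector does not change with its context.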

DataMapPlot

To use the interactive version of DataMapPlot, you only need to run the following:

:::python
from umap import UMAP

# Assumes `docs`, their `embeddings`, and a fitted `topic_model` already exist

# Reduce your embeddings to 2 dimensions
reduced_embeddings = UMAP(n_neighbors=10, n_components=2, min_dist=0.0, metric='cosine').fit_transform(embeddings)

# Create an interactive DataMapPlot figure
topic_model.visualize_document_datamap(docs, reduced_embeddings=reduced_embeddings, interactive=True)
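The UMAP step above is not available with the light-weight installation. Assuming UMAP is not installed, one possible substitute is to reduce the embeddings to two dimensions with scikit-learn's PCA, the same technique the light-weight installation falls back to. PCA is linear, so clusters may look less clearly separated than with UMAP. The random `embeddings` array below is only a stand-in for real document embeddings.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for real document embeddings (here: 200 documents, 256 dimensions)
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(200, 256))

# Project the embeddings onto their first two principal components
reduced_embeddings = PCA(n_components=2).fit_transform(embeddings)

print(reduced_embeddings.shape)  # (200, 2)
```

The resulting `reduced_embeddings` can then be passed to `visualize_document_datamap` through the `reduced_embeddings` parameter, just as in the UMAP example.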
Source: README.md, updated 2025-03-19