YouTokenToMe is an unsupervised text tokenization library focused on computational efficiency. It learns a subword vocabulary from raw text and uses it to split input into subword units for NLP models.
Features
- Implements Byte Pair Encoding (BPE) subword tokenization
- Optimized for processing large text corpora
- Provides a lightweight and fast tokenization pipeline
- Supports vocabulary pruning and model compression
- Works with Unicode and multilingual text inputs
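To illustrate the Byte Pair Encoding technique named in the feature list, here is a minimal pure-Python sketch. It is not YouTokenToMe's implementation or API: the function names `merge_pair`, `learn_bpe`, and `encode` are hypothetical, the corpus is assumed to be a pre-split list of words, and the lexicographic tie-breaking rule is an arbitrary choice made so the result is deterministic.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (illustrative only)."""
    # Each distinct word becomes a tuple of single-character symbols with a frequency.
    # Python strings iterate by Unicode code point, so non-ASCII text works as well.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties go to the lexicographically largest pair.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Apply the new merge rule to every word in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Split `word` into subword units by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

For example, `learn_bpe(["abab", "abab", "abc"], 2)` learns the merges `("a", "b")` and then `("ab", "ab")`, after which `encode("ababc", merges)` yields `["abab", "c"]`. A production tokenizer would additionally map subwords to integer IDs and use far more efficient data structures than this quadratic loop.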
Categories
Natural Language Processing (NLP)
License
MIT License