A small library for converting tokenized PHP source code into XML
Unsupervised text tokenizer for Neural Network-based text generation
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm
This repo contains the code for 1D tokenizer and generator
Long-form streaming TTS system for multi-speaker dialogue generation
Python library and CLI tool to interface with Google Translate
The best ChatGPT that $100 can buy
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
PostgreSQL extension for full-text search of Chinese-language text
LLM-based Reinforcement Learning audio edit model
Pre-trained Neural Network models in Axon
Audiocraft is a library for audio processing and generation
The official PyTorch implementation of Google's Gemma models
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Qwen3-Coder is the code version of Qwen3
Unified Multimodal Understanding and Generation Models
Large Language Model Principles and Practice Tutorial from Scratch
Data loaders and abstractions for text and NLP
A plugin that integrates the Lucene IK analyzer into Elasticsearch
Code for the paper "Language Models are Unsupervised Multitask Learners"
The regex-centric, fast lexical analyzer generator for C++
Inference code for Llama models
An ecosystem of Rust libraries for working with large language models