LlamaGen is an open-source research project that applies the autoregressive next-token prediction paradigm of large language models to image generation. Instead of relying on diffusion models, it treats an image as a sequence of discrete tokens that a transformer generates one at a time, much as text is generated. The project explores how scaling these autoregressive models and improving image tokenization can yield results competitive with modern diffusion-based image generators.

LlamaGen provides pretrained models and training configurations for both class-conditional image generation and text-conditioned image synthesis. The repository includes image tokenizers, training scripts, and models ranging from hundreds of millions to several billion parameters.
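The next-token approach can be illustrated with a minimal sketch. Everything here is illustrative, not LlamaGen's actual API: `next_token_logits` stands in for a transformer forward pass, and the codebook size and grid size are made-up values.

```python
import random

# Hypothetical sizes for illustration only: a 4x4 grid of image tokens
# drawn from a 16-entry codebook.
CODEBOOK_SIZE = 16
GRID = 4


def next_token_logits(prefix):
    """Stand-in for a transformer forward pass: returns unnormalized
    scores over the codebook given the tokens generated so far."""
    rng = random.Random(sum(prefix) + len(prefix))  # deterministic toy scores
    return [rng.random() for _ in range(CODEBOOK_SIZE)]


def sample_image_tokens(class_id):
    """Autoregressively sample GRID*GRID image tokens, one at a time,
    conditioned on a class token prepended to the sequence."""
    tokens = [class_id]  # conditioning token, as in class-conditional generation
    for _ in range(GRID * GRID):
        logits = next_token_logits(tokens)
        # Greedy selection for determinism; a real sampler would use
        # temperature / top-k sampling over a softmax of the logits.
        tokens.append(max(range(CODEBOOK_SIZE), key=lambda i: logits[i]))
    return tokens[1:]  # drop the conditioning token


tokens = sample_image_tokens(class_id=7)
# In a real pipeline, the tokenizer's decoder would then map these
# discrete indices back to pixels.
```

The key point is structural: generation is a left-to-right loop over a flat token sequence, exactly as in language modeling, with the image tokenizer handling the conversion between pixels and tokens.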
## Features
- Autoregressive image generation based on large language model architectures
- Image tokenizers for converting images into discrete token sequences
- Support for class-conditional and text-conditional image generation models
- Pretrained models ranging from hundreds of millions to billions of parameters
- Training and experimentation framework for visual generation research
- Integration with high-performance inference frameworks for faster generation