Janus-Pro-7B
Janus-Pro-7B is an innovative open-source multimodal AI model from DeepSeek, designed to excel in both understanding and generating content across text, images, and videos. It leverages a unique autoregressive architecture with separate pathways for visual encoding, enabling high performance in tasks ranging from text-to-image generation to complex visual comprehension. This model outperforms competitors like DALL-E 3 and Stable Diffusion in various benchmarks, offering scalability with versions from 1 billion to 7 billion parameters. Licensed under the MIT License, Janus-Pro-7B is freely available for both academic and commercial use, providing a significant leap in AI capabilities while being accessible on major operating systems like Linux, MacOS, and Windows through Docker.
Learn more
VideoPoet
VideoPoet is a simple modeling method that can convert any autoregressive language model or large language model (LLM) into a high-quality video generator. It contains a few simple components. An autoregressive language model learns across video, image, audio, and text modalities to autoregressively predict the next video or audio token in the sequence. A mixture of multimodal generative learning objectives are introduced into the LLM training framework, including text-to-video, text-to-image, image-to-video, video frame continuation, video inpainting and outpainting, video stylization, and video-to-audio. Furthermore, such tasks can be composed together for additional zero-shot capabilities. This simple recipe shows that language models can synthesize and edit videos with a high degree of temporal consistency.
Learn more
Mercury Coder
Mercury, the latest innovation from Inception Labs, is the first commercial-scale diffusion large language model (dLLM), offering a 10x speed increase and significantly lower costs compared to traditional autoregressive models. Built for high-performance reasoning, coding, and structured text generation, Mercury processes over 1000 tokens per second on NVIDIA H100 GPUs, making it one of the fastest LLMs available. Unlike conventional models that generate text one token at a time, Mercury refines responses using a coarse-to-fine diffusion approach, improving accuracy and reducing hallucinations. With Mercury Coder, a specialized coding model, developers can experience cutting-edge AI-driven code generation with superior speed and efficiency.
Learn more
ByteDance Seed
Seed Diffusion Preview is a large-scale, code-focused language model that uses discrete-state diffusion to generate code non-sequentially, achieving dramatically faster inference without sacrificing quality by decoupling generation from the token-by-token bottleneck of autoregressive models. It combines a two-stage curriculum, mask-based corruption followed by edit-based augmentation, to robustly train a standard dense Transformer, striking a balance between speed and accuracy and avoiding shortcuts like carry-over unmasking to preserve principled density estimation. The model delivers an inference speed of 2,146 tokens/sec on H20 GPUs, outperforming contemporary diffusion baselines while matching or exceeding their accuracy on standard code benchmarks, including editing tasks, thereby establishing a new speed-quality Pareto frontier and demonstrating discrete diffusion’s practical viability for real-world code generation.
Learn more