MiniMax-M1 is presented as the world’s first open-weight, large-scale hybrid-attention reasoning model, designed to push the frontier of long-context, tool-using reasoning language models. It is built on the MiniMax-Text-01 foundation and retains the same large parameter budget, but reworks the attention mechanism and training setup for stronger reasoning and more efficient test-time compute scaling. Architecturally, it combines Mixture-of-Experts layers with lightning attention, supporting a native context length of 1 million tokens while using far fewer FLOPs than comparable reasoning models on very long generations. The team emphasizes efficient scaling of test-time compute: at 100K-token generation lengths, M1 reportedly uses only about 25 percent of the FLOPs of some competing models, making extended reasoning traces more practical. M1 is further trained with large-scale reinforcement learning over diverse tasks.
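The FLOP savings for long generations come from the linear-attention family that lightning attention belongs to: instead of materializing an n-by-n attention matrix, a running state is updated per token. The sketch below shows that generic recurrence, not MiniMax's actual kernel; the feature map `phi` and the epsilon normalization are illustrative assumptions.

```python
import numpy as np

def linear_attention(Q, K, V):
    """Causal linear attention sketch: O(n * d^2) instead of softmax's O(n^2 * d).

    Illustrates the recurrence behind linear-attention variants such as
    lightning attention; the feature map and normalization here are
    simplifying assumptions, not the production implementation.
    """
    phi = lambda x: np.maximum(x, 0.0) + 1e-6   # positive feature map (assumed)
    Qf, Kf = phi(Q), phi(K)
    n = Q.shape[0]
    S = np.zeros((Q.shape[1], V.shape[1]))      # running sum of k_t v_t^T
    z = np.zeros(Q.shape[1])                    # running normalizer sum of k_t
    out = np.zeros_like(V)
    for t in range(n):                          # each step costs O(d^2): linear in n
        S += np.outer(Kf[t], V[t])
        z += Kf[t]
        out[t] = Qf[t] @ S / (Qf[t] @ z)
    return out
```

Because the per-step cost is independent of how many tokens came before, generating 100K tokens grows linearly in compute rather than quadratically, which is the property the test-time-compute claim rests on.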
Features
- Open-weight hybrid-attention reasoning model built atop the MiniMax-Text-01 architecture
- Mixture-of-Experts plus lightning attention for 1M-token native context with efficient FLOP usage
- Large-scale reinforcement learning training spanning math, coding, and sandboxed real-world tasks
- CISPO RL algorithm that clips importance-sampling weights, designed for stable large-scale RL
- Multiple variants with different “thinking budgets” such as 40K and 80K tokens for extended reasoning traces
- Strong benchmark performance on software engineering, tool-use, and long-context reasoning compared to other open models
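The CISPO idea noted above can be sketched as a per-token policy-gradient loss where the importance-sampling weight itself is clipped and treated as a constant, so no token's gradient is zeroed outright. This is a minimal NumPy illustration under that reading; the function name, hyperparameter names, and default clip range are assumptions, not MiniMax's published values.

```python
import numpy as np

def cispo_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.2):
    """Illustrative CISPO-style loss: clip the IS weight, not the token update.

    The clipped weight would be detached (stop-gradient) in an autograd
    framework so gradients flow through logp_new for every token; NumPy has
    no autograd, so the detach is only notional here.
    """
    ratio = np.exp(logp_new - logp_old)                    # IS weight r_t
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    # clipped acts as a fixed coefficient on the log-prob gradient
    return -(clipped * advantages * logp_new).mean()
```

In contrast to PPO-style ratio clipping, tokens whose ratio falls outside the clip range still contribute a (bounded) gradient, which is the stability property the bullet above refers to.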