Qwen3 is the latest generation in the Qwen series of large language models, offering a comprehensive suite of both dense and mixture-of-experts (MoE) models. Built on extensive training, Qwen3 delivers major advances in reasoning, instruction following, agent capabilities, and multilingual support. Its key features include:
- Seamless Mode Switching: The model uniquely supports smooth transitions between “thinking” mode (for complex logical reasoning, mathematics, and coding) and “non-thinking” mode (for efficient, general-purpose dialogue), ensuring strong performance across a variety of scenarios; see the sketch after this list.
- Enhanced Reasoning: Qwen3 demonstrates significantly improved reasoning abilities, outperforming the previous QwQ models (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning tasks.
- Human Preference Alignment: The model excels in creative writing, role-playing, multi-turn conversations, and instruction following, delivering a natural, engaging, and immersive conversational experience.
- Agent Proficiency: Qwen3 offers advanced agent capabilities, enabling precise integration with external tools in both thinking and non-thinking modes, and achieves leading performance among open-source models on complex agent-based tasks.
- Multilingual Support: It supports over 100 languages and dialects, with strong capabilities for multilingual instruction following and translation.
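
As referenced above, here is a minimal sketch of mode switching through Hugging Face transformers, using the `enable_thinking` flag that Qwen3's chat template exposes. The repo id `Qwen/Qwen3-30B-A3B-FP8` is an assumption based on the FP8 variant described under Model Details; adjust it to the checkpoint you actually use.

```python
# Minimal sketch: toggling Qwen3's thinking / non-thinking modes.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id for the FP8 variant described in Model Details below.
model_name = "Qwen/Qwen3-30B-A3B-FP8"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]

# The chat template exposes an enable_thinking switch:
#   True  -> the model first emits a <think>...</think> reasoning block
#   False -> the model answers directly (lower latency, general dialogue)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # flip to False for non-thinking mode
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=2048)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```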
## Model Details
Below is an overview of the FP8 version of Qwen3-30B-A3B:
| Feature | Specification |
|---|---|
| Type | Causal Language Models |
| Training Stage | Pretraining & Post-training |
| Number of Parameters (Total) | 30.5B |
| Number of Activated Parameters | 3.3B |
| Number of Parameters (Non-Embedding) | 29.9B |
| Number of Layers | 48 |
| Number of Attention Heads (GQA) | 32 for Q, 4 for KV |
| Number of Experts | 128 |
| Number of Activated Experts | 8 |
| Context Length | 32,768 tokens natively; 131,072 tokens with YaRN |
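
The 131,072-token figure in the table above assumes rope scaling with YaRN (32,768 native tokens times a scaling factor of 4.0). The following is a hedged sketch of one way to enable it through transformers; the repo id and the exact `rope_scaling` keys are assumptions based on recent transformers conventions, so verify them against the checkpoint's own documentation.

```python
# Hedged sketch: enabling YaRN rope scaling to extend the context window.
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "Qwen/Qwen3-30B-A3B-FP8"  # assumed repo id for the FP8 variant

config = AutoConfig.from_pretrained(model_name)
# 32,768 native tokens * factor 4.0 = 131,072 tokens
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

model = AutoModelForCausalLM.from_pretrained(
    model_name, config=config, torch_dtype="auto", device_map="auto"
)
```

Note that this form of YaRN is static: the scaling factor applies regardless of input length, which can slightly degrade quality on short texts, so it is generally best enabled only when long contexts are actually needed.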