qwen3-30b-a3b

Qwen3 represents the latest generation in the Qwen series of large language models, offering a comprehensive suite of both dense and mixture-of-experts (MoE) models. Leveraging extensive training, Qwen3 introduces unprecedented advancements in reasoning, instruction following, agent capabilities, and multilingual support. Its key features include:

Seamless Mode Switching: The model uniquely supports smooth transitions between “thinking” mode (for complex logical reasoning, mathematics, and coding) and “non-thinking” mode (for efficient, general-purpose dialogue), ensuring optimal performance across a variety of scenarios.
Enhanced Reasoning: Qwen3 demonstrates significantly improved reasoning abilities, outperforming previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) in mathematics, code generation, and commonsense logical reasoning tasks.
Human Preference Alignment: The model excels in creative writing, role-playing, multi-turn conversations, and instruction following, delivering a natural, engaging, and immersive conversational experience.
Agent Proficiency: Qwen3 offers advanced agent capabilities, enabling precise integration with external tools in both thinking and non-thinking modes, and achieves leading performance among open-source models on complex agent-based tasks.
Multilingual Support: It supports over 100 languages and dialects, showcasing strong capabilities for multilingual instruction following and translation.
Model Details

Below is an overview of the FP8 version of Qwen3-30B-A3B:

Feature	Specification
Type	Causal Language Models
Training Stage	Pretraining & Post-training
Number of Parameters (Total)	30.5B
Number of Activated Parameters	3.3B
Number of Parameters (Non-Embedding)	29.9B
Number of Layers	48
Number of Attention Heads (GQA)	32 for Q, 4 for KV
Number of Experts	128
Number of Activated Experts	8
Context Length	32,768 tokens natively; 131,072 tokens with YaRN

Model Description