Qwen3-235B-A22B is a mixture-of-experts (MoE) model in the Qwen3 series, the latest generation of Qwen large language models, which spans both dense and MoE architectures. The model switches seamlessly between thinking mode (for complex reasoning, math, and coding) and non-thinking mode (for fast, general-purpose dialogue) within a single checkpoint, delivering strong performance across a wide range of applications.
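In practice, the mode switch is driven by the chat template rather than by separate checkpoints. The sketch below assumes the Hugging Face `transformers` interface used by Qwen3 checkpoints, where `tokenizer.apply_chat_template` accepts an `enable_thinking` flag; treat the exact flag name and model ID as assumptions drawn from the published checkpoints rather than from this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model ID on the Hugging Face Hub.
model_name = "Qwen/Qwen3-235B-A22B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # pick an appropriate dtype for the hardware
    device_map="auto",    # shard across available GPUs
)

messages = [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]

# enable_thinking=True asks the template to open a reasoning block before the
# final answer; enable_thinking=False keeps the reply in fast, non-thinking mode.
# (Assumed keyword exposed by the Qwen3 chat template, not defined in this card.)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=4096)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

Setting `enable_thinking=False` instead yields a direct answer without the intermediate reasoning trace, which is what the highlights below refer to as non-thinking mode.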
Key highlights:
- Outperforms previous QwQ models (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
- Aligns closely with human preferences, excelling in creative writing, role-play, multi-turn dialogue, and instruction following for natural, engaging conversations.
- Provides strong agent capabilities, integrating precisely with external tools in both thinking and non-thinking modes and achieving state-of-the-art performance among open-source models on complex agent-based tasks.
- Supports over 100 languages and dialects, with robust multilingual instruction following and translation.
Model overview:
| Feature | Description |
|---|---|
| Type | Causal Language Model |
| Training Stage | Pretraining & Post-training |
| Parameters (Total) | 235B |
| Parameters (Activated) | 22B |
| Parameters (Non-Embedding) | 234B |
| Layers | 94 |
| Attention Heads (GQA) | 64 for Q, 4 for KV |
| Experts (Total) | 128 |
| Experts (Activated) | 8 |
| Context Length | 32,768 tokens natively; up to 131,072 tokens with YaRN (see the sketch below) |
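The 131,072-token figure in the table relies on YaRN rope scaling rather than on the native configuration. Below is a minimal sketch, assuming the `rope_scaling` fields used by Qwen3 checkpoints in `transformers`; the exact key names and the 4.0 scaling factor are assumptions, not stated in this overview.

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "Qwen/Qwen3-235B-A22B"  # assumed Hub ID

# Start from the released config and switch on YaRN scaling.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "rope_type": "yarn",                        # assumed key names
    "factor": 4.0,                              # 32,768 x 4 = 131,072 tokens
    "original_max_position_embeddings": 32768,  # native context window
}
config.max_position_embeddings = 131072

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```

Since static YaRN scaling applies to every request, a common recommendation is to enable it only when prompts actually exceed the native 32,768-token window, as it can slightly degrade quality on shorter texts.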
Taken together, these reasoning, agent, dialogue, and multilingual capabilities make Qwen3-235B-A22B a strong choice for sophisticated AI applications.