qwen3-30b-a3b

Model Description

Qwen3 represents the latest generation in the Qwen series of large language models, offering a comprehensive suite of both dense and mixture-of-experts (MoE) models. Leveraging extensive training, Qwen3 introduces unprecedented advancements in reasoning, instruction following, agent capabilities, and multilingual support. Its key features include:

Seamless Mode Switching: The model uniquely supports smooth transitions between “thinking” mode (for complex logical reasoning, mathematics, and coding) and “non-thinking” mode (for efficient, general-purpose dialogue), ensuring optimal performance across a variety of scenarios.
Enhanced Reasoning: Qwen3 demonstrates significantly improved reasoning abilities, outperforming previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) in mathematics, code generation, and commonsense logical reasoning tasks.
Human Preference Alignment: The model excels in creative writing, role-playing, multi-turn conversations, and instruction following, delivering a natural, engaging, and immersive conversational experience.
Agent Proficiency: Qwen3 offers advanced agent capabilities, enabling precise integration with external tools in both thinking and non-thinking modes, and achieves leading performance among open-source models on complex agent-based tasks.
Multilingual Support: It supports over 100 languages and dialects, showcasing strong capabilities for multilingual instruction following and translation.
Model Details

Below is an overview of the FP8 version of Qwen3-30B-A3B:

Feature Specification
Type Causal Language Models
Training Stage Pretraining & Post-training
Number of Parameters (Total) 30.5B
Number of Activated Parameters 3.3B
Number of Parameters (Non-Embedding) 29.9B
Number of Layers 48
Number of Attention Heads (GQA) 32 for Q, 4 for KV
Number of Experts 128
Number of Activated Experts 8
Context Length 32,768 tokens natively; 131,072 tokens with YaRN

🔔How to Use

graph LR A("Purchase Now") --> B["Start Chat on Homepage"] A --> D["Read API Documentation"] B --> C["Register / Login"] C --> E["Enter Key"] D --> F["Enter Endpoint & Key"] E --> G("Start Using") F --> G style A fill:#f9f9f9,stroke:#333,stroke-width:1px style B fill:#f9f9f9,stroke:#333,stroke-width:1px style C fill:#f9f9f9,stroke:#333,stroke-width:1px style D fill:#f9f9f9,stroke:#333,stroke-width:1px style E fill:#f9f9f9,stroke:#333,stroke-width:1px style F fill:#f9f9f9,stroke:#333,stroke-width:1px style G fill:#f9f9f9,stroke:#333,stroke-width:1px

Purchase Now

Start Chat on Homepage

Register / Login

Enter Key

Read API Documentation

Enter Endpoint & Key

Start Using

Description Ends

Recommend Models

gemini-2.5-flash-image-preview-bs(nano-banana)

Gemini 2.5 Flash Image is a state-of-the-art model for image generation and editing that offers advanced capabilities like character consistency, natural language-based transformations, multi-image fusion, and the integration of Gemini's world knowledge.

claude-opus-4-1-20250805

Opus 4.1 advances our state-of-the-art coding performance to 74.5% on SWE-bench Verified. It also improves Claude’s in-depth research and data analysis skills, especially around detail tracking and agentic search.

gpt-4.1-nano-2025-04-14

GPT-4.1 nano is the fastest, most cost-effective GPT-4.1 model.