kimi-k2-thinking

Model Description

Introducing Kimi K2 Thinking: A Deep-Reasoning Agent with Native INT4 Quantization

Kimi K2 Thinking is the latest and most capable model in Moonshot AI's line of open-source thinking models. Designed from the ground up as a thinking agent, it specializes in step-by-step reasoning while dynamically invoking tools to complete complex tasks. The model sets a new state of the art on benchmarks such as Humanity’s Last Exam (HLE) and BrowseComp by significantly extending its multi-step reasoning depth and maintaining stable performance across 200–300 consecutive tool calls.

A key technical highlight of K2 Thinking is its native INT4 quantization, which, combined with a 256K context window, reduces inference latency and GPU memory usage without loss in benchmark performance.

Key Features

  • Deep Thinking & Tool Orchestration: The model is end-to-end trained to interleave chain-of-thought reasoning with function calls. This enables it to handle autonomous research, coding, and writing workflows that can last for hundreds of steps without drifting from the original goal.
  • Stable Long-Horizon Agency: K2 Thinking demonstrates coherent, goal-directed behavior for up to 200–300 consecutive tool invocations, a significant improvement over previous models that often saw performance degrade after 30–50 steps.
  • Native INT4 Quantization: By employing Quantization-Aware Training (QAT) in its post-training stage, the model achieves a nearly 2x speed-up in low-latency mode without sacrificing performance (a minimal sketch of the idea follows this list).
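To make the QAT idea concrete, here is a minimal Python sketch of symmetric per-group INT4 "fake quantization", the standard trick behind quantization-aware training: weights are rounded to 16 signed levels in the forward pass so training adapts to the rounding error. This is an illustrative toy, not Kimi K2's actual training code, and the group size of 32 is an arbitrary choice for the example.

```python
# Toy illustration of symmetric per-group INT4 fake quantization as used in
# quantization-aware training (QAT). NOT Kimi K2's actual code; it only shows
# the general idea: weights are rounded to 16 signed levels (-8..7) in the
# forward pass so training learns to tolerate the quantization error.
import numpy as np

def fake_quant_int4(weights: np.ndarray, group_size: int = 32) -> np.ndarray:
    """Quantize-dequantize weights to INT4 with one scale per group."""
    flat = weights.reshape(-1, group_size)
    # One scale per group, chosen so the largest magnitude maps to 7.
    scales = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)      # avoid division by zero
    q = np.clip(np.round(flat / scales), -8, 7)      # signed 4-bit integer grid
    return (q * scales).reshape(weights.shape)       # dequantized weights

w = np.random.randn(4, 64).astype(np.float32)
w_q = fake_quant_int4(w)
print("max abs quantization error:", np.abs(w - w_q).max())
```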

Model Architecture

Kimi K2 Thinking is built on a Mixture-of-Experts (MoE) architecture. Its key specifications are as follows:

| Specification         | Value                                 |
|-----------------------|---------------------------------------|
| Architecture          | Mixture-of-Experts (MoE)              |
| Total Parameters      | 1T                                    |
| Activated Parameters  | 32B                                   |
| Context Length        | 256K                                  |
| Vocabulary Size       | 160K                                  |
| Number of Layers      | 61 (including 1 dense layer)          |
| Number of Experts     | 384 (8 selected per token, 1 shared)  |
| Attention Mechanism   | MLA                                   |
| Activation Function   | SwiGLU                                |
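To illustrate what the expert-related rows in the table mean, the following schematic shows top-k MoE routing with 384 experts, 8 selected per token, and one always-active shared expert. The toy router and expert matrices are invented for the example; the real model uses learned feed-forward experts (with SwiGLU activations) rather than single matrices.

```python
# Schematic top-k MoE routing matching the spec table (384 experts, top-8 per
# token, plus 1 always-active shared expert). Illustrative only.
import numpy as np

NUM_EXPERTS, TOP_K, HIDDEN = 384, 8, 16

rng = np.random.default_rng(0)
experts = [rng.standard_normal((HIDDEN, HIDDEN)) * 0.02 for _ in range(NUM_EXPERTS)]
shared_expert = rng.standard_normal((HIDDEN, HIDDEN)) * 0.02
router = rng.standard_normal((HIDDEN, NUM_EXPERTS)) * 0.02

def moe_layer(x: np.ndarray) -> np.ndarray:
    logits = x @ router                                # router score per expert
    top = np.argsort(logits)[-TOP_K:]                  # indices of the top-8 experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                           # softmax over selected experts
    out = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
    return out + x @ shared_expert                     # shared expert always active

token = rng.standard_normal(HIDDEN)
print(moe_layer(token).shape)                          # (16,)
```

Only the selected experts (plus the shared one) run for each token, which is how a 1T-parameter model activates roughly 32B parameters per forward pass.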

Performance and Evaluation

Evaluation results show that Kimi K2 Thinking achieves state-of-the-art or highly competitive performance across a range of tasks. In reasoning tasks with tools, it scores 44.9 on HLE and 60.2 on BrowseComp, outperforming other leading models. It also demonstrates strong capabilities in coding, achieving a score of 71.3 on SWE-bench Verified and showing particular strength in multilingual coding benchmarks. All reported benchmark results were achieved using INT4 precision, underscoring the model’s efficiency.

Deployment and Usage

Developers can access Kimi K2 Thinking via an OpenAI/Anthropic-compatible API available at platform.moonshot.ai. For local deployment, the model is optimized to run on inference engines such as vLLM, SGLang, and KTransformers.
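As a starting point for API access, the snippet below calls an OpenAI-compatible chat completions endpoint with the official openai Python client. The base URL and model identifier shown here are assumptions for illustration; confirm the exact values in the platform.moonshot.ai documentation.

```python
# Minimal chat-completion call against an OpenAI-compatible endpoint.
# The base_url and model name below are assumptions; check platform.moonshot.ai
# for the exact endpoint and model identifier before use.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",                      # key issued by the platform
    base_url="https://api.moonshot.ai/v1",       # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="kimi-k2-thinking",                    # assumed model ID
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the benefits of INT4 quantization."},
    ],
    temperature=1.0,                             # recommended setting (see below)
)
print(response.choices[0].message.content)
```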

The model supports standard chat completion and advanced tool-calling functionalities. Users can define a list of available tools, and the model will autonomously decide when and how to use them to fulfill a request. The recommended temperature setting for general use is 1.0.
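Building on the same assumptions, here is a sketch of the tool-calling flow: a tools list is passed with the request, and the model returns tool_calls entries when it decides a function should be invoked. The get_weather tool is invented purely for illustration.

```python
# Sketch of autonomous tool calling via the OpenAI-style tools interface.
# The get_weather tool is a made-up example; the endpoint and model ID are the
# same assumptions as in the previous snippet.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.moonshot.ai/v1")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="kimi-k2-thinking",                                  # assumed model ID
    messages=[{"role": "user", "content": "What's the weather in Beijing?"}],
    tools=tools,
    temperature=1.0,                                           # recommended default
)

# The model decides whether a tool call is needed; if so, it appears here.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```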

License

Both the model weights and the associated code repository are released under the Modified MIT License.

🔔 How to Use

```mermaid
graph LR
    A("Purchase Now") --> B["Start Chat on Homepage"]
    A --> D["Read API Documentation"]
    B --> C["Register / Login"]
    C --> E["Enter Key"]
    D --> F["Enter Endpoint & Key"]
    E --> G("Start Using")
    F --> G
    style A fill:#f9f9f9,stroke:#333,stroke-width:1px
    style B fill:#f9f9f9,stroke:#333,stroke-width:1px
    style C fill:#f9f9f9,stroke:#333,stroke-width:1px
    style D fill:#f9f9f9,stroke:#333,stroke-width:1px
    style E fill:#f9f9f9,stroke:#333,stroke-width:1px
    style F fill:#f9f9f9,stroke:#333,stroke-width:1px
    style G fill:#f9f9f9,stroke:#333,stroke-width:1px
```



Recommended Models

claude-opus-4-1-20250805

Opus 4.1 advances Anthropic’s state-of-the-art coding performance to 74.5% on SWE-bench Verified. It also improves Claude’s in-depth research and data analysis skills, especially around detail tracking and agentic search.

gemini-2.5-pro-preview-06-05

Google has released an upgraded preview of Gemini 2.5 Pro (06-05) that significantly improves coding performance, mathematical reasoning, and response formatting while addressing previous performance concerns.

claude-sonnet-4-5-20250929

Anthropic's Claude Sonnet 4.5 is a new frontier model excelling in programming, agentic tasks, and real-world computer usage, complemented by significant safety upgrades and new developer tools.