Qwen2.5-VL-72B-Instruct

2025-01-28
对话, 识图
By Qwen

Input: ￥6.00 / M tokens Output: ￥12.00 / M tokens
特征：图像输入, 流式, 文本输入, 文本输出
上下文： 128K

Input: ￥6.00 / M tokens Output: ￥12.00 / M tokens
特征：图像输入, 流式, 文本输入, 文本输出
上下文： 128K

模型描述

Qwen2.5-VL-72B-Instruct represents a significant upgrade in the Qwen family of vision-language models. Building on feedback from developers, it excels in visual recognition (objects, text, charts, layouts), acts as a visual agent for tool-based reasoning, and processes long videos (1+ hours) with precise event localization. It supports object detection via bounding boxes/points and generates structured outputs (e.g., invoices, tables) for finance/commerce. Architectural improvements include dynamic FPS training for video understanding, optimized ViT with window attention/SwiGLU, and temporal mRoPE enhancements. Available in 3B/7B/72B variants, this 72B instruction-tuned model balances speed and performance.

🔔如何使用

graph LR A("Purchase Now") --> B["Start Chat on Homepage"] A --> D["Read API Documentation"] B --> C["Register / Login"] C --> E["Enter Key"] D --> F["Enter Endpoint & Key"] E --> G("Start Using") F --> G style A fill:#f9f9f9,stroke:#333,stroke-width:1px style B fill:#f9f9f9,stroke:#333,stroke-width:1px style C fill:#f9f9f9,stroke:#333,stroke-width:1px style D fill:#f9f9f9,stroke:#333,stroke-width:1px style E fill:#f9f9f9,stroke:#333,stroke-width:1px style F fill:#f9f9f9,stroke:#333,stroke-width:1px style G fill:#f9f9f9,stroke:#333,stroke-width:1px

点击购买

点击首页立即对话

输入key

阅读API文档

输入端点和API Key

开始使用

推荐模型

claude-opus-4-20250514-thinking