Model Tracker

Model	Provider	Overall	Best fit	Notes	Context	Pricing (USD per 1M tokens)	Speed	Reasoning	Coding	Access	Verified
Claude Opus 4.7	Anthropic	90.0	Coding, Long context	Opus is the Claude you use when you want fewer compromises. It is expensive, but it feels built for serious knowledge work rather than casual throughput.	1M tokens	In $5.00, Out $25.00	82.0	95.0	95.0	API, Enterprise, Web app	May 2, 2026
Claude Opus 4.8	Anthropic	90.9	Agents, Coding	Opus 4.8 looks like Anthropic's current premium reliability play: less about raw novelty than about making high-autonomy agent work steadier, sharper, and easier to trust.	1M tokens	In $5.00, Out $25.00	83.0	96.0	96.0	API, Enterprise, Web app	May 31, 2026
Claude Sonnet 4.6	Anthropic	90.1	Agents, Coding	Sonnet is often the pragmatic Anthropic choice. It gives you enough intelligence to do real work without making every request feel expensive.	1M tokens	In $3.00, Out $15.00	90.0	91.0	92.0	API, Enterprise, Web app	May 2, 2026
DeepSeek V4 Pro	DeepSeek	87.4	Coding, Long context	DeepSeek keeps forcing the market to take price compression seriously. The model matters, but the strategic story is the economics.	1M tokens	In $1.74, Out $3.48	88.0	89.0	89.0	API	May 2, 2026
DeepSeek-V4-Flash	DeepSeek	84.3	Coding, Long context	V4 Flash is the budget pressure point in the current tracker set: if the official prices hold, it lowers the cost floor for long-context API work.	1M tokens	In $0.14 (cache miss), Out $0.28	92.0	84.0	85.0	API	May 31, 2026
Devstral 2	Mistral	83.4	Agents, Coding	Devstral 2 matters because it is aimed directly at the coding-agent stack instead of trying to be another broad general-purpose flagship.	256K tokens	In $0.40, Out $2.00	87.0	82.0	92.0	API, Enterprise	May 12, 2026
Gemini 3.1 Pro Preview	Google	90.1	Coding, Long context	This looks like Google's real flagship lane now: less about novelty, more about making Gemini feel dependable enough for serious agent and coding work.	1M tokens	In $2.00, Out $12.00 (prompts up to 200K tokens)	85.0	94.0	95.0	API, Enterprise, Web app	May 12, 2026
GPT-5.4 mini	OpenAI	87.5	Agents, Coding	This is the kind of model that actually changes unit economics. It makes many agent patterns viable where flagship pricing still feels heavy.	400K tokens	In $0.75, Out $4.50	92.0	86.0	90.0	API	May 2, 2026
GPT-5.5	OpenAI	90.1	Coding, Long context	This is the model you reach for when failure is expensive. It looks less like a casual chatbot and more like a premium cognitive engine for serious work.	1.05M tokens	In $5.00, Out $30.00	84.0	96.0	97.0	API	May 2, 2026
Grok 4.3	xAI	88.8	Coding, Generalist	The important change is less the name than the economics: xAI's official docs now position Grok 4.3 as a practical API model instead of a premium curiosity.	1M tokens	In $1.25, Out $2.50	91.0	90.0	89.0	API	May 31, 2026
Mistral Medium 3.5	Mistral	84.0	Coding, Generalist	Mistral's role in the market is strategic as much as technical. It gives serious teams another commercial frontier option that is not just a copy of the biggest platforms.	256K tokens	In $1.50, Out $7.50	86.0	84.0	88.0	API, Enterprise	May 2, 2026
Mistral Small 4	Mistral	82.4	Agents, Coding	Small 4 is not the glamour entry, but its hybrid design and low price make it a practical candidate for builders watching margins.	256K tokens	In $0.15, Out $0.60	90.0	82.0	86.0	API	May 31, 2026
Qwen3-235B-A22B	Alibaba / Qwen	86.1	Coding, Multilingual	Qwen matters because it keeps the open-weight lane credible at the high end. It is not just a hobbyist release; it is a real option for builders who want control.	128K tokens	Local	76.0	90.0	91.0	API, Local, Open weights, Web app	May 2, 2026
Qwen3.6-Plus	Alibaba / Qwen	82.5	Generalist, Long context	Possible replacement for Qwen3.5-Plus in the tracker; pricing, context length, and alignment with the English/global docs still need confirming before publishing.	See Alibaba Model Studio docs	See Alibaba Model Studio pricing	85.0	86.0	82.0	API	May 16, 2026
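Because the Pricing column is quoted per 1M tokens, the list cost of one request is (input tokens × input price + output tokens × output price) / 1,000,000. The minimal Python sketch below works through that arithmetic, assuming flat list prices copied from the table above (prices change, so re-check them) and ignoring prompt caching, batch discounts, and the 200K-prompt condition attached to the Gemini price. The `Price` type, the `request_cost` function, and the 50K-in/5K-out workload are hypothetical illustrations, not any provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_mtok: float   # USD per 1M input tokens
    output_per_mtok: float  # USD per 1M output tokens


# List prices copied from the tracker table (models with numeric prices only).
PRICES = {
    "Claude Opus 4.7": Price(5.00, 25.00),
    "Claude Opus 4.8": Price(5.00, 25.00),
    "Claude Sonnet 4.6": Price(3.00, 15.00),
    "DeepSeek V4 Pro": Price(1.74, 3.48),
    "DeepSeek-V4-Flash": Price(0.14, 0.28),     # input price is the cache-miss rate
    "Devstral 2": Price(0.40, 2.00),
    "Gemini 3.1 Pro Preview": Price(2.00, 12.00),  # listed for prompts up to 200K tokens
    "GPT-5.4 mini": Price(0.75, 4.50),
    "GPT-5.5": Price(5.00, 30.00),
    "Grok 4.3": Price(1.25, 2.50),
    "Mistral Medium 3.5": Price(1.50, 7.50),
    "Mistral Small 4": Price(0.15, 0.60),
}


def request_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Flat per-token estimate in USD: no caching, batch, or long-prompt tiers."""
    return (input_tokens * price.input_per_mtok
            + output_tokens * price.output_per_mtok) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 50K tokens in, 5K tokens out per request.
    in_tok, out_tok = 50_000, 5_000
    ranked = sorted(PRICES.items(), key=lambda kv: request_cost(kv[1], in_tok, out_tok))
    for name, price in ranked:
        print(f"{name:24s} ${request_cost(price, in_tok, out_tok):.4f}")
```

Under those assumptions Claude Opus 4.7 comes to $0.375 per request and DeepSeek-V4-Flash to about $0.0084, a gap of roughly 45×, which is the kind of spread the Notes column is pointing at.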

Fast picks for common decisions

Compare practical model fit