
Our Methodology

We believe in full transparency. Here's exactly how we collect, verify, and update benchmark data — and what each benchmark actually measures.

✓ Editorial Independence

AI Benchmarks is an independent platform. We are not affiliated with, funded by, or in any commercial relationship with OpenAI, Anthropic, Google, Meta, xAI, or Mistral AI. No AI company can pay to improve their ranking.

Data Sources

All benchmark scores are sourced from peer-reviewed publications, official technical reports, and the LMSYS Chatbot Arena leaderboard. We do not run benchmarks ourselves — we aggregate and verify scores from primary sources.

Our primary sources include official model cards and technical reports from each AI provider, the LMSYS Chatbot Arena human preference leaderboard, arXiv preprints for independently replicated results, and community-verified benchmark repositories on GitHub.

When scores differ between sources, we use the most recently published, peer-reviewed figure. We note discrepancies in the model detail pages.
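
For illustration, here is a minimal sketch of that tie-breaking rule. The field names (value, publishedAt, peerReviewed) are hypothetical, not our actual schema:

```js
// Illustrative sketch of the rule above: prefer peer-reviewed figures,
// then take the most recently published one. Field names are assumptions.
function resolveScore(candidates) {
  if (candidates.length === 0) return null;
  const reviewed = candidates.filter((c) => c.peerReviewed);
  const pool = reviewed.length > 0 ? reviewed : candidates;
  // Most recently published figure wins.
  return pool.reduce((best, c) =>
    new Date(c.publishedAt) > new Date(best.publishedAt) ? c : best
  );
}

// Example: two conflicting figures for the same model and benchmark.
const pick = resolveScore([
  { value: 86.4, publishedAt: '2024-03-04', peerReviewed: false },
  { value: 85.9, publishedAt: '2024-06-20', peerReviewed: true },
]);
console.log(pick.value); // 85.9, the newer, peer-reviewed figure
```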

Benchmark Definitions & Weights

MMLU (Massive Multitask Language Understanding) - 20% weight

57 subjects across STEM, the social sciences, and the humanities. Tests broad world knowledge and problem-solving. Higher scores indicate more comprehensive knowledge.

HumanEval (Human-Evaluated Code Generation) - 20% weight

164 hand-crafted programming challenges. Models generate Python code that is evaluated by running unit tests. The gold standard for coding ability.

MATH (Mathematics Problem Solving) - 15% weight

12,500 competition mathematics problems across algebra, calculus, geometry, and more. Tests deep mathematical reasoning.

GSM8K (Grade School Math 8K) - 10% weight

8,500 grade school math word problems requiring multi-step reasoning. A good proxy for everyday numerical reasoning.

GPQA (Graduate-Level Google-Proof Q&A) - 15% weight

448 expert-crafted questions in biology, chemistry, and physics. Designed so that Google search cannot easily answer them.

BBH (BIG-Bench Hard) - 10% weight

23 challenging tasks from the BIG-Bench benchmark that resist simple few-shot prompting, requiring genuine reasoning.

Arena Elo (LMSYS Chatbot Arena Elo Rating) - 10% weight

Human preference rating from blind A/B comparisons at LMSYS Chatbot Arena. The most realistic measure of real-world usefulness.
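
To make the weighting concrete, below is a minimal sketch of how a weighted composite could be computed from these seven benchmarks. It assumes the six accuracy benchmarks are reported on a 0-100 scale and uses an illustrative min-max normalization for Arena Elo; the Elo range and the handling of missing scores are assumptions for the sketch, not our production formula.

```js
// Benchmark weights as defined above (they sum to 1.0).
const WEIGHTS = {
  mmlu: 0.20,
  humaneval: 0.20,
  math: 0.15,
  gsm8k: 0.10,
  gpqa: 0.15,
  bbh: 0.10,
  arenaElo: 0.10,
};

// Illustrative min-max normalization for Arena Elo, which is not a
// percentage. The 1000-1400 range is an assumption for this sketch.
function normalizeElo(elo, lo = 1000, hi = 1400) {
  return Math.min(100, Math.max(0, ((elo - lo) / (hi - lo)) * 100));
}

// Weighted composite over whichever scores are present; missing
// benchmarks are skipped and the remaining weights are rescaled.
function compositeScore(scores) {
  let total = 0;
  let weightUsed = 0;
  for (const [key, weight] of Object.entries(WEIGHTS)) {
    if (scores[key] == null) continue;
    const value = key === 'arenaElo' ? normalizeElo(scores[key]) : scores[key];
    total += weight * value;
    weightUsed += weight;
  }
  return weightUsed > 0 ? total / weightUsed : null;
}

// Example with made-up numbers:
console.log(compositeScore({
  mmlu: 86.4, humaneval: 90.2, math: 60.1,
  gsm8k: 92.0, gpqa: 50.3, bbh: 83.1, arenaElo: 1250,
}));
```

Rescaling the remaining weights when a score is missing keeps composites comparable across models that report different benchmark subsets.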

Update Process

Arena Elo scores are updated weekly via the LMSYS Chatbot Arena public leaderboard API. All other benchmark scores are reviewed monthly and updated when new official figures are published.

Our automated update script (scripts/fetch-arena.js) fetches the latest Elo data and opens a pull request on our GitHub repository. A human reviewer verifies the changes before merging.
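
As a rough illustration of that pipeline, here is a minimal sketch of what such a script might look like. The leaderboard URL and the response shape are placeholder assumptions, not the real LMSYS endpoint, and the pull-request step is left out:

```js
// Hypothetical sketch of a fetch-arena-style script. The URL and the
// response shape are placeholder assumptions, not the real LMSYS API.
const fs = require('node:fs/promises');

const LEADERBOARD_URL = 'https://example.com/arena/leaderboard.json'; // placeholder

async function fetchArenaElo() {
  const res = await fetch(LEADERBOARD_URL); // global fetch, Node 18+
  if (!res.ok) throw new Error(`Fetch failed: ${res.status}`);
  const rows = await res.json(); // assumed shape: [{ model, elo }, ...]
  const elo = Object.fromEntries(rows.map((r) => [r.model, r.elo]));
  // Write the snapshot; in the real pipeline a pull request is opened
  // from this change and reviewed by a human before merging.
  await fs.writeFile('data/arena-elo.json', JSON.stringify(elo, null, 2));
}

fetchArenaElo().catch((err) => {
  console.error(err);
  process.exit(1);
});
```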

When a provider releases a new model version, we add it to our database within 72 hours of the official announcement, using scores from the official technical report.

Known Limitations

Benchmarks are not perfect. MMLU and HumanEval scores may be inflated for models whose training data included the benchmark questions (data contamination). We flag known contamination issues when they are reported in peer-reviewed literature.

Performance on benchmarks does not always predict real-world usefulness. We include the LMSYS Arena Elo (human preference) rating specifically to counterbalance this limitation.

Pricing data reflects public list prices and may not reflect negotiated enterprise discounts. Always verify pricing directly with the provider before making purchasing decisions.

Found an error? Contact us with a source link and we'll review it within 48 hours.