LLM leaderboard

Compare models the way leaderboard sites do, on static GitHub Pages

Layout and chart idioms echo Together AI's Which LLM cards and Onyx's LLM leaderboard. The numbers below are illustrative placeholders for UI development; replace them with your own dataset or API when wiring up a real desk.

Model tester (Compare layout)

This block is a static visual echo of the GitHub Models Compare experience described in the May 2025 changelog (side-by-side runs, scores, token and latency readouts, JSON output). On GitHub, you enable Models for your repository, version your prompts (for example in .prompt.yml files), and then evaluate them against hosted models. This static Train site cannot make those calls without your own API keys and a backend.
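The Compare readouts (JSON validation, correct categorization) can be approximated offline before any hosted call. The sketch below is an assumption, not GitHub's evaluator: it infers a `categories` schema from the mock outputs on this page, and `validate_categories`, `categorization_score`, and the set-overlap scoring rule are hypothetical, so it will not reproduce the placeholder scores on the cards.

```python
from __future__ import annotations

import json
from typing import Any, Optional


def validate_categories(raw: str) -> Optional[list[dict[str, Any]]]:
    """Parse a model output and return its category list, or None if it fails.

    Assumed schema (inferred from the mock outputs, not a GitHub spec):
    {"categories": [{"name": str, "confidence": number in [0, 1]}, ...]}
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not isinstance(data.get("categories"), list):
        return None
    for item in data["categories"]:
        if not isinstance(item, dict) or not isinstance(item.get("name"), str):
            return None
        conf = item.get("confidence")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(conf, bool) or not isinstance(conf, (int, float)):
            return None
        if not 0.0 <= conf <= 1.0:
            return None
    return data["categories"]


def categorization_score(raw: str, expected: set[str]) -> float:
    """Hypothetical metric: share of expected labels present (0.0 on invalid JSON)."""
    categories = validate_categories(raw)
    if categories is None or not expected:
        return 0.0
    predicted = {c["name"] for c in categories}
    return len(predicted & expected) / len(expected)


claude_output = (
    '{"categories": [{"name": "Groceries", "confidence": 0.91},'
    ' {"name": "Transport", "confidence": 0.88}]}'
)
print(validate_categories(claude_output) is not None)  # True
print(categorization_score(claude_output, {"Groceries", "Transport"}))  # 1.0
```

Set overlap ignores the confidence values; a calibration score would need a separate check that compares confidences with correctness.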

erika-market / Add categories to transaction prompt #6233

Use-case router

The best LLMs for your use case

WhichLLM-style picks rendered inside the compare grid: choose a provider version; check speed, intelligence, inputs, and JSON/tool support; then jump into the matching tester or the local Liquid AI loop.
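In practice, this pick flow filters a catalog by hard requirements and then ranks the remaining models. The following is a minimal sketch. It assumes a hypothetical `ModelCard` record in which the bolt and star ratings become 1-5 integers. The sample values are copied from the placeholder cards on this page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCard:
    """Hypothetical record mirroring the card fields on this page."""
    name: str
    speed: int            # 1-5, the lightning-bolt rating
    intelligence: int     # 1-5, the star rating
    inputs: frozenset[str]
    json_mode: bool
    function_calling: bool


def pick(cards: list[ModelCard], needs_inputs: set[str], needs_json: bool = False,
         needs_tools: bool = False, min_speed: int = 1) -> list[ModelCard]:
    """Keep cards meeting every requirement; rank smartest first, then fastest."""
    eligible = [
        c for c in cards
        if needs_inputs <= c.inputs
        and (c.json_mode or not needs_json)
        and (c.function_calling or not needs_tools)
        and c.speed >= min_speed
    ]
    return sorted(eligible, key=lambda c: (-c.intelligence, -c.speed, c.name))


# Ratings and capabilities copied from the placeholder cards.
CARDS = [
    ModelCard("Claude Opus 4.6", 1, 5, frozenset({"Text", "Image"}), True, True),
    ModelCard("Gemini 3.1 Pro", 2, 5, frozenset({"Text", "Image", "Audio", "Video"}), True, True),
    ModelCard("Grok 4.3", 4, 4, frozenset({"Text", "Image"}), True, True),
    ModelCard("DeepSeek V3.2", 1, 4, frozenset({"Text"}), True, True),
    ModelCard("LiquidAI LFM2 Transcript", 1, 3, frozenset({"Text", "Transcript"}), False, False),
]

for card in pick(CARDS, {"Image"}, needs_json=True, needs_tools=True):
    print(card.name)  # Gemini 3.1 Pro, Claude Opus 4.6, Grok 4.3 (one per line)
```

This sketch ranks by intelligence first and by speed second, but that ordering is only one option. A latency-sensitive desk would reverse the two sort keys.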

Sources: Arena · Onyx · Vellum · LLM Stats · Artificial Analysis

Claude Opus 4.6

Anthropic

General chat, reasoning, writing, coding, and agent orchestration workloads.

Size: private · Updated: 2026-02

Speed ⚡····

Intelligence ★★★★★

Price (1M) $54.00

Inputs

Text · Image

JSON Mode ✓

Function Calling ✓

Benchmarks

#AA Intelligence index (Artificial Analysis style): 57

#AG Agent fit (agent workflow): 96

#RT Routing fit (model router): 92

Original main · Size: private · Updated: 2025-05 · 90.00%

Input: 842 · Output: 186 · 920 ms

Correct categorization: 0.90
Confidence calibration: 0.82
JSON validation: pass

Output

{
  "categories": [
    { "name": "Groceries", "confidence": 0.91 },
    { "name": "Transport", "confidence": 0.88 }
  ]
}

Gemini 3.1 Pro

Google

General chat, reasoning, writing, coding, and agent orchestration workloads.

Size: private · Updated: 2026-03

Speed ⚡⚡···

Intelligence ★★★★★

Price (1M) $8.50

Inputs

Text · Image · Audio · Video

JSON Mode ✓

Function Calling ✓

Benchmarks

#AA Intelligence index (Artificial Analysis style): 57

#AG Agent fit (agent workflow): 93

#RT Routing fit (model router): 91

Version 1 · Size: MoE / Scout · Updated: 2025-04 · 50.55%

Input: 842 · Output: 204 · 1410 ms

Correct categorization: 0.30
Confidence calibration: 0.55
JSON validation: pass

Output

{
  "categories": [
    { "name": "Culture", "confidence": 0.30 },
    { "name": "Transport", "confidence": 0.61 }
  ]
}

GPT-5.4 (xhigh)

OpenAI

General chat, reasoning, writing, coding, and agent orchestration workloads.

Size: private · Updated: 2026-04

Speed ⚡⚡···

Intelligence ★★★★★

Price (1M) $10.63

Inputs

Text · Image · Audio

JSON Mode ✓

Function Calling ✓

Benchmarks

#AA Intelligence index (Artificial Analysis style): 57

#AG Agent fit (agent workflow): 95

#RT Routing fit (model router): 94

Version 2 · Size: private · Updated: 2025-10 · 77.60%

Input: 842 · Output: 178 · 680 ms

Correct categorization: 0.72
Confidence calibration: 0.79
JSON validation: pass

Output

{
  "categories": [
    { "name": "Groceries", "confidence": 0.85 },
    { "name": "Transport", "confidence": 0.77 }
  ]
}

DeepSeek V3.2

DeepSeek

General chat, reasoning, writing, coding, and agent orchestration workloads.

Size: 685B MoE · Updated: 2026-01

Speed ⚡····

Intelligence ★★★★☆

Price (1M) $0.37

Inputs

Text

JSON Mode ✓

Function Calling ✓

Benchmarks

#AA Intelligence index (Artificial Analysis style): 52

#AG Agent fit (agent workflow): 86

#RT Routing fit (model router): 84

Industry blend · Size: 685B MoE · Updated: 2026-01 · 86%

Input: 1,000 · Output: 77 · 938 ms

Routing fit: 84
Agent fit: 86
JSON / tools: json · tools

Output

{
  "model": "DeepSeek V3.2",
  "provider": "DeepSeek",
  "route": "chat",
  "modalities": [
    "Text"
  ],
  "price_per_m": {
    "input": 0.28,
    "output": 0.42
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}

Grok 4.3

xAI

General chat, reasoning, writing, coding, and agent orchestration workloads.

Size: private · Updated: 2026-03

Speed ⚡⚡⚡⚡·

Intelligence ★★★★☆

Price (1M) $10.80

Inputs

Text · Image

JSON Mode ✓

Function Calling ✓

Benchmarks

#AA Intelligence index (Artificial Analysis style): 53

#AG Agent fit (agent workflow): 88

#RT Routing fit (model router): 86

Industry blend · Size: private · Updated: 2026-03 · 88%

Input: 1,000 · Output: 228 · 316 ms

Routing fit: 86
Agent fit: 88
JSON / tools: json · tools

Output

{
  "model": "Grok 4.3",
  "provider": "xAI",
  "route": "chat",
  "modalities": [
    "Text",
    "Image"
  ],
  "price_per_m": {
    "input": 3,
    "output": 15
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}

LiquidAI LFM2 Transcript

Liquid AI

Local/private transcript summarization and caption brief generation.

Size: 2.6B / 1.6GB Q4 · Updated: 2026-05

Speed ⚡····

Intelligence ★★★☆☆

Price (1M) $0.00

Inputs

Text · Transcript

JSON Mode ×

Function Calling ×

Benchmarks

#AA Intelligence index (Artificial Analysis style): 31

#AG Agent fit (agent workflow): 68

#RT Routing fit (model router): 79

Industry blend · Size: 2.6B / 1.6GB Q4 · Updated: 2026-05 · 68%

Input: 1,000 · Output: 128 · 0 ms

Routing fit: 79
Agent fit: 68
JSON / tools: freeform · no tools

Output

{
  "model": "LiquidAI LFM2 Transcript",
  "provider": "Liquid AI / local GGUF",
  "route": "transcribe",
  "modalities": [
    "Text",
    "Transcript"
  ],
  "price_per_m": {
    "input": 0,
    "output": 0
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}

gpt-oss-120B (high)

OpenAI

General chat, reasoning, writing, coding, and agent orchestration workloads.

Size: 117B / MXFP4 · Updated: 2025-08

Speed ⚡⚡⚡⚡⚡

Intelligence ★★★☆☆

Price (1M) $0.44

Inputs

Text

JSON Mode ✓

Function Calling ✓

Benchmarks

#AA Intelligence index (Artificial Analysis style): 33

#AG Agent fit (agent workflow): 83

#RT Routing fit (model router): 88

Industry blend · Size: 117B / MXFP4 · Updated: 2025-08 · 83%

Input: 1,000 · Output: 274 · 263 ms

Routing fit: 88
Agent fit: 83
JSON / tools: json · tools

Output

{
  "model": "gpt-oss-120B (high)",
  "provider": "OpenAI",
  "route": "chat",
  "modalities": [
    "Text"
  ],
  "price_per_m": {
    "input": 0.15,
    "output": 0.6
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}

Llama 4 Maverick

Qwen3.6-Plus

Alibaba

General chat, reasoning, writing, coding, and agent orchestration workloads.

Size: private · Updated: 2026-02

Speed ⚡⚡⚡··

Intelligence ★★★★☆

Price (1M) $2.13

Inputs

Text · Image

JSON Mode ✓

Function Calling ✓

Benchmarks

#AA Intelligence index (Artificial Analysis style): 51

#AG Agent fit (agent workflow): 85

#RT Routing fit (model router): 87

Industry blend · Size: private · Updated: 2026-02 · 85%

Input: 1,000 · Output: 186 · 387 ms

Routing fit: 87
Agent fit: 85
JSON / tools: json · tools

Output

{
  "model": "Qwen3.6-Plus",
  "provider": "Alibaba / Qwen",
  "route": "chat",
  "modalities": [
    "Text",
    "Image"
  ],
  "price_per_m": {
    "input": 0.5,
    "output": 3
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}

Kimi K2.6

Moonshot AI

General chat, reasoning, writing, coding, and agent orchestration workloads.

Size: private · Updated: 2026-01

Speed ⚡⚡···

Intelligence ★★★★☆

Price (1M) $3.35

Inputs

Text

JSON Mode ✓

Function Calling ✓

Benchmarks

#AA Intelligence index (Artificial Analysis style): 50

#AG Agent fit (agent workflow): 82

#RT Routing fit (model router): 84

Industry blend · Size: private · Updated: 2026-01 · 82%

Input: 1,000 · Output: 134 · 536 ms

Routing fit: 84
Agent fit: 82
JSON / tools: json · tools

Output

{
  "model": "Kimi K2.6",
  "provider": "Moonshot AI",
  "route": "chat",
  "modalities": [
    "Text"
  ],
  "price_per_m": {
    "input": 1.2,
    "output": 4.5
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}

GLM-5.1

Zhipu AI

General chat, reasoning, writing, coding, and agent orchestration workloads.

Size: private · Updated: 2026-02

Speed ⚡⚡⚡··

Intelligence ★★★★☆

Price (1M) $3.35

Inputs

Text · Image

JSON Mode ✓

Function Calling ✓

Benchmarks

#AA Intelligence index (Artificial Analysis style): 49

#AG Agent fit (agent workflow): 81

#RT Routing fit (model router): 83

Industry blend · Size: private · Updated: 2026-02 · 81%

Input: 1,000 · Output: 151 · 476 ms

Routing fit: 83
Agent fit: 81
JSON / tools: json · tools

Output

{
  "model": "GLM-5.1",
  "provider": "Zhipu AI",
  "route": "chat",
  "modalities": [
    "Text",
    "Image"
  ],
  "price_per_m": {
    "input": 1.4,
    "output": 4.4
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}

MiniMax M2.7

MiniMax

General chat, reasoning, writing, coding, and agent orchestration workloads.

Size: private · Updated: 2026-01

Speed ⚡⚡⚡⚡·

Intelligence ★★★★☆

Price (1M) $0.89

Inputs

Text · Audio

JSON Mode ✓

Function Calling ✓

Benchmarks

#AA Intelligence index (Artificial Analysis style): 45

#AG Agent fit (agent workflow): 77

#RT Routing fit (model router): 80

Industry blend · Size: private · Updated: 2026-01 · 77%

Input: 1,000 · Output: 252 · 286 ms

Routing fit: 80
Agent fit: 77
JSON / tools: json · tools

Output

{
  "model": "MiniMax M2.7",
  "provider": "MiniMax",
  "route": "chat",
  "modalities": [
    "Text",
    "Audio"
  ],
  "price_per_m": {
    "input": 0.3,
    "output": 1.2
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}

Mistral Large

Mistral AI

General chat, reasoning, writing, coding, and agent orchestration workloads.

Size: private · Updated: 2025-11

Speed ⚡⚡⚡··

Intelligence ★★★★☆

Price (1M) $4.60

Inputs

Text

JSON Mode ✓

Function Calling ✓

Benchmarks

#AA Intelligence index (Artificial Analysis style): 50

#AG Agent fit (agent workflow): 82

#RT Routing fit (model router): 85

Industry blend · Size: private · Updated: 2025-11 · 82%

Input: 1,000 · Output: 174 · 414 ms

Routing fit: 85
Agent fit: 82
JSON / tools: json · tools

Output

{
  "model": "Mistral Large",
  "provider": "Mistral AI",
  "route": "chat",
  "modalities": [
    "Text"
  ],
  "price_per_m": {
    "input": 2,
    "output": 6
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}

Command A

Cohere

Search refinement and source ordering for living memo evidence stacks.

Size: private · Updated: 2025-10

Speed ⚡⚡···

Intelligence ★★★★☆

Price (1M) $7.38

Inputs

Text · Rerank

JSON Mode ✓

Function Calling ✓

Benchmarks

#AA Intelligence index (Artificial Analysis style): 44

#AG Agent fit (agent workflow): 80

#RT Routing fit (model router): 89

Industry blend · Size: private · Updated: 2025-10 · 80%

Input: 1,000 · Output: 114 · 632 ms

Routing fit: 89
Agent fit: 80
JSON / tools: json · tools

Output

{
  "model": "Command A",
  "provider": "Cohere",
  "route": "rerank",
  "modalities": [
    "Text",
    "Rerank"
  ],
  "price_per_m": {
    "input": 2.5,
    "output": 10
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}

Nomic Embed Text

Nomic

Retrieval and memory layer for source search, RAG, and evidence recall.

Size: 274MB · Updated: 2026-05

Speed ⚡⚡⚡⚡⚡

Intelligence ★★☆☆☆

Price (1M) $0.00

Inputs

Embeddings

JSON Mode ×

Function Calling ×

Benchmarks

#AA Intelligence index (Artificial Analysis style): 22

#AG Agent fit (agent workflow): 60

#RT Routing fit (model router): 95

Industry blend · Size: 274MB · Updated: 2026-05 · 60%

Input: 1,000 · Output: 1,080 · 120 ms

Routing fit: 95
Agent fit: 60
JSON / tools: freeform · no tools

Output

{
  "model": "Nomic Embed Text",
  "provider": "Nomic / local",
  "route": "embeddings",
  "modalities": [
    "Embeddings"
  ],
  "price_per_m": {
    "input": 0,
    "output": 0
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}

GPT Image 1

OpenAI

Image generation and editing lane for media outputs and visual assets.

Size: private · Updated: 2025-04

Speed ⚡····

Intelligence ★★★☆☆

Price (1M) $27.75

Inputs

Image · Text

JSON Mode ×

Function Calling ×

Benchmarks

#AA Intelligence index (Artificial Analysis style): 40

#AG Agent fit (agent workflow): 58

#RT Routing fit (model router): 76

Industry blend · Size: private · Updated: 2025-04 · 58%

Input: 1,000 · Output: 128 · 0 ms

Routing fit: 76
Agent fit: 58
JSON / tools: freeform · no tools

Output

{
  "model": "GPT Image 1",
  "provider": "OpenAI",
  "route": "image",
  "modalities": [
    "Image",
    "Text"
  ],
  "price_per_m": {
    "input": 5,
    "output": 40
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}

Gemini Video

Google

Video-understanding lane for long-form visual streams and multimodal routing.

Size: private · Updated: 2026-02

Speed ⚡····

Intelligence ★★★★☆

Price (1M) $1.61

Inputs

Video · Audio · Text

JSON Mode ✓

Function Calling ✓

Benchmarks

#AA Intelligence index (Artificial Analysis style): 43

#AG Agent fit (agent workflow): 70

#RT Routing fit (model router): 78

Industry blend · Size: private · Updated: 2026-02 · 70%

Input: 1,000 · Output: 128 · 0 ms

Routing fit: 78
Agent fit: 70
JSON / tools: json · tools

Output

{
  "model": "Gemini Video",
  "provider": "Google",
  "route": "video",
  "modalities": [
    "Video",
    "Audio",
    "Text"
  ],
  "price_per_m": {
    "input": 0.7,
    "output": 2.1
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}

Whisper Large v3

OpenAI

Speech, audio, and voice processing lane for live media workflows.

Size: 1.55B · Updated: 2024-09

Speed ⚡····

Intelligence ★★★☆☆

Price (1M) $0.21

Inputs

Audio · Transcribe

JSON Mode ✓

Function Calling ×

Benchmarks

#AA Intelligence index (Artificial Analysis style): 34

#AG Agent fit (agent workflow): 62

#RT Routing fit (model router): 90

Industry blend · Size: 1.55B · Updated: 2024-09 · 62%

Input: 1,000 · Output: 128 · 0 ms

Routing fit: 90
Agent fit: 62
JSON / tools: json · no tools

Output

{
  "model": "Whisper Large v3",
  "provider": "OpenAI",
  "route": "audio",
  "modalities": [
    "Audio",
    "Transcribe"
  ],
  "price_per_m": {
    "input": 0.6,
    "output": 0
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}

Omni Moderation

OpenAI

Safety and policy pre-checks before downstream model routing.

Size: private · Updated: 2025-12

Speed ⚡⚡⚡⚡⚡

Intelligence ★★★☆☆

Price (1M) $0.00

Inputs

Text · Image

JSON Mode ✓

Function Calling ×

Benchmarks

#AA Intelligence index (Artificial Analysis style): 30

#AG Agent fit (agent workflow): 64

#RT Routing fit (model router): 96

Industry blend · Size: private · Updated: 2025-12 · 64%

Input: 1,000 · Output: 540 · 133 ms

Routing fit: 96
Agent fit: 64
JSON / tools: json · no tools

Output

{
  "model": "Omni Moderation",
  "provider": "OpenAI",
  "route": "moderation",
  "modalities": [
    "Text",
    "Image"
  ],
  "price_per_m": {
    "input": 0,
    "output": 0
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}

Catalog: github.com/marketplace/models · Marketplace → type=models

Charts: Intelligence vs cost (blend) · Speed strip · Agent fit radar · Routing matrix

Benchmark rows

Each block uses a split layout (copy + horizontal bar chart) similar to expandable benchmark sections on modern leaderboard sites.

Routing workflow

Classify each request, then send it to a specialized model lane: fast local summaries, vision, rerank, or deep reasoning. Inspired by Agent Recipes and Anthropic workflow patterns.
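A router can be as small as a classifier that returns a lane name, plus a lookup table that maps each lane to a model. In this sketch, keyword rules stand in for a real LLM or trained classifier. The keywords and the lane-to-model table are both assumptions. The lane names mirror the `route` values in the directory below.

```python
from __future__ import annotations

# Hypothetical lane -> model table; lane names follow the "route" values on the cards.
LANES: dict[str, str] = {
    "transcribe": "LiquidAI LFM2 Transcript",  # fast local summaries
    "vision": "Llama 4 Maverick",
    "rerank": "Command A",
    "chat": "Claude Opus 4.6",  # deep reasoning, also the fallback lane
}

# Keyword rules stand in for a real classifier; the first matching lane wins.
RULES: list[tuple[str, tuple[str, ...]]] = [
    ("transcribe", ("transcript", "caption")),
    ("vision", ("image", "screenshot", "photo", "chart")),
    ("rerank", ("rank these", "reorder", "most relevant")),
]


def classify(request: str) -> str:
    """Return the first lane whose keywords appear in the request, else "chat"."""
    text = request.lower()
    for lane, keywords in RULES:
        if any(k in text for k in keywords):
            return lane
    return "chat"


def route(request: str) -> tuple[str, str]:
    """Return (lane, model) for a request."""
    lane = classify(request)
    return lane, LANES[lane]


print(route("Summarize this transcript into a caption brief"))
# ('transcribe', 'LiquidAI LFM2 Transcript')
```

Separating `classify` from the lane table means you can swap the classifier without touching the routing policy, and vice versa.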

Parallelization workflow

Split a research prompt across search, transcript, ticker, and model-eval workers, then aggregate the results into one source-backed memo.
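This fan-out/fan-in pattern maps naturally onto `asyncio.gather`. The following is a minimal sketch that treats each worker as an async function of the prompt. `make_worker` and the simulated delays are hypothetical stand-ins for real search, transcript, ticker, and model-eval calls.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Worker = Callable[[str], Awaitable[str]]


def make_worker(name: str, delay: float) -> Worker:
    """Hypothetical stand-in for a real worker (search API, transcript store, ...)."""
    async def run(prompt: str) -> str:
        await asyncio.sleep(delay)  # simulated I/O latency
        return f"[{name}] findings for: {prompt}"
    return run


WORKERS: Dict[str, Worker] = {
    "search": make_worker("search", 0.03),
    "transcript": make_worker("transcript", 0.01),
    "ticker": make_worker("ticker", 0.02),
    "model-eval": make_worker("model-eval", 0.01),
}


async def research(prompt: str) -> str:
    # Fan out: every worker runs concurrently on the same prompt.
    # gather() returns results in argument order, not completion order.
    results = await asyncio.gather(
        *(worker(prompt) for worker in WORKERS.values()), return_exceptions=True
    )
    # Fan in: one section per worker; failures are noted instead of aborting the memo.
    sections = [
        f"[{name}] failed: {result!r}" if isinstance(result, BaseException) else result
        for name, result in zip(WORKERS, results)
    ]
    return "\n".join(sections)


print(asyncio.run(research("Q3 outlook for regional banks")))
```

Because `return_exceptions=True` is set, a failing worker is recorded in the memo instead of cancelling the whole research run.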

Evaluator-optimizer loop

One model drafts the brief; another critiques its coverage and decides whether more searches are needed before the memo advances.
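The loop alternates between a generator and a critic until the critic approves or the round budget runs out. In this sketch, `draft`, `critique`, and `search` are hypothetical stand-ins for model and search calls, and the topic checklist is an assumed approval rule.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical coverage checklist the critic enforces.
REQUIRED_TOPICS = ("rates", "earnings", "risks")


@dataclass
class Critique:
    approved: bool
    missing: list[str]  # topics the critic wants covered before approval


def draft(text: str, new_sources: list[str]) -> str:
    """Stand-in for the drafting model: folds newly found sources into the brief."""
    return text + "".join(f"\n- {s}" for s in new_sources)


def critique(text: str) -> Critique:
    """Stand-in for the critic model: checks the draft against the checklist."""
    missing = [t for t in REQUIRED_TOPICS if t not in text.lower()]
    return Critique(approved=not missing, missing=missing)


def search(topic: str) -> str:
    """Stand-in for a search call requested by the critic."""
    return f"source on {topic}"


def evaluator_optimizer(brief: str, max_rounds: int = 3) -> tuple[str, int, bool]:
    """Return (final text, rounds used, approved?)."""
    text = draft(brief, [])
    for round_no in range(1, max_rounds + 1):
        verdict = critique(text)
        if verdict.approved:
            return text, round_no, True
        if round_no == max_rounds:
            break
        # Optimizer step: run the searches the critic asked for, then redraft.
        text = draft(text, [search(t) for t in verdict.missing])
    return text, max_rounds, False


memo, rounds, approved = evaluator_optimizer("Memo: rates outlook")
print(rounds, approved)  # 2 True
```

The `max_rounds` cap matters: without it, a critic that never approves would keep triggering searches.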

Serverless inference directory

Multi-sort table with major model brands, model size, last update, modality routing, and per-token price lanes.

Price per 1M tokens (USD)

Model	Provider	Route	Modalities	Size	Updated	Context	Input	Output	Agent fit	JSON	Tools
Claude Opus 4.6	Anthropic	chat	Text · Image	private	2026-02	200K	$15.00	$75.00	96	yes	yes
Gemini 3.1 Pro	Google	chat	Text · Image · Audio · Video	private	2026-03	1M	$2.00	$12.00	93	yes	yes
GPT-5.4 (xhigh)	OpenAI	chat	Text · Image · Audio	private	2026-04	1M	$2.50	$15.00	95	yes	yes
DeepSeek V3.2	DeepSeek	chat	Text	685B MoE	2026-01	130K	$0.28	$0.42	86	yes	yes
Grok 4.3	xAI	chat	Text · Image	private	2026-03	131K	$3.00	$15.00	88	yes	yes
LiquidAI LFM2 Transcript	Liquid AI / local GGUF	transcribe	Text · Transcript	2.6B / 1.6GB Q4	2026-05	8K	$0.00	$0.00	68	no	no
gpt-oss-120B (high)	OpenAI	chat	Text	117B / MXFP4	2025-08	128K	$0.15	$0.60	83	yes	yes
Llama 4 Maverick	Meta	vision	Text · Image	128 experts	2025-04	1M	$0.27	$0.85	78	no	yes
Qwen3.6-Plus	Alibaba / Qwen	chat	Text · Image	private	2026-02	1M	$0.50	$3.00	85	yes	yes
Kimi K2.6	Moonshot AI	chat	Text	private	2026-01	256K	$1.20	$4.50	82	yes	yes
GLM-5.1	Zhipu AI	chat	Text · Image	private	2026-02	128K	$1.40	$4.40	81	yes	yes
MiniMax M2.7	MiniMax	chat	Text · Audio	private	2026-01	1M	$0.30	$1.20	77	yes	yes
Mistral Large	Mistral AI	chat	Text	private	2025-11	128K	$2.00	$6.00	82	yes	yes
Command A	Cohere	rerank	Text · Rerank	private	2025-10	256K	$2.50	$10.00	80	yes	yes
Nomic Embed Text	Nomic / local	embeddings	Embeddings	274MB	2026-05	8K	$0.00	$0.00	60	no	no
GPT Image 1	OpenAI	image	Image · Text	private	2025-04	N/A	$5.00	$40.00	58	no	no
Gemini Video	Google	video	Video · Audio · Text	private	2026-02	1M	$0.70	$2.10	70	yes	yes
Whisper Large v3	OpenAI	audio	Audio · Transcribe	1.55B	2024-09	audio	$0.60	$0.00	62	yes	no
Omni Moderation	OpenAI	moderation	Text · Image	private	2025-12	128K	$0.00	$0.00	64	yes	no
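The Price (1M) figure on each card appears to be a blend of the two price columns in this table. Taking 35% of the input price plus 65% of the output price, rounded half-up to cents, reproduces every card value; for example, 0.35 × $15.00 + 0.65 × $75.00 = $54.00 for Claude Opus 4.6. That weighting is inferred from the numbers and is not stated on the page. The sketch below computes the blend with `Decimal` and then applies a stable multi-key sort, similar to clicking the directory's column headers one after another. `Row`, `blended_price`, and `multi_sort` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import ROUND_HALF_UP, Decimal
from typing import Any, Callable


@dataclass(frozen=True)
class Row:
    model: str
    route: str
    updated: str           # "YYYY-MM", so plain string order is date order
    input_per_m: Decimal   # USD per 1M input tokens
    output_per_m: Decimal  # USD per 1M output tokens


def blended_price(row: Row, input_share: Decimal = Decimal("0.35")) -> Decimal:
    """Weighted per-1M price, rounded half-up to cents (the 35/65 split is inferred)."""
    raw = input_share * row.input_per_m + (1 - input_share) * row.output_per_m
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def multi_sort(rows: list[Row], keys: list[tuple[Callable[[Row], Any], bool]]) -> list[Row]:
    """Sort by several columns; keys are (key function, descending) pairs, most significant first."""
    ordered = list(rows)
    # Python's sort is stable (also with reverse=True), so sorting by the least
    # significant key first and the most significant key last gives the combined order.
    for key, descending in reversed(keys):
        ordered.sort(key=key, reverse=descending)
    return ordered


# A few rows copied from the directory table.
ROWS = [
    Row("Claude Opus 4.6", "chat", "2026-02", Decimal("15.00"), Decimal("75.00")),
    Row("DeepSeek V3.2", "chat", "2026-01", Decimal("0.28"), Decimal("0.42")),
    Row("Qwen3.6-Plus", "chat", "2026-02", Decimal("0.50"), Decimal("3.00")),
    Row("Command A", "rerank", "2025-10", Decimal("2.50"), Decimal("10.00")),
    Row("Kimi K2.6", "chat", "2026-01", Decimal("1.20"), Decimal("4.50")),
]

# Route A-Z, then newest update first, then cheapest blended price first.
for row in multi_sort(ROWS, [(lambda r: r.route, False),
                             (lambda r: r.updated, True),
                             (blended_price, False)]):
    print(row.model, row.route, row.updated, blended_price(row))
```

Using `Decimal` with `ROUND_HALF_UP` matters here. Binary floats combined with Python's `round` use half-to-even rounding, so a value such as $2.125 would become $2.12 instead of the $2.13 shown on the Qwen3.6-Plus card.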

External references: whichllm.together.ai · agentrecipes.com · Building Effective Agents (Anthropic) · onyx.app/llm-leaderboard