Models desk

Independent-style rankings and benchmark rows, rendered statically on Train. These are not live vendor API results or official benchmark scores.

Train models desk: static UI. For real side-by-side prompt evaluation on GitHub, enable GitHub Models on the repository (Compare tab, public preview).

LLM leaderboard

Compare models like leaderboard sites — on static GitHub Pages

Layout and chart idioms echo Together AI’s Which LLM cards and Onyx’s LLM leaderboard. Numbers below are illustrative placeholders for UI development; replace them with your own dataset or API when wiring a real desk. GitHub-style model tester mock → · Liquid AI live tuning →

Model tester (Compare layout)

This block is a static visual echo of the GitHub Models Compare experience described in the May 2025 changelog (side-by-side runs, scores, token/latency readouts, JSON output). On GitHub you enable Models for your repository, version your prompts (for example .prompt.yml), then evaluate them against hosted models. This static Train site cannot make those calls without your own API keys and a backend.
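The "JSON validation" readout in the mock cards can be pictured as a shape check on a model's output. The following is a minimal Python sketch, assuming the payload shape shown in the mock outputs on this page (a "categories" list of name/confidence pairs). The function name validate_categories is hypothetical, and this is not a description of how GitHub Models implements its evaluators.

```python
import json


def validate_categories(raw: str) -> bool:
    """Return True if raw is JSON shaped like the mock card output.

    Hypothetical helper: accepts {"categories": [{"name": str,
    "confidence": number in [0, 1]}, ...]} with at least one entry.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    categories = payload.get("categories")
    if not isinstance(categories, list) or not categories:
        return False
    for item in categories:
        if not isinstance(item, dict):
            return False
        name = item.get("name")
        confidence = item.get("confidence")
        if not isinstance(name, str) or not name:
            return False
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
            return False
        if not 0.0 <= confidence <= 1.0:
            return False
    return True


sample = '{"categories": [{"name": "Groceries", "confidence": 0.91}]}'
print("pass" if validate_categories(sample) else "fail")
```

A check like this only covers structure; the "Correct categorization" and "Confidence calibration" scores would need labelled test data and are out of scope for the sketch.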

erika-market / Add categories to transaction prompt #6233

Use-case router

The best LLMs for your use case

WhichLLM-style picks rendered inside the compare grid: choose a provider version, check speed, intelligence, inputs, and JSON/tool support, then jump into the matching tester or local Liquid AI loop.

Open Liquid AI tuning
1

Claude Opus 4.6

Anthropic

General chat, reasoning, writing, coding, and agent orchestration workloads.

Size: private · Updated: 2026-02

Speed ⚡····
Intelligence ★★★★★
Price (1M) $54.00
Inputs

Text · Image

JSON Mode
Function Calling
Benchmarks

#AA Intelligence index · Artificial Analysis style · 57

#AG Agent fit · Agent workflow · 96

#RT Routing fit · Model router · 92

Original main · Size: private · Updated: 2025-05 · 90.00%

Input: 842 · Output: 186 · 920 ms

Correct categorization
0.90
Confidence calibration
0.82
JSON validation
pass
Output
{
  "categories": [
    { "name": "Groceries", "confidence": 0.91 },
    { "name": "Transport", "confidence": 0.88 }
  ]
}
Try it out
2

Gemini 3.1 Pro

Google

General chat, reasoning, writing, coding, and agent orchestration workloads.

Size: private · Updated: 2026-03

Speed ⚡⚡···
Intelligence ★★★★★
Price (1M) $8.50
Inputs

Text · Image · Audio · Video

JSON Mode
Function Calling
Benchmarks

#AA Intelligence index · Artificial Analysis style · 57

#AG Agent fit · Agent workflow · 93

#RT Routing fit · Model router · 91

Version 1 · Size: MoE / Scout · Updated: 2025-04 · 50.55%

Input: 842 · Output: 204 · 1410 ms

Correct categorization
0.30
Confidence calibration
0.55
JSON validation
pass
Output
{
  "categories": [
    { "name": "Culture", "confidence": 0.30 },
    { "name": "Transport", "confidence": 0.61 }
  ]
}
Try it out
3

GPT-5.4 (xhigh)

OpenAI

General chat, reasoning, writing, coding, and agent orchestration workloads.

Size: private · Updated: 2026-04

Speed ⚡⚡···
Intelligence ★★★★★
Price (1M) $10.63
Inputs

Text · Image · Audio

JSON Mode
Function Calling
Benchmarks

#AA Intelligence index · Artificial Analysis style · 57

#AG Agent fit · Agent workflow · 95

#RT Routing fit · Model router · 94

Version 2 · Size: private · Updated: 2025-10 · 77.60%

Input: 842 · Output: 178 · 680 ms

Correct categorization
0.72
Confidence calibration
0.79
JSON validation
pass
Output
{
  "categories": [
    { "name": "Groceries", "confidence": 0.85 },
    { "name": "Transport", "confidence": 0.77 }
  ]
}
Try it out
4

DeepSeek V3.2

DeepSeek

General chat, reasoning, writing, coding, and agent orchestration workloads.

Size: 685B MoE · Updated: 2026-01

Speed ⚡····
Intelligence ★★★★☆
Price (1M) $0.37
Inputs

Text

JSON Mode
Function Calling
Benchmarks

#AA Intelligence index · Artificial Analysis style · 52

#AG Agent fit · Agent workflow · 86

#RT Routing fit · Model router · 84

Industry blend · Size: 685B MoE · Updated: 2026-01 · 86%

Input: 1,000 · Output: 77 · 938 ms

Routing fit
84
Agent fit
86
JSON / tools
json · tools
Output
{
  "model": "DeepSeek V3.2",
  "provider": "DeepSeek",
  "route": "chat",
  "modalities": [
    "Text"
  ],
  "price_per_m": {
    "input": 0.28,
    "output": 0.42
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}
Try it out
5

Grok 4.3

xAI

General chat, reasoning, writing, coding, and agent orchestration workloads.

Size: private · Updated: 2026-03

Speed ⚡⚡⚡⚡·
Intelligence ★★★★☆
Price (1M) $10.80
Inputs

Text · Image

JSON Mode
Function Calling
Benchmarks

#AA Intelligence index · Artificial Analysis style · 53

#AG Agent fit · Agent workflow · 88

#RT Routing fit · Model router · 86

Industry blend · Size: private · Updated: 2026-03 · 88%

Input: 1,000 · Output: 228 · 316 ms

Routing fit
86
Agent fit
88
JSON / tools
json · tools
Output
{
  "model": "Grok 4.3",
  "provider": "xAI",
  "route": "chat",
  "modalities": [
    "Text",
    "Image"
  ],
  "price_per_m": {
    "input": 3,
    "output": 15
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}
Try it out
6

LiquidAI LFM2 Transcript

Liquid AI

Local/private transcript summarization and caption brief generation.

Size: 2.6B / 1.6GB Q4 · Updated: 2026-05

Speed ⚡····
Intelligence ★★★☆☆
Price (1M) $0.00
Inputs

Text · Transcript

JSON Mode ×
Function Calling ×
Benchmarks

#AA Intelligence index · Artificial Analysis style · 31

#AG Agent fit · Agent workflow · 68

#RT Routing fit · Model router · 79

Industry blend · Size: 2.6B / 1.6GB Q4 · Updated: 2026-05 · 68%

Input: 1,000 · Output: 128 · 0 ms

Routing fit
79
Agent fit
68
JSON / tools
freeform · no tools
Output
{
  "model": "LiquidAI LFM2 Transcript",
  "provider": "Liquid AI / local GGUF",
  "route": "transcribe",
  "modalities": [
    "Text",
    "Transcript"
  ],
  "price_per_m": {
    "input": 0,
    "output": 0
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}
Try it out
7

gpt-oss-120B (high)

OpenAI

General chat, reasoning, writing, coding, and agent orchestration workloads.

Size: 117B / MXFP4 · Updated: 2025-08

Speed ⚡⚡⚡⚡⚡
Intelligence ★★★☆☆
Price (1M) $0.44
Inputs

Text

JSON Mode
Function Calling
Benchmarks

#AA Intelligence index · Artificial Analysis style · 33

#AG Agent fit · Agent workflow · 83

#RT Routing fit · Model router · 88

Industry blend · Size: 117B / MXFP4 · Updated: 2025-08 · 83%

Input: 1,000 · Output: 274 · 263 ms

Routing fit
88
Agent fit
83
JSON / tools
json · tools
Output
{
  "model": "gpt-oss-120B (high)",
  "provider": "OpenAI",
  "route": "chat",
  "modalities": [
    "Text"
  ],
  "price_per_m": {
    "input": 0.15,
    "output": 0.6
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}
Try it out
8

Llama 4 Maverick

Meta

Multimodal text and image understanding for visual research and review tasks.

Size: 128 experts · Updated: 2025-04

Speed ⚡⚡⚡··
Intelligence ★★★★☆
Price (1M) $0.65
Inputs

Text · Image

JSON Mode ×
Function Calling
Benchmarks

#AA Intelligence index · Artificial Analysis style · 49

#AG Agent fit · Agent workflow · 78

#RT Routing fit · Model router · 82

Industry blend · Size: 128 experts · Updated: 2025-04 · 78%

Input: 1,000 · Output: 204 · 353 ms

Routing fit
82
Agent fit
78
JSON / tools
freeform · tools
Output
{
  "model": "Llama 4 Maverick",
  "provider": "Meta",
  "route": "vision",
  "modalities": [
    "Text",
    "Image"
  ],
  "price_per_m": {
    "input": 0.27,
    "output": 0.85
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}
Try it out
9

Qwen3.6-Plus

Alibaba

General chat, reasoning, writing, coding, and agent orchestration workloads.

Size: private · Updated: 2026-02

Speed ⚡⚡⚡··
Intelligence ★★★★☆
Price (1M) $2.13
Inputs

Text · Image

JSON Mode
Function Calling
Benchmarks

#AA Intelligence index · Artificial Analysis style · 51

#AG Agent fit · Agent workflow · 85

#RT Routing fit · Model router · 87

Industry blend · Size: private · Updated: 2026-02 · 85%

Input: 1,000 · Output: 186 · 387 ms

Routing fit
87
Agent fit
85
JSON / tools
json · tools
Output
{
  "model": "Qwen3.6-Plus",
  "provider": "Alibaba / Qwen",
  "route": "chat",
  "modalities": [
    "Text",
    "Image"
  ],
  "price_per_m": {
    "input": 0.5,
    "output": 3
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}
Try it out
10

Kimi K2.6

Moonshot AI

General chat, reasoning, writing, coding, and agent orchestration workloads.

Size: private · Updated: 2026-01

Speed ⚡⚡···
Intelligence ★★★★☆
Price (1M) $3.35
Inputs

Text

JSON Mode
Function Calling
Benchmarks

#AA Intelligence index · Artificial Analysis style · 50

#AG Agent fit · Agent workflow · 82

#RT Routing fit · Model router · 84

Industry blend · Size: private · Updated: 2026-01 · 82%

Input: 1,000 · Output: 134 · 536 ms

Routing fit
84
Agent fit
82
JSON / tools
json · tools
Output
{
  "model": "Kimi K2.6",
  "provider": "Moonshot AI",
  "route": "chat",
  "modalities": [
    "Text"
  ],
  "price_per_m": {
    "input": 1.2,
    "output": 4.5
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}
Try it out
11

GLM-5.1

Zhipu AI

General chat, reasoning, writing, coding, and agent orchestration workloads.

Size: private · Updated: 2026-02

Speed ⚡⚡⚡··
Intelligence ★★★★☆
Price (1M) $3.35
Inputs

Text · Image

JSON Mode
Function Calling
Benchmarks

#AA Intelligence index · Artificial Analysis style · 49

#AG Agent fit · Agent workflow · 81

#RT Routing fit · Model router · 83

Industry blend · Size: private · Updated: 2026-02 · 81%

Input: 1,000 · Output: 151 · 476 ms

Routing fit
83
Agent fit
81
JSON / tools
json · tools
Output
{
  "model": "GLM-5.1",
  "provider": "Zhipu AI",
  "route": "chat",
  "modalities": [
    "Text",
    "Image"
  ],
  "price_per_m": {
    "input": 1.4,
    "output": 4.4
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}
Try it out
12

MiniMax M2.7

MiniMax

General chat, reasoning, writing, coding, and agent orchestration workloads.

Size: private · Updated: 2026-01

Speed ⚡⚡⚡⚡·
Intelligence ★★★★☆
Price (1M) $0.89
Inputs

Text · Audio

JSON Mode
Function Calling
Benchmarks

#AA Intelligence index · Artificial Analysis style · 45

#AG Agent fit · Agent workflow · 77

#RT Routing fit · Model router · 80

Industry blend · Size: private · Updated: 2026-01 · 77%

Input: 1,000 · Output: 252 · 286 ms

Routing fit
80
Agent fit
77
JSON / tools
json · tools
Output
{
  "model": "MiniMax M2.7",
  "provider": "MiniMax",
  "route": "chat",
  "modalities": [
    "Text",
    "Audio"
  ],
  "price_per_m": {
    "input": 0.3,
    "output": 1.2
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}
Try it out
13

Mistral Large

Mistral AI

General chat, reasoning, writing, coding, and agent orchestration workloads.

Size: private · Updated: 2025-11

Speed ⚡⚡⚡··
Intelligence ★★★★☆
Price (1M) $4.60
Inputs

Text

JSON Mode
Function Calling
Benchmarks

#AA Intelligence index · Artificial Analysis style · 50

#AG Agent fit · Agent workflow · 82

#RT Routing fit · Model router · 85

Industry blend · Size: private · Updated: 2025-11 · 82%

Input: 1,000 · Output: 174 · 414 ms

Routing fit
85
Agent fit
82
JSON / tools
json · tools
Output
{
  "model": "Mistral Large",
  "provider": "Mistral AI",
  "route": "chat",
  "modalities": [
    "Text"
  ],
  "price_per_m": {
    "input": 2,
    "output": 6
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}
Try it out
14

Command A

Cohere

Search refinement and source ordering for living memo evidence stacks.

Size: private · Updated: 2025-10

Speed ⚡⚡···
Intelligence ★★★★☆
Price (1M) $7.38
Inputs

Text · Rerank

JSON Mode
Function Calling
Benchmarks

#AA Intelligence index · Artificial Analysis style · 44

#AG Agent fit · Agent workflow · 80

#RT Routing fit · Model router · 89

Industry blend · Size: private · Updated: 2025-10 · 80%

Input: 1,000 · Output: 114 · 632 ms

Routing fit
89
Agent fit
80
JSON / tools
json · tools
Output
{
  "model": "Command A",
  "provider": "Cohere",
  "route": "rerank",
  "modalities": [
    "Text",
    "Rerank"
  ],
  "price_per_m": {
    "input": 2.5,
    "output": 10
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}
Try it out
15

Nomic Embed Text

Nomic

Retrieval and memory layer for source search, RAG, and evidence recall.

Size: 274MB · Updated: 2026-05

Speed ⚡⚡⚡⚡⚡
Intelligence ★★☆☆☆
Price (1M) $0.00
Inputs

Embeddings

JSON Mode ×
Function Calling ×
Benchmarks

#AA Intelligence index · Artificial Analysis style · 22

#AG Agent fit · Agent workflow · 60

#RT Routing fit · Model router · 95

Industry blend · Size: 274MB · Updated: 2026-05 · 60%

Input: 1,000 · Output: 1,080 · 120 ms

Routing fit
95
Agent fit
60
JSON / tools
freeform · no tools
Output
{
  "model": "Nomic Embed Text",
  "provider": "Nomic / local",
  "route": "embeddings",
  "modalities": [
    "Embeddings"
  ],
  "price_per_m": {
    "input": 0,
    "output": 0
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}
Try it out
16

GPT Image 1

OpenAI

Image generation and editing lane for media outputs and visual assets.

Size: private · Updated: 2025-04

Speed ⚡····
Intelligence ★★★☆☆
Price (1M) $27.75
Inputs

Image · Text

JSON Mode ×
Function Calling ×
Benchmarks

#AA Intelligence index · Artificial Analysis style · 40

#AG Agent fit · Agent workflow · 58

#RT Routing fit · Model router · 76

Industry blend · Size: private · Updated: 2025-04 · 58%

Input: 1,000 · Output: 128 · 0 ms

Routing fit
76
Agent fit
58
JSON / tools
freeform · no tools
Output
{
  "model": "GPT Image 1",
  "provider": "OpenAI",
  "route": "image",
  "modalities": [
    "Image",
    "Text"
  ],
  "price_per_m": {
    "input": 5,
    "output": 40
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}
Try it out
17

Gemini Video

Google

Video-understanding lane for long-form visual streams and multimodal routing.

Size: private · Updated: 2026-02

Speed ⚡····
Intelligence ★★★★☆
Price (1M) $1.61
Inputs

Video · Audio · Text

JSON Mode
Function Calling
Benchmarks

#AA Intelligence index · Artificial Analysis style · 43

#AG Agent fit · Agent workflow · 70

#RT Routing fit · Model router · 78

Industry blend · Size: private · Updated: 2026-02 · 70%

Input: 1,000 · Output: 128 · 0 ms

Routing fit
78
Agent fit
70
JSON / tools
json · tools
Output
{
  "model": "Gemini Video",
  "provider": "Google",
  "route": "video",
  "modalities": [
    "Video",
    "Audio",
    "Text"
  ],
  "price_per_m": {
    "input": 0.7,
    "output": 2.1
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}
Try it out
18

Whisper Large v3

OpenAI

Speech, audio, and voice processing lane for live media workflows.

Size: 1.55B · Updated: 2024-09

Speed ⚡····
Intelligence ★★★☆☆
Price (1M) $0.21
Inputs

Audio · Transcribe

JSON Mode
Function Calling ×
Benchmarks

#AA Intelligence index · Artificial Analysis style · 34

#AG Agent fit · Agent workflow · 62

#RT Routing fit · Model router · 90

Industry blend · Size: 1.55B · Updated: 2024-09 · 62%

Input: 1,000 · Output: 128 · 0 ms

Routing fit
90
Agent fit
62
JSON / tools
json · no tools
Output
{
  "model": "Whisper Large v3",
  "provider": "OpenAI",
  "route": "audio",
  "modalities": [
    "Audio",
    "Transcribe"
  ],
  "price_per_m": {
    "input": 0.6,
    "output": 0
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}
Try it out
19

Omni Moderation

OpenAI

Safety and policy pre-checks before downstream model routing.

Size: private · Updated: 2025-12

Speed ⚡⚡⚡⚡⚡
Intelligence ★★★☆☆
Price (1M) $0.00
Inputs

Text · Image

JSON Mode
Function Calling ×
Benchmarks

#AA Intelligence index · Artificial Analysis style · 30

#AG Agent fit · Agent workflow · 64

#RT Routing fit · Model router · 96

Industry blend · Size: private · Updated: 2025-12 · 64%

Input: 1,000 · Output: 540 · 133 ms

Routing fit
96
Agent fit
64
JSON / tools
json · no tools
Output
{
  "model": "Omni Moderation",
  "provider": "OpenAI",
  "route": "moderation",
  "modalities": [
    "Text",
    "Image"
  ],
  "price_per_m": {
    "input": 0,
    "output": 0
  },
  "source_blend": [
    "Arena",
    "Onyx",
    "Vellum",
    "LLM Stats",
    "Artificial Analysis"
  ]
}
Try it out

Catalog: github.com/marketplace/models · Marketplace → type=models
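Each card above shows a single Price (1M) figure, while the price_per_m values in the card outputs list separate input and output prices. The placeholder numbers on this page are consistent with a weighting of 35% input plus 65% output. That weighting is inferred from the displayed figures, not a documented formula, and the function name blended_price is hypothetical. A minimal Python sketch of the arithmetic:

```python
INPUT_WEIGHT = 0.35   # assumption: inferred from this page's numbers
OUTPUT_WEIGHT = 0.65  # assumption: inferred from this page's numbers


def blended_price(input_per_m: float, output_per_m: float) -> float:
    """Weighted price per 1M tokens under the assumed 35/65 split."""
    return INPUT_WEIGHT * input_per_m + OUTPUT_WEIGHT * output_per_m


# (input price, output price, card price) triples copied from this page.
samples = {
    "Claude Opus 4.6": (15.00, 75.00, 54.00),
    "Grok 4.3": (3.00, 15.00, 10.80),
    "Mistral Large": (2.00, 6.00, 4.60),
}

for model, (inp, out, shown) in samples.items():
    computed = blended_price(inp, out)
    print(f"{model}: computed {computed:.2f}, card shows {shown:.2f}")
```

When wiring a real desk, the weights should come from your own expected input/output token mix rather than from this inferred split.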

Benchmark rows

Each block uses a split layout (copy + horizontal bar chart) similar to expandable benchmark sections on modern leaderboard sites.

Routing workflow

Classify each request, then send it to a specialized model lane: fast local summaries, vision, rerank, or deep reasoning. Inspired by Agent Recipes and Anthropic workflow patterns.
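A routing step of this kind might look like the sketch below. It is a toy under stated assumptions: the keyword rules in classify stand in for whatever classifier a real desk would use, the Request fields are hypothetical, and the lane-to-model mapping simply reuses "route" values and model names from the placeholder cards on this page.

```python
from dataclasses import dataclass

# Illustrative mapping from lane to model, taken from the mock cards.
LANES = {
    "transcribe": "LiquidAI LFM2 Transcript",  # fast local summaries
    "vision": "Llama 4 Maverick",
    "rerank": "Command A",
    "chat": "Claude Opus 4.6",                 # deep reasoning
}


@dataclass
class Request:
    text: str
    has_image: bool = False
    candidate_docs: int = 0  # documents waiting to be reordered


def classify(req: Request) -> str:
    """Toy rule-based classifier; a real desk might use a small model."""
    if req.has_image:
        return "vision"
    if req.candidate_docs > 1:
        return "rerank"
    if "transcript" in req.text.lower():
        return "transcribe"
    return "chat"


def route(req: Request) -> str:
    """Classify the request, then return the model for that lane."""
    return LANES[classify(req)]


print(route(Request("Summarize this transcript")))
print(route(Request("Describe the chart", has_image=True)))
```

Keeping classification separate from the lane table means a lane can be repointed at a different model without touching the rules.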

Parallelization workflow

Split a research prompt across search, transcript, ticker, and model-eval workers, then aggregate the results into one source-backed memo.
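The fan-out and aggregate steps can be sketched with Python's asyncio. The worker names follow the sentence above; run_worker is a hypothetical placeholder that returns canned text where a real worker would call a search API, transcript store, ticker feed, or model.

```python
import asyncio


async def run_worker(name: str, prompt: str) -> dict:
    """Placeholder worker; a real one would do network or model calls."""
    await asyncio.sleep(0)  # stand-in for I/O latency
    return {"worker": name, "finding": f"[{name}] notes on: {prompt}"}


async def build_memo(prompt: str) -> str:
    workers = ["search", "transcript", "ticker", "model-eval"]
    # Fan out: workers run concurrently; gather keeps the input order.
    results = await asyncio.gather(*(run_worker(w, prompt) for w in workers))
    # Aggregate: one line per worker so each claim stays attributable.
    lines = [f"- {r['worker']}: {r['finding']}" for r in results]
    return "Memo\n" + "\n".join(lines)


memo = asyncio.run(build_memo("Q3 transit spending"))
print(memo)
```

Tagging each line with its worker is one simple way to keep the memo source-backed; a real aggregator would likely deduplicate and cite underlying sources as well.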

Evaluator-optimizer loop

One model drafts the brief, another critiques coverage and decides whether more searches are needed before the memo advances.
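As a rough illustration of the loop, the sketch below replaces both models with plain functions. The coverage checklist, the function names, and the round limit are all assumptions made for the example; the point is only the control flow of draft, critique, search again, and stop once the evaluator finds nothing missing.

```python
# Hypothetical coverage checklist the evaluator enforces.
REQUIRED_TOPICS = ["pricing", "latency", "context window"]


def draft(topics: list[str]) -> str:
    """Stand-in for the drafting model."""
    return "Brief covering: " + ", ".join(topics)


def critique(brief: str) -> list[str]:
    """Stand-in for the evaluator model: returns topics still missing."""
    return [t for t in REQUIRED_TOPICS if t not in brief]


def search(topic: str) -> str:
    """Stand-in for a search step that recovers one missing topic."""
    return topic


def run_loop(initial: list[str], max_rounds: int = 3) -> tuple[str, int]:
    """Draft and critique until coverage is complete or rounds run out."""
    topics = list(initial)
    for round_no in range(1, max_rounds + 1):
        brief = draft(topics)
        missing = critique(brief)
        if not missing:
            return brief, round_no  # evaluator approves; memo advances
        topics.extend(search(t) for t in missing)  # more searches needed
    # Round budget exhausted: return the latest draft without review.
    return draft(topics), max_rounds


brief, rounds = run_loop(["pricing"])
print(rounds, brief)
```

The max_rounds cap matters in practice, since an evaluator that is never satisfied would otherwise loop indefinitely.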

Serverless inference directory

Multi-sort table with major model brands, model size, last update, modality routing, and per-token price lanes.

Price per 1M tokens · Batch API price

Model | Provider | Route | Modalities | Size | Updated | Context | Input price | Output price | Agent fit | JSON | Tools
Claude Opus 4.6 | Anthropic | chat | Text · Image | private | 2026-02 | 200K | $15.00 | $75.00 | 96 | yes | yes
Gemini 3.1 Pro | Google | chat | Text · Image · Audio · Video | private | 2026-03 | 1M | $2.00 | $12.00 | 93 | yes | yes
GPT-5.4 (xhigh) | OpenAI | chat | Text · Image · Audio | private | 2026-04 | 1M | $2.50 | $15.00 | 95 | yes | yes
DeepSeek V3.2 | DeepSeek | chat | Text | 685B MoE | 2026-01 | 130K | $0.28 | $0.42 | 86 | yes | yes
Grok 4.3 | xAI | chat | Text · Image | private | 2026-03 | 131K | $3.00 | $15.00 | 88 | yes | yes
LiquidAI LFM2 Transcript | Liquid AI / local GGUF | transcribe | Text · Transcript | 2.6B / 1.6GB Q4 | 2026-05 | 8K | $0.00 | $0.00 | 68 | no | no
gpt-oss-120B (high) | OpenAI | chat | Text | 117B / MXFP4 | 2025-08 | 128K | $0.15 | $0.60 | 83 | yes | yes
Llama 4 Maverick | Meta | vision | Text · Image | 128 experts | 2025-04 | 1M | $0.27 | $0.85 | 78 | no | yes
Qwen3.6-Plus | Alibaba / Qwen | chat | Text · Image | private | 2026-02 | 1M | $0.50 | $3.00 | 85 | yes | yes
Kimi K2.6 | Moonshot AI | chat | Text | private | 2026-01 | 256K | $1.20 | $4.50 | 82 | yes | yes
GLM-5.1 | Zhipu AI | chat | Text · Image | private | 2026-02 | 128K | $1.40 | $4.40 | 81 | yes | yes
MiniMax M2.7 | MiniMax | chat | Text · Audio | private | 2026-01 | 1M | $0.30 | $1.20 | 77 | yes | yes
Mistral Large | Mistral AI | chat | Text | private | 2025-11 | 128K | $2.00 | $6.00 | 82 | yes | yes
Command A | Cohere | rerank | Text · Rerank | private | 2025-10 | 256K | $2.50 | $10.00 | 80 | yes | yes
Nomic Embed Text | Nomic / local | embeddings | Embeddings | 274MB | 2026-05 | 8K | $0.00 | $0.00 | 60 | no | no
GPT Image 1 | OpenAI | image | Image · Text | private | 2025-04 | N/A | $5.00 | $40.00 | 58 | no | no
Gemini Video | Google | video | Video · Audio · Text | private | 2026-02 | 1M | $0.70 | $2.10 | 70 | yes | yes
Whisper Large v3 | OpenAI | audio | Audio · Transcribe | 1.55B | 2024-09 | audio | $0.60 | $0.00 | 62 | yes | no
Omni Moderation | OpenAI | moderation | Text · Image | private | 2025-12 | 128K | $0.00 | $0.00 | 64 | yes | no
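A multi-sort over rows like these can be sketched in Python by relying on the stability of the built-in sort. The rows below are a small subset copied from the directory; the field names ("input", "output", "score") and the multi_sort helper are hypothetical labels chosen for the example, not the page's actual data model.

```python
from operator import itemgetter

# A few rows from the directory; prices are per 1M tokens.
rows = [
    {"model": "DeepSeek V3.2", "route": "chat", "input": 0.28, "output": 0.42, "score": 86},
    {"model": "Grok 4.3", "route": "chat", "input": 3.00, "output": 15.00, "score": 88},
    {"model": "Mistral Large", "route": "chat", "input": 2.00, "output": 6.00, "score": 82},
    {"model": "Kimi K2.6", "route": "chat", "input": 1.20, "output": 4.50, "score": 82},
    {"model": "Command A", "route": "rerank", "input": 2.50, "output": 10.00, "score": 80},
]


def multi_sort(items, keys):
    """Sort by several (field, descending) pairs, most significant first.

    Python's sort is stable, so applying the keys from least to most
    significant produces the combined ordering.
    """
    result = list(items)
    for field, descending in reversed(keys):
        result.sort(key=itemgetter(field), reverse=descending)
    return result


# Highest score first; ties broken by cheapest input price.
ordered = multi_sort(rows, [("score", True), ("input", False)])
print([r["model"] for r in ordered])
```

The same helper handles any number of sort keys, which fits a table where users click several column headers in sequence.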

External references: whichllm.together.ai · agentrecipes.com · building effective agents · onyx.app/llm-leaderboard