Why some models cost more: the hidden reasoning tax
Share of generated tokens that are invisible "thinking" vs. the visible answer. You pay for both. The number on the right is reasoning tokens spent per answer token.
Legend: hidden reasoning tokens (billed, never shown); visible answer tokens.
Token usage over one day of running the same benchmark workload for every model. "Reasoning per answer" = reasoning tokens ÷ visible output tokens. Qwen 3.5 Flash generates ~8.8 hidden tokens for every token you see, which is why it costs roughly 10× as much as DeepSeek per task despite a lower per-token price. Reasoning effort is usually a model default, not a setting we chose; it is part of the model+harness combo and a first-class cost axis in InsureBench.
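To make the arithmetic behind "reasoning per answer" concrete, here is a minimal Python sketch. It assumes reasoning tokens are billed at the same rate as visible output tokens, which is a common pricing arrangement but not something the chart states. The function names, token counts, and the price are hypothetical and chosen only to reproduce the ~8.8 ratio; they are not InsureBench measurements.

```python
def reasoning_per_answer(reasoning_tokens: int, answer_tokens: int) -> float:
    """Reasoning tokens spent per visible answer token (the chart's right-hand number)."""
    if answer_tokens <= 0:
        raise ValueError("answer_tokens must be positive")
    return reasoning_tokens / answer_tokens


def billing_multiplier(ratio: float) -> float:
    """How many billed output tokens you pay for per visible token.

    Assumption: hidden reasoning tokens are billed at the output-token rate,
    so each visible token carries `ratio` extra billed tokens.
    """
    return 1.0 + ratio


def output_cost(reasoning_tokens: int, answer_tokens: int,
                price_per_million: float) -> float:
    """Output-side cost, counting both hidden and visible tokens."""
    return (reasoning_tokens + answer_tokens) * price_per_million / 1_000_000


if __name__ == "__main__":
    # Hypothetical counts chosen to match the ~8.8 ratio in the caption.
    ratio = reasoning_per_answer(reasoning_tokens=8_800, answer_tokens=1_000)
    print(f"reasoning per answer: {ratio:.1f}")
    print(f"billed tokens per visible token: {billing_multiplier(ratio):.1f}")
    # Hypothetical price of 1.00 per million output tokens.
    print(f"output cost: {output_cost(8_800, 1_000, 1.00):.4f}")
```

Under these assumptions, a model with a ratio of 8.8 bills 9.8 output tokens for every token the user sees, so a lower list price per token can still yield a higher cost per task.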