I compiled LLM inference pricing across 7 providers: the caching numbers are surprising (spreadsheet included) [R]
I've been comparing GPU/LLM providers for a side project and ended up with way too many browser tabs and spreadsheets. So I decided to pull the public pricing data into one sheet and compare it side by side.

A quick disclaimer: this is not benchmark data. I didn't run latency tests or throughput measurements. Everything comes from public pricing pages and APIs (OpenRouter, DeepSeek, Together AI, Fireworks, Groq, etc.).

The spreadsheet currently tracks:
The thing that surprised me most was caching. For example, when looking at DeepSeek V4 Pro pricing across providers, cached input costs vary dramatically. In some cases a cache hit is tens of times cheaper than a cache miss. That made me realize that if you're running:
...the "headline" token price can be a lot less important than the caching policy. A few other interesting things I noticed:
A few things I haven't figured out how to compare yet:
I'm curious how others evaluate providers. When you're choosing between OpenRouter, Together, Fireworks, Groq, DeepSeek, etc., what metrics actually matter to you beyond token pricing? Am I missing any important data points that should be included in a v2?
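To make the caching point concrete, here is a minimal sketch of how a cache hit rate changes the blended input price. The function name and both price pairs are made up for illustration; they are not taken from the spreadsheet or from any provider's pricing page, and the model ignores cache write fees, minimum prefix lengths, and cache TTLs.

```python
def effective_input_price(miss_price: float, hit_price: float, hit_rate: float) -> float:
    """Blended price per 1M input tokens for a cache hit rate between 0.0 and 1.0."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Cached tokens are billed at hit_price, everything else at miss_price.
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


# Placeholder prices in USD per 1M input tokens (hypothetical, NOT real quotes).
provider_a = {"miss": 1.00, "hit": 0.05}  # higher headline price, deep cache discount
provider_b = {"miss": 0.80, "hit": 0.40}  # lower headline price, shallow cache discount

for hit_rate in (0.0, 0.5, 0.9):
    a = effective_input_price(provider_a["miss"], provider_a["hit"], hit_rate)
    b = effective_input_price(provider_b["miss"], provider_b["hit"], hit_rate)
    cheaper = "A" if a < b else "B"
    print(f"hit rate {hit_rate:.0%}: A=${a:.3f}  B=${b:.3f}  cheaper: {cheaper}")
```

With these placeholder numbers, provider B wins on headline price, but provider A becomes cheaper once the hit rate passes roughly 36%. That is the sense in which the caching policy can matter more than the list price.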