from Machine Learning

I compiled LLM inference pricing across 7 providers: the caching numbers are surprising (spreadsheet included) [R]

I've been comparing GPU/LLM providers for a side project and ended up with way too many browser tabs and spreadsheets.

So I decided to pull the public pricing data into one sheet and compare it side by side.

A quick disclaimer: this is not benchmark data. I didn't run latency tests or throughput measurements. Everything comes from public pricing pages and APIs (OpenRouter, DeepSeek, Together AI, Fireworks, Groq, etc.).

The spreadsheet currently tracks:

  • Input/output token pricing
  • Context windows
  • Cached input pricing (where available)
  • Supported models
  • Provider-specific pricing differences

The thing that surprised me most was caching.

For example, when looking at DeepSeek V4 Pro pricing across providers, cached input prices vary dramatically. In some cases a cache hit costs tens of times less than a cache miss.

That made me realize that if you're running:

  • Agents with large system prompts
  • RAG pipelines with reusable context
  • Multi-turn conversations
  • Repeated prompt templates

...the "headline" token price can be a lot less important than the caching policy.
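
To make that concrete, here is a rough sketch of how a blended input price could be computed from a cache hit rate. The prices and the provider names are made up for illustration and do not come from the spreadsheet or from any provider's pricing page. The sketch also ignores output tokens, cache write surcharges, and cache expiry, so treat it as a back-of-envelope model only.

```python
def effective_input_price(miss_price: float, hit_price: float, hit_rate: float) -> float:
    """Blended USD per 1M input tokens for a given cache hit rate (0..1)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Weighted average of the cached and uncached input prices.
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


# Hypothetical prices in USD per 1M input tokens (illustrative only).
provider_a = {"miss": 1.00, "hit": 0.50}  # lower headline price, weak cache discount
provider_b = {"miss": 1.50, "hit": 0.05}  # higher headline price, strong cache discount

for rate in (0.0, 0.5, 0.9):
    a = effective_input_price(provider_a["miss"], provider_a["hit"], rate)
    b = effective_input_price(provider_b["miss"], provider_b["hit"], rate)
    print(f"hit rate {rate:.0%}: A = ${a:.3f}, B = ${b:.3f}")
```

With these made-up numbers, the provider with the higher headline price becomes the cheaper one once most input tokens are served from cache, which is the effect described above.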

A few other interesting things I noticed:

  • The same model can cost several times more on one provider than on another.
  • Some providers expose caching clearly, while others barely document it.
  • Model availability and context windows aren't always consistent across providers.
  • It's surprisingly hard to find all of this information in one place.
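
One simple way to put a number on the cross-provider gap is the ratio between the most expensive and the cheapest listed price for the same model. A minimal sketch, assuming prices are already normalized to the same unit; the provider names and values below are hypothetical placeholders, not rows from the sheet.

```python
def price_spread(prices_by_provider: dict[str, float]) -> float:
    """Ratio of the most expensive to the cheapest listed price."""
    if not prices_by_provider:
        raise ValueError("need at least one price")
    cheapest = min(prices_by_provider.values())
    if cheapest <= 0:
        raise ValueError("prices must be positive")
    return max(prices_by_provider.values()) / cheapest


# Hypothetical USD per 1M output tokens for a single model (illustrative only).
example = {"provider_a": 0.80, "provider_b": 1.20, "provider_c": 2.40}
print(f"spread: {price_spread(example):.1f}x")
```

The same ratio can be computed separately for input, cached input, and output prices, since the gap is not necessarily the same for each.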

A few things I haven't figured out how to compare yet:

  • Real throughput (tokens/sec)
  • Cold-start / queue times
  • Whether providers are serving FP16, FP8, quantized variants, etc.
  • Egress/network costs
  • Reliability/uptime

I'm curious how others evaluate providers.

When you're choosing between OpenRouter, Together, Fireworks, Groq, DeepSeek, etc., what metrics actually matter to you beyond token pricing?

https://preview.redd.it/4vj50mvhu79h1.png?width=1615&format=png&auto=webp&s=6c6c084927f83bfdadb5ed8e4378f520a1da6766

Am I missing any important data points that should be included in a v2?

submitted by /u/Technomadlyf