4 min readfrom VentureBeat

Pinterest cut AI costs 90% by gutting a frontier model's vision layer

Pinterest cut AI costs 90% by gutting a frontier model's vision layer

At 620 million monthly users, calling a frontier model for every image recommendation isn't a strategy — it's a bill. Pinterest CTO Matt Madrigal solved it by gutting Qwen3-VL's vision layer and rebuilding it with proprietary embeddings, cutting costs 90% and boosting accuracy 30%.

Madrigal’s team has been heavily investing in customizing open-source models “foundationally in-house.”

“If you've got really unique data that you can then fine-tune an open source model with, data quality will, frankly, outweigh or overcome model size,” Madrigal explained in a recent VB Beyond the Pilot podcast

How Pinterest customized Qwen for visual discovery

Pinterest, which has around 620 million monthly active users, has long applied open source models for visual search and discovery, going back to Google’s BERT and OpenAI’s CLIP. The company fine-tuned its own Pin CLIP on the latter, incorporating proprietary visual embeddings and image metadata. 

Pinterest’s conversational shopping assistant, Navigator 1, was built on Qwen3-VL and customized in “pretty significant” ways. Madrigal’s team essentially “ripped out” Qwen’s vision encoder layer and fine-tuned the model on proprietary multimodal embeddings. This has allowed them to capture metadata around pins and images that can then be precomputed offline and regularly retrained on new information to deliver personalized experiences. 

“Open-source models, especially with open Apache licenses where you can truly tweak a lot of open weights and customize for unique use cases — that's where we've found open source to be so powerful for us,” Madrigal said. 

Bringing their own embeddings allows his team to gain context around metadata, pins, and images; also, notably, the model performs better at runtime and inference. Without these embeddings, devs would have to call and encode each image returned at runtime, one at a time. That results in a latency “20 times worse” from an inference perspective, Madrigal said. 

“If it's something that's going to be critical for our end users, that's going to drive engagement, that will have to scale to over 600 million monthly active users, we're going to either probably build it or we're going to leverage open source and customize the heck out of it,” he said. 

How a taste graph captures evolving interests

To guide users from inspiration to purchase, Madrigal's team built a "taste graph": a dynamic representation of what individual users actually like, not just what they click on. “It's this representation of billions of people's evolving tastes,” he said. 

People go to Google or other search engines when they have a clear picture of what they want; Pinterest is for when they’re still in the discovery phase, Madrigal said. Pinterest’s goal is to encourage “lateral exploration” and transform discovery to intent (that is, clicking through ads or making purchases). 

Under the hood, the architecture combines a graph structure with representational learning. User embeddings capture a user’s evolving tastes. These are constantly updated based on activity and new content and signals. “It's not a social graph,” Madrigal said. “It's much more of a preference graph: What's going to inspire you? What are you trying to do next?” 

For instance, one user may be into mid-century modern designs; another may prefer a Nantucket aesthetic. Those preferences will be captured in user embeddings, and the taste graph will deliver up specific, relevant products as a result. 

“You go from the upper funnel, inspiration discovery, all the way through lower funnel intent,” Madrigal said. 

Listen to the full podcast to hear more about:

  • How Pinterest uses sandboxes to encourage creativity in a way that is secure and contained; 

  • Why a continuous feedback loop can prevent visual AI slop; 

  • The importance of constant benchmarking to gauge user engagement, performance, latency, and other factors. 

You can also listen and subscribe to Beyond the Pilot on Spotify, Apple or wherever you get your podcasts.

Want to read more?

Check out the full article on the original site

View original article

Tagged with

#generative AI for data analysis
#Excel alternatives for data analysis
#natural language processing for spreadsheets
#financial modeling with spreadsheets
#conversational data analysis
#real-time data collaboration
#big data performance
#enterprise data management
#big data management in spreadsheets
#google sheets
#rows.com
#intelligent data visualization
#data visualization tools
#data analysis tools
#data cleaning solutions
#modern spreadsheet innovations
#machine learning in spreadsheet applications
#cloud-based spreadsheet applications
#real-time collaboration
#enterprise-level spreadsheet solutions