From Machine Learning

Architecture advice: Real-time pipeline for YouTube Audio -> Whisper -> LLM -> SSE (Sub-10s latency) [D]

Hey everyone, I’m building a backend that analyzes long YouTube videos using an LLM.

Currently, my flow is a slow waterfall: Download full audio -> Whisper -> LLM -> Return results. For a 30-minute video, the user gets nothing back until every stage has finished.

I want to pipeline this for real-time SSE streaming: [Chunk Audio on the fly] -> [Whisper] -> [LLM] -> [Stream to UI]
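To make the target shape concrete, here is a minimal sketch of that staged pipeline using plain `asyncio` queues. Everything in it is an assumption for illustration: `chunk_audio`, `fake_whisper`, and `fake_llm` are hypothetical stand-ins for the real ffmpeg, Whisper, and LLM calls, and `to_sse` only applies the standard `data: ...` event-stream framing.

```python
import asyncio

SENTINEL = None  # marks end-of-stream between stages


async def chunk_audio(out_q, n_chunks=3):
    # Hypothetical stand-in for the on-the-fly audio chunker.
    for i in range(n_chunks):
        await out_q.put(f"audio-{i}")
    await out_q.put(SENTINEL)


async def stage(in_q, out_q, work):
    # Generic stage: pull an item, process it, push the result downstream.
    while True:
        item = await in_q.get()
        if item is SENTINEL:
            await out_q.put(SENTINEL)  # propagate shutdown to the next stage
            return
        await out_q.put(await work(item))


async def fake_whisper(chunk):
    # Placeholder for transcription (assumption, not a real Whisper call).
    await asyncio.sleep(0)
    return f"text({chunk})"


async def fake_llm(text):
    # Placeholder for the LLM analysis step (assumption).
    await asyncio.sleep(0)
    return f"summary({text})"


def to_sse(payload):
    # One server-sent event: a data line followed by a blank line.
    return f"data: {payload}\n\n"


async def run_pipeline(n_chunks=3):
    # Bounded queues give backpressure: a slow stage stalls the ones before it.
    audio_q, text_q, result_q = (asyncio.Queue(maxsize=4) for _ in range(3))
    tasks = [
        asyncio.create_task(chunk_audio(audio_q, n_chunks)),
        asyncio.create_task(stage(audio_q, text_q, fake_whisper)),
        asyncio.create_task(stage(text_q, result_q, fake_llm)),
    ]
    events = []
    while (item := await result_q.get()) is not SENTINEL:
        events.append(to_sse(item))  # a real endpoint would yield these instead
    await asyncio.gather(*tasks)
    return events
```

With this shape, the first result can reach the client as soon as the first chunk clears all three stages, rather than after the whole video is processed. Note that a real Whisper call is CPU/GPU-bound and would block the event loop unless it runs in an executor or a separate worker process.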

My questions for the data/backend engineers:

  1. Chunking & VAD: What's the best way to chunk YouTube audio streams (e.g., via ffmpeg) without cutting sentences in half and ruining the LLM's context?
  2. Queueing: Is standard asyncio in FastAPI enough to handle these overlapping tasks, or do I strictly need Celery/Redis workers for this pipeline?
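For question 1, one possible approach (a sketch, not a recommendation from anyone in the thread) is to let ffmpeg's `silencedetect` filter report pauses and cut chunks only at those pauses. The helper names, the -30 dB threshold, the 0.5 s minimum silence, and the 30 s target length below are all assumptions; the parser expects the `silence_start:` / `silence_end:` lines that the filter writes to stderr.

```python
import re

SILENCE_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
SILENCE_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def silencedetect_cmd(src, noise_db=-30, min_silence=0.5):
    # Builds (does not run) an ffmpeg command that logs silences to stderr.
    return [
        "ffmpeg", "-hide_banner", "-nostats", "-i", src,
        "-af", f"silencedetect=noise={noise_db}dB:d={min_silence}",
        "-f", "null", "-",
    ]


def parse_silences(log):
    # Pair each reported silence start with its matching end, in order.
    starts = [float(s) for s in SILENCE_START.findall(log)]
    ends = [float(e) for e in SILENCE_END.findall(log)]
    return list(zip(starts, ends))


def pick_chunks(silences, total, target=30.0):
    # Cut at the midpoint of the first silence that falls at least
    # `target` seconds after the previous cut, so cuts land in pauses.
    cuts, last = [], 0.0
    for start, end in silences:
        mid = (start + end) / 2
        if mid - last >= target:
            cuts.append(mid)
            last = mid
    return list(zip([0.0] + cuts, cuts + [total]))
```

Silence is only a proxy for sentence boundaries, so a speaker who pauses mid-sentence can still be split. Overlapping adjacent chunks by a second or two, or running a dedicated VAD model instead, are common ways to soften that.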

Any library recommendations or architectural patterns would be hugely appreciated.

submitted by /u/Sea_Lawfulness_5602