From Machine Learning
Architecture advice: Real-time pipeline for YouTube Audio -> Whisper -> LLM -> SSE (Sub-10s latency) [D]
Hey everyone, I’m building a backend that analyzes long YouTube videos using an LLM.
Currently, my flow is a slow waterfall: Download full audio -> Whisper -> LLM -> Return results. For a 30-minute video, the user waits forever.
I want to pipeline this for real-time SSE streaming: [Chunk Audio on the fly] -> [Whisper] -> [LLM] -> [Stream to UI]
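A rough sketch of what that pipelined shape could look like with plain asyncio queues. Everything here is a hypothetical stand-in: `chunk_audio`, `transcribe`, and `analyze` just fake the work with strings, and no real ffmpeg, Whisper, or LLM call is made. The point is only that each stage starts on chunk N+1 while later stages are still busy with chunk N, and that bounded queues give backpressure.

```python
import asyncio

DONE = object()  # sentinel passed down the pipeline to signal end of stream


async def chunk_audio(out_q: asyncio.Queue) -> None:
    # Hypothetical stand-in for reading audio chunks from a stream.
    for i in range(3):
        await asyncio.sleep(0)
        await out_q.put(f"chunk-{i}")
    await out_q.put(DONE)


async def transcribe(chunk: str) -> str:
    # Hypothetical stand-in for a speech-to-text call.
    await asyncio.sleep(0)
    return f"transcript({chunk})"


async def analyze(text: str) -> str:
    # Hypothetical stand-in for an LLM call.
    await asyncio.sleep(0)
    return f"analysis({text})"


async def stage(in_q: asyncio.Queue, out_q: asyncio.Queue, fn) -> None:
    # Generic worker: pull an item, process it, push the result downstream.
    while True:
        item = await in_q.get()
        if item is DONE:
            await out_q.put(DONE)  # propagate end-of-stream
            return
        await out_q.put(await fn(item))


async def run_pipeline() -> list[str]:
    # Bounded queues stop a fast producer from racing ahead of slow stages.
    audio_q, text_q, result_q = (asyncio.Queue(maxsize=4) for _ in range(3))
    tasks = [
        asyncio.create_task(chunk_audio(audio_q)),
        asyncio.create_task(stage(audio_q, text_q, transcribe)),
        asyncio.create_task(stage(text_q, result_q, analyze)),
    ]
    events = []
    while (item := await result_q.get()) is not DONE:
        events.append(f"data: {item}\n\n")  # SSE wire format for one event
    await asyncio.gather(*tasks)
    return events


if __name__ == "__main__":
    for event in asyncio.run(run_pipeline()):
        print(event, end="")
```

In a real endpoint the final loop would presumably be an async generator that yields each event as it arrives instead of collecting them in a list. Note also that actual Whisper inference is blocking compute, so it would have to run off the event loop (a thread, a process, or a separate worker) for this overlap to happen at all.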
My questions for the data/backend engineers:
- Chunking & VAD: What's the best way to chunk YouTube audio streams (e.g., via ffmpeg) without cutting sentences in half and ruining the LLM's context?
- Queueing: Is standard asyncio in FastAPI enough to handle these overlapping tasks, or do I strictly need Celery/Redis workers for this pipeline?
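To make the chunking question concrete, here is a minimal sketch of one possible approach: cut at detected pauses instead of at fixed offsets. It assumes the silence intervals are already available as `(start, end)` pairs in seconds (for example parsed from the output of ffmpeg's `silencedetect` filter, or from a VAD). The function name `pick_cut_points` and the `target`, `min_len`, and `max_len` defaults are hypothetical, chosen only for illustration.

```python
def pick_cut_points(silences, total_duration, target=20.0, min_len=5.0, max_len=30.0):
    """Choose cut times (seconds) that fall in the middle of pauses.

    silences: sorted list of (start, end) pause intervals, in seconds.
    Each chunk aims for `target` seconds and is kept between `min_len`
    and `max_len`. If no pause falls in that window, fall back to a hard
    cut at `max_len` (which may split a sentence).
    """
    midpoints = [(start + end) / 2 for start, end in silences]
    cuts = []
    last = 0.0
    # Keep cutting until the remaining tail fits in a single chunk.
    while total_duration - last > max_len:
        window = [m for m in midpoints if last + min_len <= m <= last + max_len]
        if window:
            # Prefer the pause closest to the target chunk length.
            cut = min(window, key=lambda m: abs(m - (last + target)))
        else:
            cut = last + max_len  # no pause found: hard cut
        cuts.append(cut)
        last = cut
    return cuts


if __name__ == "__main__":
    pauses = [(9.5, 10.5), (19.0, 21.0), (44.0, 46.0), (69.0, 71.0)]
    print(pick_cut_points(pauses, total_duration=90.0))
```

A pause is not the same thing as a sentence boundary, so this only reduces mid-sentence cuts rather than ruling them out; overlapping adjacent chunks by a second or two is a common complement.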
Any library recommendations or architectural patterns would be hugely appreciated.