From Machine Learning

Live Human Detector on Outbound Phone Calls [R]

Goal
To save people from wasting time sitting in call centre queues waiting to be answered.

To have a tool listen to the audio stream of a live call after IVR navigation and determine whether the call has left the queue and been transferred to a live person.

Requirements

The tool must classify the audio within a context window of under 1 to 2 seconds, with as high a confidence level as possible.
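A minimal sketch of the rolling context window this requirement implies, assuming 8 kHz G.711 audio arriving in 20 ms RTP packets (160 samples each) and a 1.5 s window. The `ContextWindow` class and all constants are hypothetical choices for illustration, not part of any existing tool.

```python
from __future__ import annotations

from collections import deque

SAMPLE_RATE = 8000   # G.711 narrowband telephony sample rate (Hz)
FRAME_MS = 20        # assumed RTP packet duration (ptime)
WINDOW_S = 1.5       # assumed context window, inside the 1-2 s budget


class ContextWindow:
    """Hypothetical rolling buffer holding the most recent WINDOW_S seconds of audio."""

    def __init__(self, sample_rate: int = SAMPLE_RATE, window_s: float = WINDOW_S):
        self.capacity = int(sample_rate * window_s)
        self._buf: deque[int] = deque(maxlen=self.capacity)

    def push(self, frame: list[int]) -> None:
        # Once maxlen is reached, the oldest samples drop off the left automatically.
        self._buf.extend(frame)

    def ready(self) -> bool:
        # Only classify once a full window of context has been collected.
        return len(self._buf) == self.capacity

    def snapshot(self) -> list[int]:
        # Copy, so the classifier sees a stable window while new audio keeps arriving.
        return list(self._buf)


if __name__ == "__main__":
    win = ContextWindow()
    samples_per_frame = SAMPLE_RATE * FRAME_MS // 1000  # 160 samples per packet
    for i in range(100):
        win.push([0] * samples_per_frame)  # stand-in for decoded packet audio
        if win.ready() and i % 25 == 0:
            print(f"packet {i}: classify {len(win.snapshot())} samples")
```

Because the window slides one packet at a time, the classifier can be re-run every few packets (say every 0.5 s) instead of waiting for a completely fresh 1.5 s of audio.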

This is not a typical AMD (answering machine detection) tool: we are not just distinguishing machine audio from human speech.

Assumed Challenges

  1. It may be difficult to distinguish a pre-recorded RVA (Recorded Voice Announcement) from a human speaking. RVAs are typically professionally recorded, with distinctive pitch and emotional cues, clean audio with no background noise, and no silence before or after the message. This is not always the case, especially when announcements are recorded in-house by general staff.
  2. When a call transitions to 'Answered', there is usually a distinct soft click and/or some background noise before the agent starts speaking. This quiet period, while a good indication that the call has been answered, could be confused with quiet gaps between music or RVA announcements in the queue.
  3. It may be difficult to tell whether we have been answered by voicemail: although there is usually a beep at the end, the greeting itself also starts with a period of silence followed by audio that sounds similar to an RVA.
  4. A single short beep tone could indicate voicemail, an answered call, or that the call is being recorded.
  5. Identifying that we are in a queue from TTS audio may become harder as TTS engines become more sophisticated.
  6. Telephony audio such as G.711 A-law (G711a) is band-limited to 300–3400 Hz and sampled at 8000 Hz, giving 64 kbit/s (8000 samples/s × 8 bits).
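Challenges 2, 4 and 6 suggest some cheap hand-crafted signals that could be computed alongside (or before) a learned model. The plain-Python sketch below flags 20 ms frames of 8 kHz audio as silence (RMS level below a threshold) or as a pure tone (most of the frame's energy in a single DFT bin, measured with the Goertzel algorithm). The -50 dBFS silence threshold, the 0.8 tonality ratio and the 1000/1400 Hz candidate beep frequencies are assumptions to tune against real recordings; with 160-sample frames each bin spans 50 Hz, so a beep that falls between bins will score lower.

```python
import math
import random

SAMPLE_RATE = 8000
FRAME_LEN = 160  # 20 ms at 8 kHz


def rms_dbfs(frame):
    """RMS level of a frame of 16-bit PCM samples, in dB relative to full scale."""
    mean_sq = sum(x * x for x in frame) / len(frame) if frame else 0.0
    if mean_sq == 0:
        return float("-inf")
    return 10 * math.log10(mean_sq / 32768.0 ** 2)


def goertzel_power(frame, freq_hz, sample_rate=SAMPLE_RATE):
    """|X_k|^2 for the DFT bin nearest freq_hz, computed with the Goertzel recurrence."""
    n = len(frame)
    k = round(n * freq_hz / sample_rate)
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def classify_frame(frame, silence_dbfs=-50.0, tone_ratio=0.8, tone_freqs=(1000, 1400)):
    """Label one frame 'silence', 'tone' or 'other' (all thresholds are assumptions)."""
    if rms_dbfs(frame) < silence_dbfs:
        return "silence"
    energy = sum(x * x for x in frame)
    for f in tone_freqs:
        # By Parseval, a pure tone centred on bin k gives 2*|X_k|^2 / (N * energy) == 1,
        # while broadband speech or noise spreads its energy and scores close to 0.
        if 2 * goertzel_power(frame, f) / (len(frame) * energy) >= tone_ratio:
            return "tone"
    return "other"


if __name__ == "__main__":
    beep = [10000 * math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(FRAME_LEN)]
    rng = random.Random(0)
    noise = [rng.uniform(-3000, 3000) for _ in range(FRAME_LEN)]
    for name, frame in [("beep", beep), ("noise", noise), ("silence", [0] * FRAME_LEN)]:
        print(name, classify_frame(frame))
```

Per-frame flags like these could feed a model as extra features, or be used to pre-segment calls for labelling; they are not a substitute for a trained classifier.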

Approach

To use machine learning on labelled data to train an audio classification model that analyses the acoustics, waveform or spectrogram (computed via the Fast Fourier Transform) of the audio stream.
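As a concrete starting point for the spectrogram input, here is a minimal NumPy sketch of a log-magnitude short-time Fourier transform for 8 kHz telephony audio. The 25 ms Hann window, 10 ms hop and 256-point FFT are common speech-processing defaults assumed here; many audio classifiers (YAMNet included) go one step further and apply a mel filterbank to this.

```python
import numpy as np


def log_spectrogram(x, sample_rate=8000, win_ms=25, hop_ms=10, n_fft=256):
    """Log-magnitude STFT: rows are time frames, columns are frequency bins."""
    x = np.asarray(x, dtype=np.float32)
    win = int(sample_rate * win_ms / 1000)   # 200 samples at 8 kHz
    hop = int(sample_rate * hop_ms / 1000)   # 80 samples at 8 kHz
    if len(x) < win:
        raise ValueError("signal shorter than one analysis window")
    n_frames = 1 + (len(x) - win) // hop
    window = np.hanning(win).astype(np.float32)
    # Index matrix of shape (n_frames, win): row i holds the sample indices of frame i.
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * window
    # rfft zero-pads each 200-sample frame to n_fft and keeps the n_fft//2 + 1 real bins.
    spec = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    return np.log(spec + 1e-6)  # small offset avoids log(0) on silent frames


if __name__ == "__main__":
    t = np.arange(8000) / 8000  # 1 s of audio at 8 kHz
    spec = log_spectrogram(np.sin(2 * np.pi * 1000 * t))
    print(spec.shape)  # (98, 129)
```

With n_fft = 256 at 8 kHz each bin spans 31.25 Hz, so bins above roughly 109 (3400 Hz) carry little useful content for G.711 audio.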

At this stage I do not want to use STT (speech-to-text) to determine the phase or label, although it will likely be added later as an additional layer in the pipeline to increase confidence in labels such as RVA, TTS, Voicemail and Call Screening.

Phases and Labels

  1. Queuing: Music, TTS, RVA (Recorded Voice Announcement)
  2. Transitioning: Ringback, Answered, Machine Beep
  3. Connected: Human, Fax, Voicemail, Call Screening
  4. Disconnected: Engaged Tone
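The phase/label taxonomy above maps naturally onto a lookup table. Given the 1 to 2 second window requirement, a per-window classifier will still flicker between labels, so one option is to act on a label only once it wins a majority of recent confident windows. This is a sketch: `StableLabelTracker`, the 4-window history, the 3-vote requirement and the 0.8 confidence floor are all hypothetical.

```python
from collections import Counter, deque

# Phase/label taxonomy from the post, encoded as a lookup table.
PHASES = {
    "Queuing": {"Music", "TTS", "RVA"},
    "Transitioning": {"Ringback", "Answered", "Machine Beep"},
    "Connected": {"Human", "Fax", "Voicemail", "Call Screening"},
    "Disconnected": {"Engaged Tone"},
}
LABEL_TO_PHASE = {label: phase for phase, labels in PHASES.items() for label in labels}


class StableLabelTracker:
    """Hypothetical smoother: report a label only once it wins enough recent confident windows."""

    def __init__(self, history=4, required=3, min_conf=0.8):
        self.recent = deque(maxlen=history)  # last few per-window predictions
        self.required = required
        self.min_conf = min_conf

    def update(self, label, confidence):
        """Feed one window's prediction; return (label, phase) once stable, else None."""
        if label not in LABEL_TO_PHASE:
            raise ValueError(f"unknown label: {label!r}")
        # Low-confidence windows are stored as None so they occupy a slot but cannot vote.
        self.recent.append(label if confidence >= self.min_conf else None)
        votes = Counter(lab for lab in self.recent if lab is not None)
        if not votes:
            return None
        top, count = votes.most_common(1)[0]
        return (top, LABEL_TO_PHASE[top]) if count >= self.required else None


if __name__ == "__main__":
    tracker = StableLabelTracker()
    stream = [("Music", 0.9), ("Answered", 0.7), ("Human", 0.85), ("Human", 0.9), ("Human", 0.92)]
    for label, conf in stream:
        print(label, conf, "->", tracker.update(label, conf))
```

Raising `required` trades latency for fewer false "Human" alerts; with windows re-evaluated every 0.5 s, three votes add roughly 1 to 1.5 s before the decision fires.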

References

https://www.mdpi.com/2076-3417/12/7/3293 - YOHO: You Only Hear Once

https://www.vicidial.org/VICIDIALforum/viewtopic.php?t=42330

https://huggingface.co/learn/audio-course/chapter2/audio_classification_pipeline

https://www.youtube.com/watch?v=m3XbqfIij_Y&t=32s

https://google-ai-edge.github.io/mediapipe-samples-web/#/audio/audio_classifier

https://scikit-learn.org/stable/machine_learning_map.html

https://arxiv.org/pdf/2410.08235

Question

I'm seeking advice on where to actually start. Yes, I will be relying heavily on Claude Code to build this, so apologies in advance.

What is the best framework / algorithm / approach to start solving this problem? I have seen existing frameworks like YAMNet classify audio quickly and well; however, others suggest using Whisper or another ASR model.
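If YAMNet is the starting point, note that its published TensorFlow Hub model expects a 16 kHz mono float32 waveform in [-1.0, 1.0] and returns per-frame scores over 521 AudioSet classes plus 1024-dimensional embeddings; Whisper likewise works on 16 kHz input. A common route for custom labels like the ones above is to freeze YAMNet and train a small classifier on its embeddings. The sketch below covers only the conversion from 8 kHz call audio; the function name is hypothetical, and since the upsampled audio still has nothing above 4 kHz, there is a domain mismatch with YAMNet's wideband training data that is worth measuring on your own labelled calls.

```python
import numpy as np


def telephony_to_yamnet_input(pcm_int16, sr_in=8000, sr_out=16000):
    """Convert 8 kHz 16-bit PCM to a 16 kHz mono float32 waveform in [-1.0, 1.0].

    Linear interpolation is a crude resampler used here to stay dependency-light;
    scipy.signal.resample_poly(x, 2, 1) applies a proper anti-imaging filter.
    """
    x = np.asarray(pcm_int16, dtype=np.float32) / 32768.0  # scale int16 to [-1, 1)
    n_out = int(round(len(x) * sr_out / sr_in))
    t_in = np.arange(len(x)) / sr_in
    t_out = np.arange(n_out) / sr_out
    return np.interp(t_out, t_in, x).astype(np.float32)


if __name__ == "__main__":
    pcm = (np.sin(2 * np.pi * 440 * np.arange(8000) / 8000) * 20000).astype(np.int16)
    wav = telephony_to_yamnet_input(pcm)
    print(wav.shape, wav.dtype)  # (16000,) float32
```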

What is the best way of tagging or labelling the data? Do I label existing full-length recordings with start/stop timestamps for each label, or do I need to split each label into its own file, resulting in a loss of context?
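One way to keep full-length context is to annotate each recording once with start/end timestamps (for example as an Audacity label track, which exports tab-separated start, end and label text) and generate fixed-size training windows from those annotations at load time, so nothing has to be cut into separate files. The sketch below assumes 1 s windows with a 0.5 s hop and a 60% coverage rule; `Segment` and `label_windows` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One annotated span of a full-length recording (times in seconds)."""
    start: float
    end: float
    label: str


def label_windows(segments, duration, win=1.0, hop=0.5, min_coverage=0.6):
    """Slice an annotated recording into fixed, overlapping windows.

    Each window gets the label covering most of it, or None when no single label
    covers at least min_coverage of the window (e.g. a window straddling a transition).
    Returns a list of (window_start, window_end, label) tuples.
    """
    windows = []
    t = 0.0
    while t + win <= duration + 1e-9:
        coverage = {}
        for seg in segments:
            overlap = min(seg.end, t + win) - max(seg.start, t)
            if overlap > 0:
                coverage[seg.label] = coverage.get(seg.label, 0.0) + overlap
        best = max(coverage, key=coverage.get) if coverage else None
        if best is not None and coverage[best] / win < min_coverage:
            best = None
        windows.append((round(t, 3), round(t + win, 3), best))
        t += hop
    return windows


if __name__ == "__main__":
    # Hypothetical annotation of one call: queue music, answer click, then an agent.
    call = [Segment(0.0, 2.0, "Music"), Segment(2.0, 2.3, "Answered"), Segment(2.3, 4.0, "Human")]
    for w in label_windows(call, duration=4.0):
        print(w)
```

Short events such as the 'Answered' click or a 'Machine Beep' will rarely cover 60% of a 1 s window, so they end up unlabelled here; for those, frame-level or onset/offset targets of the kind used by sound event detection models (such as YOHO in the references) may fit better.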

Are there obvious existing datasets I should be using for some of my labels?

submitted by /u/Bucky102