Published May 4, 2020

Episode 408: Mike McCourt on Voice and Speech Analysis

Mike McCourt unpacks the challenges of selecting and applying machine learning models to call and voice data, covering theme recognition, conversation variability, and the accent differences that affect recognition accuracy.
Software Engineering Radio - the podcast for professional software developers

Episode Highlights

  • Model Selection

    Choosing the right machine learning model for voice data is a nuanced process. McCourt emphasizes starting with the simplest model possible to establish a baseline, then iterating on it to address specific challenges [1]. He shares an example from his experience analyzing the Federalist Papers to determine authorship, where he began with a basic statistical model and gradually refined it [1].

    My conclusion was that Madison had written almost all of the disputed ones.

    ---

    This approach highlights the importance of simplicity and adaptability in model selection.
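The "simplest model first" idea can be sketched for an authorship problem. The summary does not say which statistical model McCourt used, so everything below is an assumption for illustration: the function-word list, the placeholder authors `author_a` and `author_b`, the toy sentences, and the choice of a multinomial naive Bayes baseline with add-one smoothing.

```python
import math
from collections import Counter

# Hypothetical function-word list; a real study would use a far longer one.
FUNCTION_WORDS = ["upon", "whilst", "while", "on", "by", "to", "the", "of"]

def featurize(text):
    """Count each function word in the text (whitespace tokens, lowercased)."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in FUNCTION_WORDS]

def train(labeled_texts):
    """Return per-author log-probabilities of each function word."""
    totals = {}
    for author, text in labeled_texts:
        acc = totals.setdefault(author, [0] * len(FUNCTION_WORDS))
        for i, c in enumerate(featurize(text)):
            acc[i] += c
    model = {}
    for author, acc in totals.items():
        denom = sum(acc) + len(FUNCTION_WORDS)  # add-one smoothing
        model[author] = [math.log((c + 1) / denom) for c in acc]
    return model

def predict(model, text):
    """Pick the author with the highest log-likelihood (uniform prior assumed)."""
    vec = featurize(text)
    scores = {a: sum(c * lp for c, lp in zip(vec, lps))
              for a, lps in model.items()}
    return max(scores, key=scores.get)

# Toy training data: invented sentences, not real Federalist text.
training = [
    ("author_a", "upon the whole the matter rests upon the people"),
    ("author_b", "whilst the states act on the advice of the council"),
]
model = train(training)
print(predict(model, "the decision rests upon the judgment"))
```

A baseline like this is cheap to build and gives a reference score; any more elaborate model then has to beat it to justify the added complexity.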

       

  • Data Sensitivity

    Audio quality and call length present significant challenges in voice data analysis. McCourt explains that call quality varies from crystal clear to garbled, so models must handle diverse audio inputs [2]. Call length also affects model sensitivity: shorter calls may not provide enough data, while longer calls can introduce noise [3].

    If your model is really sensitive to patterns, so that even in a really noisy transcript with bad audio quality, it can still pick out the relevant patterns.

    ---

    Finding a balance in sensitivity is crucial for effective analysis.
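One way to picture the call-length trade-off is to score a transcript by how often target patterns appear per 100 tokens, and to flag transcripts that are too short to trust. This is only a sketch under stated assumptions: the function `pattern_rate`, the `min_tokens` threshold of 20, and the example pattern set are hypothetical and do not come from the episode.

```python
def pattern_rate(transcript, patterns, min_tokens=20):
    """Return (hits per 100 tokens, enough_data).

    enough_data is False when the transcript has fewer than min_tokens
    tokens, signalling that the rate rests on very little evidence.
    """
    tokens = transcript.lower().split()
    if not tokens:
        return 0.0, False
    hits = sum(1 for t in tokens if t in patterns)
    rate = 100.0 * hits / len(tokens)
    return rate, len(tokens) >= min_tokens

# Hypothetical pattern set for illustration.
patterns = {"cancel", "refund"}

# A very short call: one hit looks like a high rate, but there is little data.
short_rate, short_ok = pattern_rate("cancel my account", patterns)

# A longer call: the same kind of hit is diluted by surrounding talk.
long_call = " ".join(["cancel"] * 2 + ["filler"] * 38)
long_rate, long_ok = pattern_rate(long_call, patterns)

print(short_rate, short_ok)
print(long_rate, long_ok)
```

Normalizing by length keeps short and long calls comparable, while the flag makes the "not enough data" case explicit instead of hiding it inside a misleadingly high rate.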

       

  • Supervised vs Unsupervised

    The complexity of phone call analysis often requires a blend of supervised and unsupervised learning techniques. McCourt describes how businesses provide example calls to train models, but the diversity of language and personal expression makes it hard to rely on supervised learning alone [4]. By combining both approaches, his team can identify common themes across calls while accounting for individual variation [5].

    We use a combination of unsupervised learning and a supervised algorithm.

    ---

    This hybrid method allows for more nuanced and accurate call analysis.
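The episode summary does not name the algorithms involved, so the following is only one possible shape for such a hybrid, with every name and data point assumed for illustration. The unsupervised step learns word weights (inverse document frequency) from unlabeled transcripts; the supervised step builds one centroid per theme from a handful of labeled examples and assigns new calls to the most similar centroid.

```python
import math
from collections import Counter

def learn_idf(unlabeled_docs):
    """Unsupervised step: smoothed inverse document frequency per word."""
    n = len(unlabeled_docs)
    df = Counter()
    for doc in unlabeled_docs:
        df.update(set(doc.lower().split()))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}

def vectorize(doc, idf):
    """TF-IDF vector as a dict; words unseen in the unlabeled data are dropped."""
    tf = Counter(doc.lower().split())
    return {w: c * idf[w] for w, c in tf.items() if w in idf}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def fit_centroids(labeled_docs, idf):
    """Supervised step: sum labeled vectors per theme (cosine ignores scale)."""
    centroids = {}
    for label, doc in labeled_docs:
        acc = centroids.setdefault(label, Counter())
        for w, v in vectorize(doc, idf).items():
            acc[w] += v
    return centroids

def classify(doc, idf, centroids):
    """Assign the theme whose centroid is most similar to the transcript."""
    vec = vectorize(doc, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

# Invented toy transcripts and theme labels.
unlabeled = [
    "i want to cancel my subscription",
    "please cancel the plan today",
    "i need to book an appointment",
    "can i schedule an appointment for monday",
    "the bill looks wrong this month",
]
labeled = [("cancellation", "cancel my plan"),
           ("booking", "book an appointment")]

idf = learn_idf(unlabeled)
centroids = fit_centroids(labeled, idf)
print(classify("i would like to schedule an appointment", idf, centroids))
```

The appeal of this split is that the large unlabeled pool shapes the representation, so only a few labeled calls per theme are needed for the supervised part.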
