Published Apr 14, 2017

From Particle Physics to Audio AI with Scott Stephenson - #19

Join Scott Stephenson as he explores his fascinating journey from particle physics to pioneering audio AI at Deepgram, unveiling groundbreaking technologies in audio indexing and neural network search, and shedding light on Kur, a community-driven framework simplifying deep learning model development.

Episode Highlights

  • Neural Indexing

    Deepgram's approach to audio processing indexes activations deep within a neural network rather than relying on transcribed text. Scott Stephenson explains that this makes searches more accurate because the index is built on phoneme-like structures learned from data rather than on human-assigned phonemes 1. Because those structures adapt to the data, the network recognizes speech patterns more reliably 2.

    You're indexing, you're building an index out of activations deep in a neural network.

    ---

    The use of declarative neural networks, facilitated by the Kur framework, further supports this process by allowing users to define models in a flexible and accessible manner 3.
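
The idea of building an index out of activations can be sketched in a few lines. This is only an illustration under stated assumptions, not Deepgram's implementation: the activation vectors are random stand-ins for what a trained network would emit per audio frame, and `build_index` and `search` are hypothetical helper names that use cosine similarity as the matching rule.

```python
import numpy as np

def build_index(activations):
    """Scale each activation vector to unit length so a dot product equals cosine similarity."""
    activations = np.asarray(activations, dtype=float)
    norms = np.linalg.norm(activations, axis=1, keepdims=True)
    return activations / np.maximum(norms, 1e-12)

def search(index, query, top_k=3):
    """Return the positions and scores of the top_k indexed frames closest to the query."""
    query = np.asarray(query, dtype=float)
    query = query / max(np.linalg.norm(query), 1e-12)
    scores = index @ query
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]

# Stand-in data (assumption): 100 audio frames, each with a 16-dimensional activation vector.
rng = np.random.default_rng(0)
frame_activations = rng.normal(size=(100, 16))

index = build_index(frame_activations)
hits, scores = search(index, frame_activations[42], top_k=3)
print(hits, scores)
```

Querying with the activations of an indexed frame returns that frame first, which is the behavior a search over learned, phoneme-like features relies on. No text transcript is involved at any point.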

       

  • Audio Search

    Deepgram's audio search identifies relevant audio segments with high accuracy. Scott Stephenson shares that their technology finds the desired audio content 80 to 90% of the time, up from roughly 20% with their earlier approach 4. This capability is particularly useful for applications like podcast indexing, where users can quickly locate specific topics or mentions within vast audio libraries 5.

    We went from very poor accuracy, meaning, like, maybe 20% of the time you'll find what you're looking for to 80 or 90% of the time finding what you're looking for.

    ---

    By treating audio spectrograms as images, Deepgram's system can efficiently process and search through large audio datasets, offering a transformative experience for users seeking specific information.
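
To make "treating audio spectrograms as images" concrete, the sketch below turns a waveform into a 2D array of frequency bins by time frames, which is the shape an image model expects. It is a minimal illustration, not Deepgram's pipeline: the frame length, hop size, sample rate, and the `spectrogram` helper name are all assumptions chosen for the example.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Return a log-magnitude spectrogram shaped (frequency bins, time frames)."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitudes).T

# Stand-in audio (assumption): one second of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

image = spectrogram(tone)
peak_bin = int(image.mean(axis=1).argmax())
print(image.shape, peak_bin * sample_rate / 256)
```

The result is an ordinary 2D array, so the same convolutional machinery used for pictures can scan it for patterns in time and frequency.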

       

  • Deep Speech

    Deepgram's deep speech models, inspired by Baidu's Deep Speech, are applied across various tasks, including fraud detection and quality assurance. Scott Stephenson notes that these models use convolutional and recurrent layers to process audio data, similar to the architecture of Deep Speech networks 6. This approach allows businesses to analyze vast amounts of audio data for patterns, such as identifying fraudulent calls in financial services 6.

    The models that we use to build our indexes and to ingest audio are extremely similar to the deep speech networks.

    ---

    Additionally, the open-source Kur framework supports these applications by providing a flexible platform for developing and deploying neural networks 7.
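
The combination of convolutional and recurrent layers mentioned above can be sketched as a plain NumPy forward pass. This is an untrained toy under stated assumptions, not the Deep Speech architecture or Deepgram's model: the layer sizes, random weights, and the helper names `conv1d_over_time` and `simple_rnn` are hypothetical, and a real system would add training, more layers, and an output stage.

```python
import numpy as np

def conv1d_over_time(x, kernels):
    """Valid convolution along time with ReLU.

    x: (time, features); kernels: (width, features, channels).
    """
    width = kernels.shape[0]
    steps = x.shape[0] - width + 1
    out = np.stack([
        np.tensordot(x[t : t + width], kernels, axes=([0, 1], [0, 1]))
        for t in range(steps)
    ])
    return np.maximum(out, 0.0)

def simple_rnn(x, w_in, w_rec):
    """Elman-style recurrent layer; returns hidden states shaped (time, hidden)."""
    h = np.zeros(w_rec.shape[0])
    states = []
    for frame in x:
        h = np.tanh(frame @ w_in + h @ w_rec)
        states.append(h)
    return np.stack(states)

# Stand-in input (assumption): 61 time frames of 129 spectrogram bins.
rng = np.random.default_rng(0)
frames = rng.normal(size=(61, 129))

kernels = rng.normal(scale=0.05, size=(5, 129, 8))   # 8 convolutional channels
w_in = rng.normal(scale=0.5, size=(8, 16))           # conv channels -> hidden units
w_rec = rng.normal(scale=0.5, size=(16, 16))         # hidden -> hidden

conv_out = conv1d_over_time(frames, kernels)
hidden = simple_rnn(conv_out, w_in, w_rec)
print(conv_out.shape, hidden.shape)
```

The convolution summarizes short local windows of the spectrogram, and the recurrent layer carries context across time, which is the division of labor the episode describes.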
