724: Decoding Speech from Raw Brain Activity — with Dr. David Moses

Topics covered
Popular Clips
Episode Highlights
ML Models
The BRAVO project employs sophisticated machine learning models to decode speech from brain activity. explains that the process begins with a silent speech model, where participants attempt to silently articulate sentences, providing training data that includes brain activity without acoustics 1. This data is then processed using language model techniques to convert brain signals into sentences. Moses highlights the use of lexical constraints and language models to generate and rescore potential sentences, leveraging existing natural language processing technologies 2.
One of the beauties of our approach is that it's the same training data that we can use to train all three models.
---
These models are trained on servers and run in real-time alongside participants, showcasing the integration of advanced machine learning with neuroscience.
Speech Synthesis
Speech synthesis in the BRAVO project involves translating brain activity into spoken words through a series of complex processes. describes how brain signals are mapped to discrete units, akin to phonemes, which are then used to generate speech waveforms 3. This involves using models like Hubert units to process acoustics and reconstruct speech sounds, bypassing traditional language models due to the acoustic nature of the task 4.
It's brain activity to these special units that are kind of a compressed representation of speech sound.
---
The process culminates in synthesizing speech in a personalized voice, demonstrating a significant advancement in speech neuroprosthetics.
Real-Time
The real-time implementation of the BRAVO system involves multiple machine learning models operating simultaneously to decode speech. outlines how separate models are used for text, speech sounds, and avatar animation, each trained on similar structures but targeting different outputs 5. This approach allows for the concurrent generation of text, speech, and visual outputs, enhancing the communication capabilities of paralyzed patients.
We did train for this three separate machine learning models.
---
The system achieves impressive results, such as predicting 75 words per minute with 75% accuracy, showcasing the potential of integrating neural networks with real-time applications 6.
Related Episodes


696: Brain-Computer Interfaces and Neural Decoding — with Prof. Bob Knight
Answers 383 questions

829: Neuroscience Fueled by ML — with Prof. Bradley Voytek
Answers 383 questions
SDS 496: 2040: A Brain-Computer Interface Story — with Jon Krohn
Answers 383 questions
SDS 620: OpenAI Whisper: General-Purpose Speech Recognition — with @JonKrohnLearns
Answers 383 questions

725: Neuroscience + Machine Learning — with Google DeepMind's Dr. Kim Stachenfeld
Answers 383 questions

677: Digital Analytics — with Avinash Kaushik
Answers 383 questions

SDS 589: Narrative A.I. — with Hilary Mason
Answers 383 questions

759: Full Encoder-Decoder Transformers Fully Explained — with Kirill Eremenko
Answers 383 questions

SDS 489: Monetizing Machine Learning — with Vin Vashishta
Answers 383 questions

SDS 439: Deep Learning for Machine Vision — with Deblina Bhattacharjee
Answers 383 questions

667: Harnessing GPT-4 for your Commercial Advantage — with Vin Vashishta
Answers 383 questions

SDS 543: Sparking A.I. Innovation — with Nicole Büttner
Answers 383 questions

SDS 513: Transformers for Natural Language Processing — with Denis Rothman
Answers 383 questions

770: The Neuroscientific Guide to Confidence — with Lucy Antrobus
Answers 383 questions

838: Consciousness and Machines — with Jennifer K. Hill
Answers 383 questions













