EEGSpeech
For people living with ALS or locked-in syndrome, the gap between thinking a word and speaking it can be permanent. EEGSpeech is a brain-computer interface I built to narrow that gap. It decodes which speech sound a person is imagining, rather than speaking aloud, from brain activity recorded by EEG electrodes on the scalp.
EEG signals are noisy and low-resolution compared to invasive recordings. Band-pass filtering isolates the mu and beta bands between 8 and 30 Hz, the frequency ranges most associated with speech imagery. Artefact rejection strips out trials contaminated by eye blinks or jaw movement. What remains is a multichannel time series for each trial: a few seconds of someone imagining a specific phoneme.
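A minimal sketch of this preprocessing stage, assuming NumPy and SciPy. The function names, the filter order, and the amplitude threshold are illustrative choices, not the project's actual code; real artefact rejection typically uses richer criteria than a single peak-amplitude cut.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(data, fs, low=8.0, high=30.0, order=4):
    """Zero-phase band-pass over the mu/beta range. data: (channels, samples)."""
    nyq = fs / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, data, axis=-1)

def reject_artifacts(trials, threshold=100.0):
    """Drop trials whose peak absolute amplitude exceeds a threshold,
    a crude proxy for blink or jaw-movement contamination."""
    keep = np.max(np.abs(trials), axis=(1, 2)) < threshold
    return trials[keep], keep

# Synthetic demo: a 2-channel, 2-second trial sampled at 250 Hz.
fs = 250
t = np.arange(0, 2.0, 1.0 / fs)
slow = np.sin(2 * np.pi * 2 * t)    # 2 Hz drift, outside the band
beta = np.sin(2 * np.pi * 20 * t)   # 20 Hz oscillation, inside the band
trial = np.stack([slow + beta, slow + beta])

filtered = bandpass(trial, fs)  # keeps the 20 Hz component, removes the drift

# One exaggerated "artefact" trial gets rejected (amplitudes in arbitrary units).
clean, keep = reject_artifacts(np.stack([trial, trial, 200 * trial]))
```

After filtering, each retained trial is the multichannel time series the model consumes.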
The model is a CNN-LSTM hybrid. Convolutional layers learn which electrode combinations carry discriminative information for each phoneme. LSTM layers capture the temporal dynamics as the imagined sound unfolds. Speech imagery has both spatial structure (certain brain regions contribute more) and temporal structure (the signal evolves over the utterance). The architecture needs to handle both.
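A hypothetical PyTorch sketch of that division of labour: a convolution mixes the electrode channels, the LSTM summarises how the features evolve, and a linear head scores the phonemes. Every dimension here (channel count, filter sizes, hidden width, number of phonemes) is an assumed placeholder, not the project's actual configuration.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Illustrative CNN-LSTM: spatial mixing over electrodes, then
    temporal modelling over the trial, then a phoneme classifier."""
    def __init__(self, n_channels=32, n_phonemes=10, n_filters=16, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            # each filter is a learned combination of all electrodes
            nn.Conv1d(n_channels, n_filters, kernel_size=7, padding=3),
            nn.BatchNorm1d(n_filters),
            nn.ELU(),
            nn.MaxPool1d(4),  # downsample the time axis
        )
        self.lstm = nn.LSTM(n_filters, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_phonemes)

    def forward(self, x):          # x: (batch, channels, time)
        z = self.conv(x)           # (batch, filters, time / 4)
        z = z.transpose(1, 2)      # (batch, time / 4, filters)
        _, (h, _) = self.lstm(z)   # final hidden state summarises the trial
        return self.head(h[-1])    # (batch, n_phonemes) logits

model = CNNLSTM()
logits = model(torch.randn(8, 32, 500))  # 8 trials, 32 channels, 2 s at 250 Hz
```

Pooling before the LSTM keeps the sequence short enough that the recurrence stays cheap while the convolution has already absorbed the fine-grained spatial detail.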
Training on small, class-imbalanced EEG datasets required stratified cross-validation, time-shift augmentation, noise injection, and a weighted loss function. The final model reached 92.67% accuracy on phoneme classification. That number comes with caveats: controlled lab conditions, a limited phoneme set, and coached participants. But for non-invasive EEG, it is competitive.
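The small-data tricks can be sketched concretely. This is a toy illustration assuming scikit-learn and NumPy; the shift range, noise scale, and weighting scheme are stand-ins for whatever the project actually tuned.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def time_shift(trial, max_shift=25, rng=None):
    """Augment by circularly shifting the time axis a few samples."""
    rng = rng or np.random.default_rng()
    return np.roll(trial, rng.integers(-max_shift, max_shift + 1), axis=-1)

def add_noise(trial, scale=0.05, rng=None):
    """Augment by injecting Gaussian noise scaled to the trial's amplitude."""
    rng = rng or np.random.default_rng()
    return trial + rng.normal(0.0, scale * trial.std(), trial.shape)

def class_weights(labels):
    """Inverse-frequency weights for a weighted cross-entropy loss."""
    counts = np.bincount(labels)
    return len(labels) / (len(counts) * counts)

# Toy imbalanced dataset: 40 trials, 8 channels, 1 s at 250 Hz.
labels = np.array([0] * 30 + [1] * 10)
X = np.random.randn(40, 8, 250)

# Stratified folds preserve the 3:1 class ratio in every validation split,
# so accuracy on the minority class is measured in each fold.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in skf.split(X, labels):
    augmented = [add_noise(time_shift(x)) for x in X[train_idx]]
```

Here the rare class (10 of 40 trials) receives a loss weight of 2.0 versus 0.67 for the common one, so the model cannot minimise its loss by ignoring it.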
I applied Grad-CAM to make the model interpretable. The visualisations highlight which time windows and electrode channels drove each prediction. Motor cortex and Broca’s area showed the strongest contributions, which aligns with their known role in speech planning. A vision-language model generates clinical summaries of each prediction in plain language, so a clinician does not need to read raw model outputs.
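The core of 1-D Grad-CAM is small enough to show. This sketch assumes a toy conv-plus-linear stand-in rather than the actual EEGSpeech network: each convolutional feature map is weighted by the mean gradient of the target phoneme's logit, then the weighted maps are summed and rectified into a per-timestep relevance trace.

```python
import torch
import torch.nn as nn

# Toy stand-in model: 8 electrode channels, 4 conv feature maps, 3 classes.
conv = nn.Conv1d(8, 4, kernel_size=5, padding=2)
head = nn.Linear(4, 3)

def grad_cam_1d(x, target_class):
    """1-D Grad-CAM: gradient-weighted sum of conv feature maps, rectified,
    giving one relevance value per timestep of the input trial."""
    feats = conv(x)                    # (1, filters, time)
    feats.retain_grad()                # keep gradients at this intermediate
    logits = head(feats.mean(dim=-1))  # global-average-pool, then classify
    logits[0, target_class].backward()
    weights = feats.grad.mean(dim=-1, keepdim=True)  # (1, filters, 1)
    cam = torch.relu((weights * feats).sum(dim=1))   # (1, time)
    return cam.detach()

x = torch.randn(1, 8, 250, requires_grad=True)  # one 1-second trial
cam = grad_cam_1d(x, target_class=0)            # relevance per timestep
```

Averaging the resulting traces per electrode region is what lets the contribution of motor cortex and Broca's area be read off the visualisation.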
The system runs locally in Docker with a Streamlit frontend. A researcher can load an EEG recording, run the full pipeline, and see predictions with explanations within seconds. Patient data stays on-premises. The technology is not ready for everyday clinical use, but each increment in accuracy opens the door a little wider for people who need it.