Live Voice Analysis

Live Waveform — 48 kHz input signal LPC-12 ANALYSIS ACTIVE

Vowel Targeting Matrix — F1 / F2 acoustic space

Legend: canonical target · live position · correction vector

Each circle is a canonical American English vowel (Hillenbrand et al., Journal of the Acoustical Society of America, 1995). The cyan dot is your live F1/F2 position. The lime arrow shows the direction and fraction of the correction Goeckoh's engine applies — moving formants toward the nearest canonical vowel target.
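The targeting step described above can be sketched in a few lines of Python. This is a minimal illustration, not Goeckoh's engine: the names `VOWEL_TARGETS`, `nearest_target` and `correction_vector` are hypothetical, the target values are rounded placeholders rather than the published Hillenbrand table, and the `fraction` parameter is an assumed stand-in for the "fraction of the correction" shown by the arrow.

```python
import math

# Illustrative placeholder targets in Hz as (F1, F2).
# These are NOT the published Hillenbrand et al. (1995) values;
# substitute measured targets in any real use.
VOWEL_TARGETS = {
    "i": (300.0, 2300.0),
    "a": (750.0, 1300.0),
    "u": (350.0, 1000.0),
}


def nearest_target(f1, f2, targets=VOWEL_TARGETS):
    """Return (label, (F1, F2)) of the target closest to the live point."""
    return min(targets.items(), key=lambda kv: math.dist((f1, f2), kv[1]))


def correction_vector(f1, f2, fraction=0.5, targets=VOWEL_TARGETS):
    """Move a fraction of the way from the live point to its nearest target.

    Returns (label, (dF1, dF2), (corrected F1, corrected F2)).
    """
    label, (t1, t2) = nearest_target(f1, f2, targets)
    d1 = fraction * (t1 - f1)
    d2 = fraction * (t2 - f2)
    return label, (d1, d2), (f1 + d1, f2 + d2)


label, delta, corrected = correction_vector(400.0, 2000.0, fraction=0.5)
print(label, delta, corrected)
```

Distance is measured in raw Hz here for simplicity, which weights F2 more heavily than F1 because its range is wider. A perceptual scale such as Bark would likely be a better choice for the distance metric.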

Acoustic Metrics — live readout

F0 Pitch · F1 Height · F2 Front/Back · F3 Rhotics · HNR Quality · RMS Level (dBFS)

Frequency Spectrum — formant markers

Corollary Discharge Mechanism

The 200 ms Prediction Window

Every time your motor cortex issues a speech command, it simultaneously sends an efference copy — a prediction of what the resulting sound should sound like — to your auditory cortex. The auditory cortex has roughly 200 milliseconds to compare what it actually hears against that prediction. If the heard sound arrives within this window and matches the target, the motor program is reinforced. If the prediction fails or the signal arrives late, the learning loop does not close.

In ASD and dysarthria, the child produces a distorted vowel, the brain predicts a distorted vowel, the sounds match — and the broken pattern is reinforced every time. Goeckoh breaks this cycle by delivering a corrected signal within the window.

0 ms: Motor command (you intend to speak)

<50 ms: Goeckoh delivers corrected signal to earbuds*

~200 ms: Prediction window closes; brain finalises motor learning

250 ms: Learning window closed; no motor update possible

* <50 ms refers to the native on-device processing latency of Goeckoh's DSP engine (Rust-compiled, running directly on the device's audio hardware). This demo shows analysis only; the correction is delivered through the full application running natively. Total perceived latency also includes the Bluetooth earbud round trip, which varies by hardware (typically 40–150 ms). With typical earbuds the total stays inside the 200 ms prediction window; at the upper end of both ranges (about 50 ms + 150 ms) it approaches the limit, so lower-latency earbuds leave more margin.
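The latency budget in this footnote is plain arithmetic, and a short sketch makes the margin explicit. The function name `latency_margin_ms` and the constants are hypothetical; the numbers are the ones quoted above (a 200 ms window, up to 50 ms of processing, 40–150 ms of Bluetooth round trip).

```python
PREDICTION_WINDOW_MS = 200.0  # window length quoted in the text
PROCESSING_MS = 50.0          # quoted upper bound for on-device processing


def latency_margin_ms(bluetooth_ms,
                      processing_ms=PROCESSING_MS,
                      window_ms=PREDICTION_WINDOW_MS):
    """Time left in the window after processing and Bluetooth delay.

    A positive result means the corrected signal arrives before the
    window closes; zero or negative means it arrives at or after the limit.
    """
    return window_ms - (processing_ms + bluetooth_ms)


for bluetooth_ms in (40.0, 95.0, 150.0):
    margin = latency_margin_ms(bluetooth_ms)
    print(f"Bluetooth {bluetooth_ms:5.1f} ms -> margin {margin:6.1f} ms")
```

Under these assumptions the margin shrinks from 110 ms with the fastest earbuds to 0 ms with the slowest, which is why earbud choice matters as much as processing speed.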

N1 Suppression: When the correction arrives on time, the auditory cortex suppresses the N1 evoked potential — an EEG marker indicating the brain accepted the corrected voice as self-produced. This is the measurable neural signature of the corollary discharge loop closing correctly. (Niziolek, Nagarajan, Houde, Journal of Neuroscience, 2013.)

The Neuroscience Behind the Correction

The Inner Dialogue Problem

Before you say a single word, your brain has already heard it.

The moment your motor cortex fires the command to speak, it simultaneously sends a copy of that command — called an efference copy — to your auditory cortex. Your auditory cortex uses that copy to generate a precise prediction of what the incoming sound should be. This predicted signal is what neuroscientists believe underlies your inner voice — the voice you "hear" in your head when you think in words.

When real sound arrives 100–200 ms later, the brain compares it against the prediction. If they match: the sound is tagged as self-produced and the comparison signal (the N1 wave) is suppressed — the brain essentially says "I expected this, move on." If they don't match: the comparison fires hard, generating a motor-learning error signal that drives your articulators to self-correct.

This is the loop Goeckoh targets. In children with ASD, dysarthria, or apraxia, the distorted output is accurately predicted because the brain has learned to expect the distorted version. N1 is suppressed. No error signal fires. No learning happens. The child practices the wrong pattern thousands of times.

Goeckoh delivers a corrected signal within the brain's own comparison window — creating a deliberate, therapeutic mismatch. The brain hears what the word should sound like at the exact moment it is comparing prediction to reality. This forces the motor learning loop open again.

The corollary discharge loop — step by step

Motor cortex fires

You intend to say "cup." Motor cortex issues a movement sequence to your lips, tongue, jaw, and larynx — the precise articulatory commands for that word.

Efference copy dispatched

Simultaneously, a copy of the motor command — the efference copy — is forwarded to the cerebellum and then to auditory cortex. This is the same signal that generates your inner voice.

Auditory cortex generates prediction

Using the efference copy, auditory cortex generates a forward model: a prediction of exactly what acoustic signal should arrive when the articulators execute those commands.

Real sound arrives (~100–200 ms)

The actual speech signal arrives at the cochlea. Auditory cortex receives it and subtracts the prediction from the incoming signal.

── comparison window ──

Match → N1 suppressed (healthy speech)

Prediction minus reality ≈ 0. The auditory N1 ERP is suppressed. The brain labels the sound "self-generated, expected" and discards it. The word is produced normally.

Mismatch → N1 fires → motor learning (Goeckoh's target)

Prediction minus reality ≠ 0. N1 fires strongly. The cerebellum generates a correction signal. Motor cortex updates its forward model for the next attempt. This is how humans learn to speak correctly — and what is broken in dysarthria.

The 200 ms window: Goeckoh's entire correction pipeline — LPC-12 analysis, Bark-space formant mapping, Hillenbrand target interpolation, and LPC resynthesis — runs in under 180 ms on the device. This is not arbitrary latency tolerance. It is specifically engineered to deliver the corrected signal inside the auditory-motor comparison window, so the brain receives the corrected token as if it were its own natural output.
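One stage of that pipeline, interpolating a formant toward its target in Bark space, can be sketched as follows. The text does not say which Bark formula Goeckoh uses, so this sketch assumes the Traunmüller (1990) approximation; the function names and the `fraction` parameter are hypothetical.

```python
def hz_to_bark(f_hz):
    """Traunmueller (1990) approximation of the Bark scale (assumed here)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z):
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def interpolate_formant(live_hz, target_hz, fraction):
    """Move a formant part of the way to its target, measured in Bark.

    fraction=0 returns the live value, fraction=1 returns the target.
    """
    z_live = hz_to_bark(live_hz)
    z_target = hz_to_bark(target_hz)
    z_new = z_live + fraction * (z_target - z_live)
    return bark_to_hz(z_new)


# Example: pull a live F2 of 1500 Hz halfway toward a 2300 Hz target.
print(round(interpolate_formant(1500.0, 2300.0, 0.5), 1))
```

Interpolating in Bark rather than Hz means equal steps correspond more closely to equal perceived changes, so a halfway correction sounds roughly halfway rather than biased toward the higher frequency.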

N1 Suppression — The Neural Fingerprint

The N1 is an EEG event-related potential (ERP) component that fires ~100 ms after a sound reaches the ear. In neurotypical speakers, N1 is suppressed by ~50% for self-produced speech — the brain has already accounted for it via the efference copy.

In many neurodiverse profiles this suppression is attenuated or absent, disrupting the brain's ability to evaluate its own output. But the more clinically significant problem is the inverse: in children who have practiced distorted speech patterns for years, N1 is suppressed — the brain accurately predicts the distortion and stops treating it as an error.

Goeckoh creates a controlled, targeted mismatch. N1 fires. The error signal is real. Motor learning resumes.

Niziolek, Nagarajan & Houde, J. Neurosci. 2013 · PNAS 2024

DIVA Model — Two Feedback Loops

The Directions Into Velocities of Articulators (DIVA) model (Guenther, Boston University) describes speech motor control as two interacting systems:

Feedforward: Motor programs stored in premotor cortex that execute familiar words from memory, without monitoring.

Feedback: Auditory and somatosensory loops that continuously compare output against targets and issue corrections when they diverge.

In dysarthria, feedforward programs are malformed (stored wrong). The feedback loop cannot override them fast enough. Goeckoh operates entirely within the feedback integration window — it doesn't change the feedforward program directly; it changes what the feedback loop hears, forcing the system to re-derive the correct feedforward program through repeated practice.

Guenther et al., Neural Computation 2006 · Tourville & Guenther, Lang. Cogn. Proc. 2011

Live Acoustic Measurements — what each signal means

F0 — Fundamental Frequency (Pitch)

The rate at which your vocal folds open and snap shut. Each full open-and-close cycle is one period of the glottal wave. Adult male folds close ~116× per second; adult female ~205×; a child's ~223×. That repetition rate is what you hear as pitch.

Goeckoh measures F0 by autocorrelation in the 80–500 Hz range and preserves it exactly. The correction never moves pitch — your speaker identity and emotional tone live in F0, and Goeckoh does not touch them.
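Autocorrelation pitch detection over the 80–500 Hz range can be sketched in pure Python. This is a simplified illustration, not Goeckoh's implementation: `estimate_f0` is a hypothetical name, and the sketch omits the windowing, voicing decision and sub-sample peak interpolation a production tracker would need.

```python
import math


def estimate_f0(frame, sample_rate, f_min=80.0, f_max=500.0):
    """Estimate F0 in Hz from the strongest autocorrelation lag.

    Only lags whose period falls between f_max and f_min are searched.
    Returns None when no lag has positive correlation (e.g. silence).
    """
    n = len(frame)
    lag_min = int(sample_rate / f_max)               # shortest period searched
    lag_max = min(int(sample_rate / f_min), n - 1)   # longest period searched
    best_lag, best_value = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Correlate the frame with a copy of itself delayed by `lag` samples.
        value = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else None


SAMPLE_RATE = 48_000
# Synthetic 200 Hz test tone, 50 ms long.
tone = [math.sin(2 * math.pi * 200.0 * i / SAMPLE_RATE) for i in range(2400)]
print(estimate_f0(tone, SAMPLE_RATE))
```

Because the sum runs over fewer samples at longer lags, shorter periods are slightly favoured. That helps avoid octave errors, where a lag of two periods would otherwise score as well as one.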

Adult male: 93–135 Hz · Adult female: 162–238 Hz · Child: ~223 Hz

F1 — First Formant (Tongue Height)

The lowest resonance peak of your vocal tract. F1 is shaped by jaw openness and tongue height. Drop your jaw — F1 rises. Raise your tongue toward the palate — F1 falls. It's the acoustic signature of how open or closed your mouth is.

In ASD and dysarthria, F1 frequently undershoots because jaw excursion is reduced — articulatory undershoot. The vowel space collapses toward center. F1 correction is the single highest-impact intervention Goeckoh makes.

Typical range across vowels: 300–900 Hz

F2 — Second Formant (Tongue Advancement)

F2 encodes tongue front/back position. Push your tongue forward and F2 rises — this is how front vowels like /i/ get their "bright" quality. Pull it back and F2 falls — the rounded /u/ of "boot."

In dysarthria, back vowels are the hardest: lip rounding and tongue retraction are both reduced, pushing F2 up toward front-vowel territory. The acoustic contrast between front and back vowels disappears. This is measurable in milliseconds and correctable in real time.

Typical range across vowels: 800–2500 Hz

HNR — Harmonics-to-Noise Ratio

HNR is the ratio of periodic harmonic energy (clean vocal-fold vibration) to aperiodic turbulent noise (breathiness, roughness). A high HNR means your vocal folds are closing cleanly on every cycle. A low HNR indicates irregular contact, often a sign of vocal fatigue, pathology, or a high emotional-stress state.

Goeckoh's CrystallineHeart engine monitors HNR as part of neurological state estimation — sustained low HNR is one marker for OVERLOAD state, which gates correction intensity down to prevent sensory fatigue.

Healthy adult on sustained vowel: ≥11 dB · Pathological threshold: ≤7.4 dB
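A common way to turn the harmonic-to-noise ratio into a number is the autocorrelation method used in tools such as Praat, where the normalised autocorrelation peak r at the pitch period gives HNR = 10·log10(r / (1 − r)). The sketch below assumes that method and applies the two thresholds quoted above; whether Goeckoh computes HNR this way is not stated, and the function names are hypothetical.

```python
import math

HEALTHY_DB = 11.0       # "healthy adult on sustained vowel" threshold from the text
PATHOLOGICAL_DB = 7.4   # "pathological threshold" from the text


def hnr_db(periodicity):
    """HNR in dB from the normalised autocorrelation peak r, with 0 < r < 1.

    r is the share of signal energy that repeats at the pitch period;
    1 - r is treated as noise.
    """
    if not 0.0 < periodicity < 1.0:
        raise ValueError("periodicity must lie strictly between 0 and 1")
    return 10.0 * math.log10(periodicity / (1.0 - periodicity))


def classify_hnr(db):
    """Label an HNR value against the thresholds quoted in the text."""
    if db >= HEALTHY_DB:
        return "healthy range"
    if db <= PATHOLOGICAL_DB:
        return "below pathological threshold"
    return "intermediate"


for r in (0.99, 0.90, 0.80):
    value = hnr_db(r)
    print(f"r = {r:.2f} -> {value:5.1f} dB ({classify_hnr(value)})")
```

A peak of r = 0.5 corresponds to 0 dB, meaning harmonic and noise energy are equal. Clean sustained vowels typically produce peaks much closer to 1.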

Ready to hear the difference?

What you just ran in your browser is a window into the same DSP engine running natively in Goeckoh — at 10× the resolution, with real-time formant correction delivered to the ear.

Get Goeckoh: $20/month · Full application · Android, iOS, macOS, Windows, Linux

See what Goeckoh hears.
