Speech
Speech is the use of the human voice as a medium for language, and it sits at the heart of what separates humans from every other species. No monkey or ape uses its tongue for vocal communication the way humans do. No animal vocalization is organized both phonemically and syntactically. That gap raises urgent questions. How did this capacity evolve when the fossil record refuses to cooperate? How does a thought become a spoken word in fractions of a second without any conscious effort? What happens in the brain when the system breaks down? And what does talking to yourself in an empty room actually do for the mind?
Monkeys, non-human apes, and humans have all evolved specialized mechanisms for producing social vocalizations. The decisive difference is the tongue. No non-human primate uses its tongue to shape those vocalizations, and the human species' unprecedented use of the tongue alongside the lips and other moveable articulators places speech in a category scholars find genuinely difficult to explain.
The fossil record makes the timeline of this development harder still to determine. The human vocal tract does not fossilize, and indirect evidence drawn from hominid fossils has proven inconclusive. Trained apes have learned simple symbolic systems, Washoe with manual signs and Kanzi with lexigrams, but no animal's vocalizations show the phonemic and syntactic organization that defines speech. Several species have developed communication systems that superficially resemble language, yet they consistently fall short on grammar, syntax, recursion, and displacement. The evolutionary emergence of speech therefore remains, in the framing researchers in the field have adopted, an intriguing theoretical challenge.
Speech production is an unconscious multi-step process. The mind selects words from the lexicon, applies the correct morphological forms, organizes them through syntax, retrieves their phonetic properties, and then coordinates the physical articulations needed to produce sound. All of this happens without deliberate attention to any single step.
Articulatory phonetics, the branch of linguistics that studies this final stage, classifies every speech sound by two coordinates: place of articulation and manner of articulation. Place refers to where in the mouth or throat the airstream is constricted. Manner covers how tightly air is restricted, what form of airstream is used, whether the vocal cords vibrate, and whether the nasal cavity is open. Normal human speech is pulmonic, driven by pressure from the lungs, which creates phonation in the glottis before the vocal tract and mouth shape it into distinct vowels and consonants. Humans can also produce alaryngeal speech, bypassing the lungs and glottis entirely. Three types exist: esophageal speech, pharyngeal speech, and buccal speech, the last of which is better known as Donald Duck talk.
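The place-and-manner classification described above can be pictured as a small feature table. The following is a minimal sketch, not a standard phonetics library: the class name Consonant, the helper names, and the choice to store voicing and nasality as separate fields are all assumptions made for illustration. The feature values themselves follow the usual descriptions of these seven consonants.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Consonant:
    """Hypothetical record for one pulmonic consonant."""
    symbol: str
    place: str    # where the airstream is constricted
    manner: str   # how tightly the airstream is constricted
    voiced: bool  # whether the vocal cords vibrate
    nasal: bool   # whether the nasal cavity is open


# A small illustrative inventory, keyed by symbol.
INVENTORY = {
    c.symbol: c
    for c in [
        Consonant("p", "bilabial", "plosive", voiced=False, nasal=False),
        Consonant("b", "bilabial", "plosive", voiced=True, nasal=False),
        Consonant("m", "bilabial", "nasal", voiced=True, nasal=True),
        Consonant("t", "alveolar", "plosive", voiced=False, nasal=False),
        Consonant("d", "alveolar", "plosive", voiced=True, nasal=False),
        Consonant("s", "alveolar", "fricative", voiced=False, nasal=False),
        Consonant("z", "alveolar", "fricative", voiced=True, nasal=False),
    ]
}


def describe(c: Consonant) -> str:
    """Return the conventional voicing-place-manner label."""
    voicing = "voiced" if c.voiced else "voiceless"
    return f"{voicing} {c.place} {c.manner}"


def differing_features(a: Consonant, b: Consonant) -> list[str]:
    """List the feature names on which two consonants differ."""
    return [
        f.name
        for f in fields(Consonant)
        if f.name != "symbol" and getattr(a, f.name) != getattr(b, f.name)
    ]
```

In this sketch, describe(INVENTORY["z"]) yields "voiced alveolar fricative", and comparing /p/ with /b/ reports voicing as the only difference, which is the contrast that matters in studies of voice onset time.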
English-speaking children frequently say 'singed' instead of 'sang,' applying the regular -ed past tense suffix to an irregular verb. That single error type carries significant theoretical weight. It shows that regular forms are acquired earlier in development than irregular ones, and researchers use it as evidence about the architecture of the lexicon itself.
Errors associated with aphasia have been equally instructive. Patients with expressive aphasia struggle to produce regular past-tense verb forms while handling irregulars like 'sing-sang' with comparatively less difficulty. Researchers interpret this pattern as evidence that regularly inflected words are not stored as whole units. Instead, they are assembled in real time by attaching a suffix to a base form. Irregular forms, by contrast, appear to be stored whole. The pattern of what breaks and what survives in damaged brains has become one of the most productive windows into how the intact brain organizes language.
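The storage-versus-assembly account above can be sketched as a lookup with a fallback rule. This is an illustrative toy under stated assumptions, not a model taken from the research literature: the function names, the three-verb table, the simplified spelling rule, and the decision to return the bare stem when the rule is unavailable are all hypothetical.

```python
# Irregular past-tense forms are assumed to be stored as whole units.
IRREGULAR_PAST = {"sing": "sang", "eat": "ate", "go": "went"}


def regular_past(stem: str) -> str:
    """Assemble a regular past tense by suffixation (simplified spelling)."""
    return stem + "d" if stem.endswith("e") else stem + "ed"


def past_tense(stem, stored_irregulars=IRREGULAR_PAST, rule_available=True):
    """Look up a stored form first; otherwise fall back to the suffix rule."""
    if stem in stored_irregulars:
        return stored_irregulars[stem]
    if rule_available:
        return regular_past(stem)
    # Assumption for illustration only: with the rule unavailable and no
    # stored form, the bare stem is returned unchanged.
    return stem


# A learner who has the rule but has not yet stored 'sang' over-regularizes.
child_form = past_tense("sing", stored_irregulars={})

# With the rule switched off, stored irregulars survive but regulars do not.
spared_irregular = past_tense("sing", rule_available=False)
impaired_regular = past_tense("walk", rule_available=False)
```

Here child_form comes out as 'singed', matching the children's error, while switching off the rule leaves 'sang' intact and fails on 'walk', which mirrors the dissociation reported for expressive aphasia.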
Hebrew speakers, who distinguish the voiced consonant /b/ from the voiceless /p/, can more easily detect a change in voice onset time from -10 milliseconds (heard as /b/) to 0 milliseconds (heard as /p/) than an equally large shift from +10 to +20 milliseconds, even though both changes span the same distance on the acoustic scale. This phenomenon captures something fundamental about speech perception. Listeners do not hear sounds as a continuous spectrum. They sort them into discrete categories and are far more sensitive to differences that cross a category boundary than to differences within the same category.
This research connects directly to practical applications. Understanding how listeners categorize speech sounds has informed the design of computer speech recognition systems and guided efforts to improve recognition tools for people with hearing loss or language impairments, where the gap between acoustic input and linguistic understanding can be especially wide.
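The category-boundary effect in voice onset time described above can be illustrated with a simple identification curve. This is a minimal sketch under loud assumptions: the logistic shape, the boundary at -5 milliseconds, the slope of 2 milliseconds, and all function names are hypothetical choices made for illustration, not measured values. The only facts taken from the text are that -10 milliseconds is heard as /b/ and 0 milliseconds as /p/.

```python
import math

BOUNDARY_MS = -5.0  # assumed category boundary, between -10 and 0 ms
SLOPE_MS = 2.0      # assumed steepness of the identification curve


def prob_heard_as_p(vot_ms: float) -> float:
    """Assumed probability that a listener labels the sound as /p/."""
    return 1.0 / (1.0 + math.exp(-(vot_ms - BOUNDARY_MS) / SLOPE_MS))


def perceived_category(vot_ms: float) -> str:
    """Assign the more likely of the two categories."""
    return "p" if vot_ms >= BOUNDARY_MS else "b"


def predicted_discriminability(vot_a_ms: float, vot_b_ms: float) -> float:
    """Predict discriminability from how differently two sounds are labeled."""
    return abs(prob_heard_as_p(vot_a_ms) - prob_heard_as_p(vot_b_ms))


across_boundary = predicted_discriminability(-10, 0)  # /b/ versus /p/
within_p = predicted_discriminability(10, 20)         # both heard as /p/
within_b = predicted_discriminability(-20, -10)       # both heard as /b/
```

All three pairs are 10 milliseconds apart, yet under these assumptions only the pair that straddles the boundary is predicted to be easy to tell apart, because the two within-category pairs receive nearly the same label.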
Paul Broca identified a region of the inferior prefrontal cortex in 1861 after two of his patients with damage there proved unable to speak beyond a few monosyllabic words. The condition now called Broca's or expressive aphasia is characterized by speech that is slow and labored, stripped of function words, and severely impaired in syntax. Comprehension is relatively preserved except for grammatically complex sentences.
Carl Wernicke, writing in 1874, proposed a different connection: damage to the posterior left superior temporal gyrus produced a contrasting syndrome. Wernicke's aphasia leaves prosody and syntax largely intact but devastates lexical access, producing fluent but nonsensical or jargon-filled speech with poor comprehension. The classical model linking these two regions describes a signal traveling from the auditory cortex to Wernicke's area for lexical processing, passing via the arcuate fasciculus to Broca's area for morphology and syntax, and then moving to the motor cortex for articulation.
Modern research has complicated this picture. Damage to the left lateral sulcus impairs morphology and syntax while leaving lexical access and comprehension of irregular forms intact, a finding that falls outside the classical model. The circuits involved also adapt dynamically: processing becomes more efficient when listeners encounter familiar material such as learned verses, pointing to a language network that is plastic rather than fixed.
Diseases of the lungs and vocal cords, from respiratory infections and vocal fold nodules to cancers of the throat, can interrupt the pulmonic machinery that drives normal speech. Brain disorders including alogia, dysarthria, and dystonia each disrupt different stages of the production chain. Hearing conditions such as otitis media with effusion and auditory processing disorders interfere with the perceptual feedback that supports speech development and maintenance.
Psychiatric conditions leave measurable acoustic traces too. Fundamental frequency, which listeners perceive as pitch, tends to be significantly lower in people with major depressive disorder than in healthy controls. That finding has prompted researchers to investigate speech as a potential biomarker for mental health disorders more broadly. Speech-language pathologists (SLPs) assess these conditions, make diagnoses, and design treatment across the full range of causes.
A 1995 study by Masur found that how often young children repeat novel words, as opposed to words already in their lexicon, predicts the size of their vocabulary later in development, suggesting that speech repetition may help build the lexicon.
Common questions
When did Paul Broca identify the brain region responsible for speech deficits?
Paul Broca identified the region in 1861. Two of his patients with damage there could not speak beyond a few monosyllabic words, a condition now known as Broca's or expressive aphasia.
What is the timeline for normal human children developing first words and phrases?
In typical development, children say their first words during the first year of life. They progress through two- or three-word phrases before the age of three, with short sentences emerging by age four, according to standard developmental milestones.
How does the human vocal tract produce normal speech sounds?
Normal human speech is pulmonic, produced with pressure from the lungs. This airflow creates phonation within the glottis located inside the larynx before being modified by the mouth and vocal tract.
Why is determining the timeline of speech evolution difficult for researchers?
The human vocal tract does not fossilize, so direct physical evidence is missing. Indirect evidence from hominid fossils has proven inconclusive when researchers try to map changes in the vocal tract over time.
Which disorders affect speech production due to lung or vocal cord issues?
Diseases of the lungs or vocal cords that affect speech include vocal cord paralysis, respiratory infections such as bronchitis, vocal fold nodules, and cancers of the throat. Speech production can also be impaired by brain disorders such as dysarthria and dystonia, and by hearing problems such as otitis media with effusion.