Indo-European languages
Over 3.4 billion people, about 42 percent of the world's population, speak an Indo-European language as their first tongue. That makes Indo-European by far the largest language family on Earth. English, Spanish, Russian, Hindi, Bengali, Persian, and German all trace back to a single ancestor that no one ever wrote down. Linguists call it Proto-Indo-European, and they believe it was spoken sometime around 3300 BC, during the Neolithic or early Bronze Age. The strange thing is that this parent language left no records of its own. It has been reconstructed entirely from its descendants, like a voice pieced together from its echoes. So how did scholars first suspect that Sanskrit, Greek, and Latin were cousins? Where did this lost mother tongue come from, and how can anyone claim to know its grammar? And why did one family of languages end up spoken on every inhabited continent?
In 1583, an English Jesuit named Thomas Stephens wrote a letter from Goa to his brother. In it he noticed that the languages of North India resembled Greek and Latin. The letter was not published until the 20th century, so the observation went nowhere. Two years later, a Florentine merchant named Filippo Sassetti, born in 1540, made a similar note while traveling in the Indian subcontinent. Writing in 1585, he listed word pairs between Sanskrit and Italian: sapta and sette for seven, aṣṭa and otto for eight, nava and nove for nine. Neither man's curiosity sparked any wider scholarly inquiry. In 1647, the Dutch scholar Marcus Zuerius van Boxhorn proposed that certain Asian and European languages descended from a primitive common language he called Scythian. He folded Dutch, Greek, Latin, Persian, and German into the idea, later adding Slavic, Celtic, and Baltic. His suggestion, too, failed to take root. The pattern repeated across the next century, with the Ottoman traveler Evliya Çelebi noticing German and Persian likenesses in Vienna around 1665, and Gaston Coeurdoux comparing Sanskrit, Latin, and Greek conjugations in the late 1760s.
Sir William Jones lectured to the Asiatic Society of Bengal in 1786, and his words became one of the most famous quotations in all of linguistics. He examined the three oldest languages known in his time, Latin, Greek, and Sanskrit, and tentatively added Gothic, Celtic, and Persian. Then he declared that Sanskrit bore such a strong affinity to Greek and Latin, in both verb roots and grammar, that no scholar could study the three without believing they had "sprung from some common source, which, perhaps, no longer exists." Jones never named that source. The naming came later. Thomas Young coined the term "Indo-European" in 1813, drawing it from the family's geographic extremes, from Western Europe to North India. A rival label, Indo-Germanic, had appeared in French as indo-germanique in 1810 in the work of Conrad Malte-Brun. In German the term indogermanisch remains the standard scientific word to this day.
Franz Bopp published a study in 1816 comparing the conjugational system of Sanskrit with Greek, Latin, Persian, and Germanic. Between 1833 and 1852 he followed it with his Comparative Grammar, and that work marks the formal beginning of Indo-European studies as an academic field. The classical phase ran from Bopp through August Schleicher's Compendium of 1861 to Karl Brugmann's Grundriss, published in the 1880s. Ferdinand de Saussure made a daring move in 1879. He proposed invisible elements he called coefficients sonantiques to explain odd vowel alternations across the family. For decades it was pure theory. Then in 1927, Jerzy Kuryłowicz identified a Hittite consonant, written ḫ, that sat exactly where Saussure had predicted one of his hidden elements should be. That confirmation established what is now called the laryngeal theory, a turning point that helped open the modern era of the discipline. Later specialists such as Calvert Watkins, Jochem Schindler, and Helmut Rix deepened the understanding of morphology and of ablaut, the patterned vowel changes that run through the whole family.
Proto-Indo-European was an inflected language, meaning it signaled the grammatical relationships between words mainly through endings rather than through word order. Its roots are basic units of meaning. Add a suffix and you get a stem; add an ending and you get a fully inflected noun or verb. The reconstructed verb system is intricate, and like the noun it shows ablaut. The sound system is just as striking. PIE is normally rebuilt with 15 stop consonants and an unusual three-way voicing contrast among voiceless, voiced, and breathy-voiced or "voiced aspirated" stops. Stranger still, it had voiced aspirated stops with no matching voiceless aspirated series, a pattern that typologists consider extremely rare. No daughter language kept this system intact. In Germanic and Armenian, all three series shifted in a chain, so that bh, b, and p became b, p, and f. In Germanic that shift carries the name Grimm's law. The Indo-Aryan languages went the other way: they kept all three series and added a fourth series of voiceless aspirated consonants.
The word for "hundred" gives its name to the best-known division within the family. In Avestan the word is satəm, beginning with a sibilant, while in Latin it is centum, beginning with an ordinary velar, a hard k. Peter von Bradke formalized this divide in 1890, naming the two camps satem and centum, though Karl Brugmann had floated a similar split in 1886. The satem languages include the Balto-Slavic and Indo-Iranian branches, along with Albanian and Armenian in most respects. In them the reconstructed palatovelar sounds turned into sibilants while the labiovelars merged into plain velars. The centum languages did the reverse: their palatovelars merged into plain velars while the labiovelars stayed distinct. Scholars no longer read this as a clean family-tree fork. The boundary cuts across many other dialect features, suggesting the changes spread geographically across a continuum rather than marking one ancient split. It may even be that the centum branches preserve the original PIE state, and only the satem branches shared a wave of innovations. Frederik Kortlandt proposes that the ancestors of the Balts and Slavs took part in that satemization before being pulled later into the western Indo-European sphere.
Hittite is the earliest attested Indo-European language. The Anitta text, composed in Hittite and dated to 1700 BC, is the oldest known text in any Indo-European language. Older still are isolated Hittite and Luwian words scattered through Old Assyrian texts from the 20th and 19th centuries BC, texts otherwise written in Akkadian, an unrelated Semitic language. The ten traditional branches stretch across both space and time. Mycenaean Greek survives in fragments from between 1450 and 1350 BC, and Homer's poems date to the 8th century BC. Indo-Iranian is attested around 1400 BC; its Indo-Aryan side preserves the Rigveda, passed down orally from roughly the mid-2nd millennium BC before ever being written. Other branches surface far later. Albanian is attested from the 13th century, Armenian from the early 5th century AD, and Baltic only from the 14th century, yet Lithuanian and Latvian retain features so archaic they remain vital to reconstruction. Two branches died out entirely. Anatolian vanished by Late Antiquity, and Tocharian, recorded in two dialects between roughly the 6th and 9th centuries AD, was marginalized under the Old Turkic Uyghur Khaganate and was probably extinct by the 10th century.
The verb meaning "to bear," reconstructed as bʰer-, still echoes across the family, from Sanskrit bʰárati ("bears") to Greek phérō and Latin ferō ("I bear") to Gothic baíris ("you bear"). That single thread of inheritance hints at how widely these peoples moved. By the start of the Common Era, Indo-European peoples controlled almost all of the western two-thirds of Eurasia: Celts in western and central Europe, Romans in the south, Germanic peoples in the north, Slavs in the east, Iranian peoples across western and central Asia, and Indo-Aryans in the subcontinent. Tocharians held the eastern frontier in western China. Non-Indo-European languages held on in a few pockets, such as Basque, a pre-Indo-European isolate, and the Uralic languages, which today include Hungarian, Finnish, and Estonian. The second great surge came with colonialism and the Age of Discovery. Romance, West Germanic, and Russian carried the family to every habitable continent. Today ten of the world's twenty most-spoken languages are Indo-European, each with 100 million speakers or more, and around 600 million people study English alone. The family's recorded history is the second-longest of any known language family, trailing only the Egyptian and Semitic branches of Afroasiatic.
Common questions
How many people speak an Indo-European language?
Over 3.4 billion people, about 42 percent of the global population, speak an Indo-European language as a first language. This is by far the most of any language family. Ethnologue counts about 446 living Indo-European languages, of which 313 belong to the Indo-Iranian branch.
What is the Proto-Indo-European language?
Proto-Indo-European is the reconstructed common ancestor of all Indo-European languages, spoken sometime during the Neolithic or early Bronze Age around 3300 BC. It left no written records and has been rebuilt entirely from its descendant languages. It was an inflected language with a complex system of 15 stop consonants.
Where was the Proto-Indo-European homeland?
The academic consensus supports the Kurgan hypothesis, which places the Proto-Indo-European homeland on the Pontic-Caspian steppe in what is now Ukraine and Southern Russia. It is associated with the Yamnaya culture and related archaeological cultures during the 4th and early 3rd millennia BC.
Who first proposed that Indo-European languages were related?
Sir William Jones lectured to the Asiatic Society of Bengal in 1786, arguing that Latin, Greek, and Sanskrit had sprung from a common source that perhaps no longer exists. Earlier observers included Thomas Stephens in 1583 and Filippo Sassetti in 1585, but their notes did not lead to further inquiry. Marcus Zuerius van Boxhorn proposed a shared ancestor he called Scythian in 1647.
What is the oldest written Indo-European language?
Hittite is the earliest recorded Indo-European language. The Anitta text, written in Hittite and dated to 1700 BC, is the oldest known text in any Indo-European language. Even older are isolated Hittite and Luwian words found in Old Assyrian texts from the 20th and 19th centuries BC.
What are the branches of the Indo-European language family?
The Indo-European family has ten major branches: Albanian, Anatolian, Armenian, Balto-Slavic, Celtic, Germanic, Hellenic, Indo-Iranian, Italic, and Tocharian. Anatolian and Tocharian are extinct, while the others contain living languages. All descend from Proto-Indo-European.
What is the difference between centum and satem languages?
The centum-satem division groups Indo-European languages by how the Proto-Indo-European palatovelar consonants developed, and it takes its names from the Avestan and Latin words for "hundred." Peter von Bradke named it in 1890. In satem languages, including Balto-Slavic and Indo-Iranian, the palatovelar sounds became sibilants. In centum languages such as Latin they merged into plain velars, giving the hard k of centum.