Big Data and the Reconstruction of Linguistic Prehistory


Judith Kaplan


"The Preliminary Genealogical Tree for Eurasia, Long Variant" from the Global Lexicostatistical Database

In 2001, an international and interdisciplinary team of researchers joined forces to establish the Evolution of Human Languages (EHL) project at the Santa Fe Institute. Their goal was to sustain long-range research on linguistic prehistory through the development of a comprehensive online database. The EHL was described as a "sort of Human Genome Project for historical linguistics." Its founders hoped that it would eventually house data on every human language ever attested (ca. 6,000), and that this collection would facilitate both comparative analysis and quantitative modeling of linguistic diversity.

Members of the EHL were not afraid to think "big." Not only was the database itself ambitious in scope, it was also motivated by a desire to survey vast geographic territory and to illuminate remote historical periods reaching back at least 10,000 years. With a team of 40 scholars from four different continents, the EHL was also sizable in terms of its social and political organization. Its explanatory aims, moreover, were near-universal: founders hoped to see "an absolute majority of the world's languages ... reduced to a minimum number of huge language superfamilies."

Proceeding from this example of data-driven research, this project sought to provide a number of contextual frameworks (intellectual, material, social, and political) for understanding the role of Big Data in several different "quest[s] for the mother tongue." Though it is tempting to view the EHL as a novel approach to the production of historical linguistic knowledge, the project argued that the EHL has methodological and epistemic roots that reach back into the nineteenth century. Judith Kaplan examined these roots through case studies on lexicostatistics, Nostratic linguistics, and more recent interdisciplinary projects. She ultimately analyzed critical reactions to this work from more mainstream sectors to highlight cultural and conceptual investments in the "comparative method" of historical linguistics.