Technology helps trace the expansion of Indo-European languages and preserve those of indigenous people | AGÊNCIA FAPESP

Linguists show how computation and the internet can be used to understand human migrations and preserve endangered languages

(Paul Heggarty of the Max Planck Institute for the Science of Human History / photo: Felipe Maeda / Agência FAPESP)

January 02, 2019

By André Julião  |  Agência FAPESP – Genetics and linguistics have combined to produce an enhanced understanding of how the Indo-European languages spread across Europe. This feat was achieved by Paul Heggarty and his group at Germany’s Max Planck Institute for the Science of Human History in a computer-aided analysis of DNA from ancient populations and data on languages spoken today.

One of their findings is a new estimate for the start of the Indo-European linguistic expansion, now thought to have begun 8,200 years ago.

“There are two main hypotheses, which propose different timelines [for the expansion of the first speakers of the language that gave rise to the Indo-European family]. One starts approximately 6,000 years ago and the other 8,500 or more. Our analysis showed that approximately 8,200 years ago is the best possible estimate now and that 6,000 years ago would be a little too recent,” Heggarty said in a presentation delivered on November 29, 2018, to the FAPESP-Max Planck Frontiers of Science Symposium.

Despite the use of computation to map large databases against each other, Heggarty stressed the importance of skilled human labor to obtain the best linguistic data, which can only then be interlaced with genetic data comparing the DNA of people alive today with that of their prehistoric ancestors.

“The computational analysis is based primarily on linguistics, which entails training people who work on these languages to understand them together with the data about them. Then, you have to convert the data into a format that the computational analysis can use. You can’t start with computers. You have to start with linguistics,” Heggarty told Agência FAPESP.
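The conversion step Heggarty mentions can be sketched as follows. This is a minimal illustration, not the group's actual pipeline, which the article does not describe. It assumes linguists have already judged which words are cognate, and it turns those judgments into a binary presence/absence matrix, one common input format for phylogenetic software. The language names (`lang_A`, etc.), the cognate class labels and the function `to_binary_matrix` are all invented for the example.

```python
# Toy cognate judgments: for each language, the cognate class a linguist
# assigned to the word for each meaning. All names and labels are invented.
cognate_judgments = {
    "lang_A": {"water": "w1", "fire": "f1", "two": "t1"},
    "lang_B": {"water": "w1", "fire": "f2", "two": "t1"},
    "lang_C": {"water": "w2", "fire": "f2", "two": "t1"},
}


def to_binary_matrix(judgments):
    """Encode cognate judgments as presence/absence characters.

    Returns a sorted list of character names ("meaning:class") and a dict
    mapping each language to a row of 0/1 values in that order.
    """
    characters = sorted(
        {
            f"{meaning}:{cls}"
            for coding in judgments.values()
            for meaning, cls in coding.items()
        }
    )
    matrix = {}
    for language, coding in judgments.items():
        present = {f"{meaning}:{cls}" for meaning, cls in coding.items()}
        matrix[language] = [1 if ch in present else 0 for ch in characters]
    return characters, matrix


characters, matrix = to_binary_matrix(cognate_judgments)
print(characters)
for language, row in matrix.items():
    print(language, row)
```

The linguistic work sits entirely in `cognate_judgments`; the code only changes the representation, which matches Heggarty's point that the analysis has to start with linguistics.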

Fieldwork is the main focus for another linguist, Filomena Sandalo, a professor at the University of Campinas’s Language Studies Institute (IEL-UNICAMP) in Brazil and principal investigator for the FAPESP-funded Thematic Project “Edges and asymmetries in phonology and morphology”. Sandalo is developing an online database of narratives and sound files with morphological and syntactic annotations on Brazil’s indigenous languages (available at

Sandalo gave a talk to the symposium on her fieldwork with the Pirahã, an indigenous community in the Amazon basin, using experimental psychology to investigate a theory proposed by US linguist Daniel Everett, according to which the Pirahã language lacks indirect recursion.

“Everett says Pirahã lacks the resources to make subordinate or relative clauses, or indeed any kind of subordination. So according to this hypothesis, it’s impossible to say in Pirahã, for example, ‘the cup is on the saucer that is on the table’. Our experiment shows this is just as possible as it is in Portuguese,” Sandalo said.

“They have a particle that marks coordination, whereas we mark subordination with a particle. In coordination, they use píai, equivalent to também [also] in Portuguese. This particle didn’t occur when we asked for subordinate clauses. In Pirahã, the coordinate construction would be ‘cup on saucer also on table’. There are no particles in subordinates. So we do have a contrast, but it’s just a different way of speaking. Their cognitive capacity is the same as ours, which isn’t surprising.”

Genes, languages and carbon 14

Heggarty explained that to study the expansion of cultures through the languages spoken today, it is necessary to construct family trees to trace phylogenetically how, when and where language families and their speakers expanded through prehistory.

“The differences among languages increase as they evolve over time, so the levels of difference [between languages] can be used to think about the time during which they diverged and from there infer their prehistory,” he said.
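The idea of using levels of difference to build a family tree can be illustrated with a small sketch. It is a deliberate simplification and should not be read as the method used by Heggarty's group, which the article does not detail. Here the distance between two languages is assumed to be the share of meanings whose cognate classes differ, and the tree is built by average-linkage clustering (UPGMA), a simple distance-based technique chosen only for illustration. The four languages and their cognate codes are invented.

```python
from itertools import combinations

# Invented cognate codes for four hypothetical languages, one per meaning.
cognates = {
    "lang_A": ["w1", "f1", "t1", "s1"],
    "lang_B": ["w1", "f1", "t1", "s2"],
    "lang_C": ["w1", "f2", "t2", "s3"],
    "lang_D": ["w2", "f2", "t2", "s3"],
}


def distance(a, b):
    """Share of meanings whose cognate classes differ (0.0 to 1.0)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(data):
    """Average-linkage clustering; returns the tree as nested tuples."""
    # Each cluster is (subtree, list of member languages).
    clusters = [(name, [name]) for name in data]

    def average_distance(c1, c2):
        pairs = [(m, n) for m in c1[1] for n in c2[1]]
        return sum(distance(data[m], data[n]) for m, n in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the two clusters with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_distance(clusters[p[0]], clusters[p[1]]),
        )
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


tree = upgma(cognates)
print(tree)  # (('lang_A', 'lang_B'), ('lang_C', 'lang_D'))
```

In this toy data, the languages that share the most cognates are grouped first, so the smallest differences mark the most recent splits. Turning such differences into calendar dates requires further assumptions about rates of change, which this sketch does not attempt.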

This information can be combined with data from DNA samples of ancient human remains and artifacts found at archeological sites.

“These combinations can enable us to see how people migrated from one place to another, because they speak related languages even though they live thousands of miles apart,” Heggarty said. “The ancient DNA gives rise to a particular genetic profile that moves from one part of the world to another.”

DNA does not explain everything, however. Languages can also spread through cultural dominance, without a corresponding genetic shift. “Speakers of Indo-European languages have very different genetic profiles. In the modern world, there are several cases of languages spreading and being learned. For example, English is one of the most widely spoken languages in India today. Genetically speaking, Indians aren’t European, but many of them speak this European language,” Heggarty said.

“Similarly, in South America. Brazilians have a wide range of ethnic origins, but the official language is Portuguese. So languages can spread culturally.”

British-born and living in Germany, Heggarty offered himself as a living example of this process. “My surname is Celtic, but I speak a Germanic language. This is because three generations ago, my great-grandparents stopped speaking Irish and began speaking English. I’m a case of a divergence between my Germanic linguistic lineage, which is English, and my Celtic linguistic lineage.”

Agência FAPESP licenses news reports under Creative Commons license CC-BY-NC-ND so that they can be republished free of charge and in a straightforward manner by other digital media or by print media. The name of the author or reporter (when applied) must be cited, as must the source (Agência FAPESP).