Two years ago, a 100-strong team of researchers dived deep into the human gene pool to build a library of more than 10 million variants in exomes.
Exomes are a small part of the human genome, accounting for no more than two per cent of DNA, but are crucially important.
They consist of the coding portions of genes – the stretches of DNA that encode proteins, the basic building blocks of the human body and its functions.
Flaws in these stretches can have a cascade effect, leading to disease.
A study published in the journal Nature is the first analysis of the database – and confirms a rich potential for pinpointing inherited causes of disease.
The new resource “is invaluable,” said senior author Daniel MacArthur, co-director of medical and population genetics at the Broad Institute of MIT and Harvard.
“It gives us the ability to discover rare variants and offers an unparalleled window into the roots of rare genetic diseases.”
Most genetic variations – we each have tens of thousands – are benign.
But without a near-complete library of the possible permutations of our DNA, it is very hard for scientists, or doctors treating patients, to pick out the harmful ones and link them to specific conditions.
The Exome Aggregation Consortium (ExAC) database, compiled from dozens of previous studies, seeks to fill this gap.
It includes detailed profiles of the protein-coding genes from more than 60,000 people.
“The goal was to create a dataset that could be used as a reference for the variation present in the general population,” MacArthur told AFP.
“Physicians can look up a genetic variation found in their patients and understand how common it is across the general population.”
The more common a variant is, the less likely it is to be the cause of a serious condition.
Deep end of the gene pool
Made available online in 2014, the catalogue has been consulted more than five million times, becoming a “standard reference” for diagnosing patients with rare diseases, MacArthur said.
Most of the exome mutations uncovered in the trawl have been identified for the first time, and some are extremely rare, even unique.
The findings apply in particular to so-called “Mendelian” diseases, caused by a defect in a single gene.
Well-known examples include cystic fibrosis, which inflicts severe damage on the lungs and the digestive system; Pfeiffer syndrome, characterised by a severe deformation of skull bones; and Smith-Lemli-Opitz syndrome, a disorder linked to multiple malformations and intellectual disability.
At the same time, the researchers found that nearly 200 variants previously fingered as the cause of severe disorders appeared far too frequently to be culprits.
“We show that they must actually be harmless variations that have wound up in databases through error,” MacArthur said.
The study also revealed that the same mutation can happen spontaneously to two or more people.
Previously, it was assumed that when identical variants were found in more than one individual, they could be traced back to a common ancestor.
Not only is the ExAC dataset 10 times bigger than previous efforts, it is also a broader reflection of human diversity.
Most large-scale samplings of human genomes have focused on people of European origin or – a distant second – African Americans.
But East and South Asians, along with Latino populations, are well represented here.
Still missing, however, are individuals from the Middle East and most parts of the African continent.
“The current work highlights the pace at which human genetics is scaling up,” commented Jay Shendure of the University of Washington, co-author of a similar 2008 study covering only 12 genomes – hailed as a breakthrough at the time.
“In the coming decade, the number of human genomes that will be sequenced in some manner will grow to at least tens of millions,” he wrote in a commentary, also in Nature.