When the Human Genome Project and the biotech company Celera Genomics announced the sequencing of the human genome 20 years ago, the sequence was still incomplete. About 15 percent was missing: the limitations of the technology at the time made it impossible to work out how certain sequenced segments of DNA fit together. This was especially true for regions in which many letters, or base pairs, are repeated. Over time, scientists solved some of the puzzles. Even so, the most recent version of the human genome, which geneticists have been using as a reference since 2013, is still missing eight percent of the complete sequence.
Now biologists from the Telomere-to-Telomere Consortium (T2T), an international collaboration that includes about 30 institutions, have closed these gaps. This is reported by genome researcher Karen Miga of the University of California at Santa Cruz and her colleagues. In the remaining regions that have now been sequenced, they also discovered 115 new protein-coding genes, bringing the known total to 19,969.
Seeing these problems solved is exciting, says Kim Pruitt, a bioinformatician at the US National Center for Biotechnology Information in Bethesda, Maryland, who calls the results an “important milestone.” The newly sequenced genome, called T2T-CHM13, adds about 200 million base pairs to the 2013 version of the human genome sequence.
New sequencing technology
This time, instead of using DNA from a living individual, the researchers used genetic material from a cell line derived from tissue known as a complete hydatidiform mole. Such moles can arise in the placenta, for example when a sperm fertilizes an egg cell without a nucleus. The paternal genetic material is then doubled, so that the resulting cells contain two copies of each chromosome from the father, instead of the usual one copy each from sperm and egg cell. For genetic researchers, such cells have the advantage that there is no need to distinguish between sets of chromosomes from two different people.
For the current work, the team used a new sequencing technology from Pacific Biosciences in Menlo Park, California, without which the investigation probably would not have been possible, Miga says. It uses lasers to scan long stretches of DNA isolated from cells: up to 20,000 base pairs at a time. Traditional sequencing methods read DNA in sections of only a few hundred base pairs, which researchers then have to put back together like pieces of a puzzle. Larger pieces are much easier to fit together because they are more likely to contain sequences that overlap.
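The puzzle analogy can be illustrated with a toy sketch in Python. This is not the software the T2T team used. The function names, the made-up sequences and the minimum overlap of three letters are all assumptions chosen for illustration. The sketch joins reads at their overlapping ends and then counts how many places a read could fit in a repetitive stretch.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def merge_reads(reads: list[str], min_len: int = 3) -> str:
    """Chain reads left to right, joining each at its overlap with the sequence so far."""
    assembled = reads[0]
    for read in reads[1:]:
        size = overlap(assembled, read, min_len)
        if size == 0:
            raise ValueError(f"no overlap of at least {min_len} letters for {read!r}")
        assembled += read[size:]  # append only the part that is new
    return assembled


def placements(reference: str, read: str) -> int:
    """Count the positions at which `read` matches `reference` exactly."""
    return sum(
        1
        for start in range(len(reference) - len(read) + 1)
        if reference[start:start + len(read)] == read
    )


# Three overlapping toy reads reassemble into one sequence.
reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
print(merge_reads(reads))  # ATGCGTACGTTAGC

# A toy region with a repeated "AC" motif, flanked by unique letters.
repetitive = "GGTACACACACACTTG"
print(placements(repetitive, "ACAC"))          # 4: the short read fits in four places
print(placements(repetitive, "TACACACACACT"))  # 1: the long read spans the repeat
```

In this toy case the short read is ambiguous because it lies entirely inside the repeat, while the longer read reaches the unique letters on both sides and can be placed in only one position.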
However, T2T-CHM13 is still not the final word on the human genome. The T2T team struggled to resolve certain regions on the chromosomes and estimates that about 0.3 percent of the genome may contain errors. But there are no more gaps, Miga says. In addition, the sperm from which the mole cell line arose contributed an X chromosome, so the researchers have not yet been able to use the new technology to sequence the Y chromosome, which is typical of male development.