%0 Journal Article %J Science %D 2021 %T Haplotype-resolved diverse human genomes and integrated analysis of structural variation. %A Ebert, Peter %A Audano, Peter A %A Zhu, Qihui %A Rodriguez-Martin, Bernardo %A Porubsky, David %A Bonder, Marc Jan %A Sulovari, Arvis %A Ebler, Jana %A Zhou, Weichen %A Serra Mari, Rebecca %A Yilmaz, Feyza %A Zhao, Xuefang %A Hsieh, PingHsun %A Lee, Joyce %A Kumar, Sushant %A Lin, Jiadong %A Rausch, Tobias %A Chen, Yu %A Ren, Jingwen %A Santamarina, Martin %A Höps, Wolfram %A Ashraf, Hufsah %A Chuang, Nelson T %A Yang, Xiaofei %A Munson, Katherine M %A Lewis, Alexandra P %A Fairley, Susan %A Tallon, Luke J %A Clarke, Wayne E %A Basile, Anna O %A Byrska-Bishop, Marta %A Corvelo, André %A Evani, Uday S %A Lu, Tsung-Yu %A Chaisson, Mark J P %A Chen, Junjie %A Li, Chong %A Brand, Harrison %A Wenger, Aaron M %A Ghareghani, Maryam %A Harvey, William T %A Raeder, Benjamin %A Hasenfeld, Patrick %A Regier, Allison A %A Abel, Haley J %A Hall, Ira M %A Flicek, Paul %A Stegle, Oliver %A Gerstein, Mark B %A Tubio, Jose M C %A Mu, Zepeng %A Li, Yang I %A Shi, Xinghua %A Hastie, Alex R %A Ye, Kai %A Chong, Zechen %A Sanders, Ashley D %A Zody, Michael C %A Talkowski, Michael E %A Mills, Ryan E %A Devine, Scott E %A Lee, Charles %A Korbel, Jan O %A Marschall, Tobias %A Eichler, Evan E %K Female %K Genetic Variation %K Genome, Human %K Genotype %K Haplotypes %K High-Throughput Nucleotide Sequencing %K Humans %K INDEL Mutation %K Interspersed Repetitive Sequences %K Male %K Population Groups %K Quantitative Trait Loci %K Retroelements %K Sequence Analysis, DNA %K Sequence Inversion %K Whole Genome Sequencing %X

Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.

%B Science %V 372 %8 2021 04 02 %G eng %N 6537 %1 https://www.ncbi.nlm.nih.gov/pubmed/33632895?dopt=Abstract %R 10.1126/science.abf7117

%0 Journal Article %J Nature %D 2020 %T A structural variation reference for medical and population genetics. %A Collins, Ryan L %A Brand, Harrison %A Karczewski, Konrad J %A Zhao, Xuefang %A Alföldi, Jessica %A Francioli, Laurent C %A Khera, Amit V %A Lowther, Chelsea %A Gauthier, Laura D %A Wang, Harold %A Watts, Nicholas A %A Solomonson, Matthew %A O'Donnell-Luria, Anne %A Baumann, Alexander %A Munshi, Ruchi %A Walker, Mark %A Whelan, Christopher W %A Huang, Yongqing %A Brookings, Ted %A Sharpe, Ted %A Stone, Matthew R %A Valkanas, Elise %A Fu, Jack %A Tiao, Grace %A Laricchia, Kristen M %A Ruano-Rubio, Valentin %A Stevens, Christine %A Gupta, Namrata %A Cusick, Caroline %A Margolin, Lauren %A Taylor, Kent D %A Lin, Henry J %A Rich, Stephen S %A Post, Wendy S %A Chen, Yii-Der Ida %A Rotter, Jerome I %A Nusbaum, Chad %A Philippakis, Anthony %A Lander, Eric %A Gabriel, Stacey %A Neale, Benjamin M %A Kathiresan, Sekar %A Daly, Mark J %A Banks, Eric %A MacArthur, Daniel G %A Talkowski, Michael E %K Continental Population Groups %K Disease %K Female %K Genetic Testing %K Genetic Variation %K Genetics, Medical %K Genetics, Population %K Genome, Human %K Genotyping Techniques %K Humans %K Male %K Middle Aged %K Mutation %K Polymorphism, Single Nucleotide %K Reference Standards %K Selection, Genetic %K Whole Genome Sequencing %X

Structural variants (SVs) rearrange large segments of DNA and can have profound consequences in evolution and human disease. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) have become integral in the interpretation of single-nucleotide variants (SNVs). However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25-29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings. This SV resource is freely distributed via the gnomAD browser and will have broad utility in population genetics, disease-association studies, and diagnostic screening.

%B Nature %V 581 %P 444-451 %8 2020 05 %G eng %N 7809 %1 https://www.ncbi.nlm.nih.gov/pubmed/32461652?dopt=Abstract %R 10.1038/s41586-020-2287-8