%0 Journal Article
%J Am J Hum Genet
%D 2021
%T Association of structural variation with cardiometabolic traits in Finns.
%A Chen, Lei
%A Abel, Haley J
%A Das, Indraniel
%A Larson, David E
%A Ganel, Liron
%A Kanchi, Krishna L
%A Regier, Allison A
%A Young, Erica P
%A Kang, Chul Joo
%A Scott, Alexandra J
%A Chiang, Colby
%A Wang, Xinxin
%A Lu, Shuangjia
%A Christ, Ryan
%A Service, Susan K
%A Chiang, Charleston W K
%A Havulinna, Aki S
%A Kuusisto, Johanna
%A Boehnke, Michael
%A Laakso, Markku
%A Palotie, Aarno
%A Ripatti, Samuli
%A Freimer, Nelson B
%A Locke, Adam E
%A Stitziel, Nathan O
%A Hall, Ira M
%K Alleles
%K Cardiovascular Diseases
%K Cholesterol
%K DNA Copy Number Variations
%K Female
%K Finland
%K Genome, Human
%K Genomic Structural Variation
%K Genotype
%K High-Throughput Nucleotide Sequencing
%K Humans
%K Male
%K Mitochondrial Proteins
%K Promoter Regions, Genetic
%K Pyruvate Dehydrogenase (Lipoamide)-Phosphatase
%K Pyruvic Acid
%K Serum Albumin, Human
%X The contribution of genome structural variation (SV) to quantitative traits associated with cardiometabolic diseases remains largely unknown. Here, we present the results of a study examining genetic association between SVs and cardiometabolic traits in the Finnish population. We used sensitive methods to identify and genotype 129,166 high-confidence SVs from deep whole-genome sequencing (WGS) data of 4,848 individuals. We tested the 64,572 common and low-frequency SVs for association with 116 quantitative traits and tested candidate associations using exome sequencing and array genotype data from an additional 15,205 individuals. We discovered 31 genome-wide significant associations at 15 loci, including 2 loci at which SVs have strong phenotypic effects: (1) a deletion of the ALB promoter that is greatly enriched in the Finnish population and causes decreased serum albumin level in carriers (p = 1.47 × 10) and is also associated with increased levels of total cholesterol (p = 1.22 × 10) and 14 additional cholesterol-related traits, and (2) a multi-allelic copy number variant (CNV) at PDPR that is strongly associated with pyruvate (p = 4.81 × 10) and alanine (p = 6.14 × 10) levels and resides within a structurally complex genomic region that has accumulated many rearrangements over evolutionary time. We also confirmed six previously reported associations, including five led by stronger signals in single nucleotide variants (SNVs) and one linking recurrent HP gene deletion and cholesterol levels (p = 6.24 × 10), which was also found to be strongly associated with increased glycoprotein level (p = 3.53 × 10). Our study confirms that integrating SVs in trait-mapping studies will expand our knowledge of genetic factors underlying disease risk.
%B Am J Hum Genet
%V 108
%P 583-596
%8 2021 04 01
%G eng
%N 4
%1 https://www.ncbi.nlm.nih.gov/pubmed/33798444?dopt=Abstract
%R 10.1016/j.ajhg.2021.03.008

%0 Journal Article
%J Nature
%D 2020
%T Mapping and characterization of structural variation in 17,795 human genomes.
%A Abel, Haley J
%A Larson, David E
%A Regier, Allison A
%A Chiang, Colby
%A Das, Indraniel
%A Kanchi, Krishna L
%A Layer, Ryan M
%A Neale, Benjamin M
%A Salerno, William J
%A Reeves, Catherine
%A Buyske, Steven
%A Matise, Tara C
%A Muzny, Donna M
%A Zody, Michael C
%A Lander, Eric S
%A Dutcher, Susan K
%A Stitziel, Nathan O
%A Hall, Ira M
%K Alleles
%K Case-Control Studies
%K Continental Population Groups
%K Epigenesis, Genetic
%K Female
%K Gene Dosage
%K Genetic Variation
%K Genetics, Population
%K Genome, Human
%K High-Throughput Nucleotide Sequencing
%K Humans
%K Male
%K Molecular Sequence Annotation
%K Quantitative Trait Loci
%K Software
%K Whole Genome Sequencing
%X A key goal of whole-genome sequencing for studies of human genetics is to interrogate all forms of variation, including single-nucleotide variants, small insertion or deletion (indel) variants and structural variants. However, tools and resources for the study of structural variants have lagged behind those for smaller variants. Here we used a scalable pipeline to map and characterize structural variants in 17,795 deeply sequenced human genomes. We publicly release site-frequency data to create the largest, to our knowledge, whole-genome-sequencing-based structural variant resource so far. On average, individuals carry 2.9 rare structural variants that alter coding regions; these variants affect the dosage or structure of 4.2 genes and account for 4.0-11.2% of rare high-impact coding alleles. Using a computational model, we estimate that structural variants account for 17.2% of rare alleles genome-wide, with predicted deleterious effects that are equivalent to loss-of-function coding alleles; approximately 90% of such structural variants are noncoding deletions (mean 19.1 per genome). We report 158,991 ultra-rare structural variants and show that 2% of individuals carry ultra-rare megabase-scale structural variants, nearly half of which are balanced or complex rearrangements. Finally, we infer the dosage sensitivity of genes and noncoding elements, and reveal trends that relate to element class and conservation. This work will help to guide the analysis and interpretation of structural variants in the era of whole-genome sequencing.
%B Nature
%V 583
%P 83-89
%8 2020 07
%G eng
%N 7814
%1 https://www.ncbi.nlm.nih.gov/pubmed/32460305?dopt=Abstract
%R 10.1038/s41586-020-2371-0

%0 Journal Article
%J Bioinformatics
%D 2019
%T svtools: population-scale analysis of structural variation.
%A Larson, David E
%A Abel, Haley J
%A Chiang, Colby
%A Badve, Abhijit
%A Das, Indraniel
%A Eldred, James M
%A Layer, Ryan M
%A Hall, Ira M
%X SUMMARY: Large-scale human genetics studies are now employing whole genome sequencing with the goal of conducting comprehensive trait mapping analyses of all forms of genome variation. However, methods for structural variation (SV) analysis have lagged far behind those for smaller scale variants, and there is an urgent need to develop more efficient tools that scale to the size of human populations. Here, we present a fast and highly scalable software toolkit (svtools) and cloud-based pipeline for assembling high quality SV maps (including deletions, duplications, mobile element insertions, inversions and other rearrangements) in many thousands of human genomes. We show that this pipeline achieves similar variant detection performance to established per-sample methods (e.g. LUMPY), while providing fast and affordable joint analysis at the scale of ≥100 000 genomes. These tools will help enable the next generation of human genetics studies. AVAILABILITY AND IMPLEMENTATION: svtools is implemented in Python and freely available (MIT) from https://github.com/hall-lab/svtools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
%B Bioinformatics
%V 35
%P 4782-4787
%8 2019 11 01
%G eng
%N 22
%1 https://www.ncbi.nlm.nih.gov/pubmed/31218349?dopt=Abstract
%R 10.1093/bioinformatics/btz492

%0 Journal Article
%J Nat Commun
%D 2018
%T Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects.
%A Regier, Allison A
%A Farjoun, Yossi
%A Larson, David E
%A Krasheninina, Olga
%A Kang, Hyun Min
%A Howrigan, Daniel P
%A Chen, Bo-Juen
%A Kher, Manisha
%A Banks, Eric
%A Ames, Darren C
%A English, Adam C
%A Li, Heng
%A Xing, Jinchuan
%A Zhang, Yeting
%A Matise, Tara
%A Abecasis, Goncalo R
%A Salerno, Will
%A Zody, Michael C
%A Neale, Benjamin M
%A Hall, Ira M
%K Genome, Human
%K Human Genetics
%K Humans
%K Whole Genome Sequencing
%X Hundreds of thousands of human whole genome sequencing (WGS) datasets will be generated over the next few years. These data are more valuable in aggregate: joint analysis of genomes from many sources increases sample size and statistical power. A central challenge for joint analysis is that different WGS data processing pipelines cause substantial differences in variant calling in combined datasets, necessitating computationally expensive reprocessing. This approach is no longer tenable given the scale of current studies and data volumes. Here, we define WGS data processing standards that allow different groups to produce functionally equivalent (FE) results, yet still innovate on data processing pipelines. We present initial FE pipelines developed at five genome centers and show that they yield similar variant calling results and produce significantly less variability than sequencing replicates. This work alleviates a key technical bottleneck for genome aggregation and helps lay the foundation for community-wide human genetics studies.
%B Nat Commun
%V 9
%P 4038
%8 2018 10 02
%G eng
%N 1
%1 https://www.ncbi.nlm.nih.gov/pubmed/30279509?dopt=Abstract
%R 10.1038/s41467-018-06159-4