Accurate, scalable cohort variant calls using DeepVariant and GLnexus.

Title: Accurate, scalable cohort variant calls using DeepVariant and GLnexus.
Publication Type: Journal Article
Year of Publication: 2021
Authors: Yun T, Li H, Chang P-C, Lin MF, Carroll A, McLean CY
Date Published: 2021 Jan 05

MOTIVATION: Population-scale sequenced cohorts are foundational resources for genetic analyses, but processing raw reads into analysis-ready cohort-level variants remains challenging.

RESULTS: We introduce an open-source cohort-calling method that uses the highly accurate caller DeepVariant and the scalable merging tool GLnexus. Using callset quality metrics based on variant recall and precision in benchmark samples and Mendelian consistency in father-mother-child trios, we optimized the method across a range of cohort sizes, sequencing methods, and sequencing depths. The resulting callsets show consistent quality improvements over those generated using existing best practices, at reduced cost. We further evaluate our pipeline in the deeply sequenced 1000 Genomes Project (1KGP) samples and show superior callset quality metrics and imputation reference panel performance compared to an independently generated GATK Best Practices pipeline.
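
The Mendelian consistency metric mentioned above can be illustrated with a minimal sketch. It assumes unphased, diploid, autosomal genotypes encoded as pairs of allele indices and skips sites with any missing call. The function names and the toy data are hypothetical, and the exact metric definition used in the paper is not given in this abstract.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's unphased diploid genotype can be formed
    by taking one allele from the father and one from the mother."""
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((f, m))) == child_sorted
        for f, m in product(father, mother)
    )


def mendelian_violation_rate(trio_genotypes):
    """Fraction of fully called sites where the trio is inconsistent.

    Each item is (father, mother, child); a genotype is a pair of allele
    indices, with None marking a missing allele. Sites with any missing
    allele are skipped (an assumption of this sketch).
    """
    evaluated = 0
    violations = 0
    for father, mother, child in trio_genotypes:
        if None in father or None in mother or None in child:
            continue
        evaluated += 1
        if not is_mendelian_consistent(father, mother, child):
            violations += 1
    return violations / evaluated if evaluated else 0.0


# Toy data (hypothetical): 0 = reference allele, 1 = alternate allele.
sites = [
    ((0, 0), (0, 1), (0, 1)),        # consistent
    ((0, 0), (0, 0), (0, 1)),        # violation: child carries an unseen allele
    ((1, 1), (0, 1), (1, 1)),        # consistent
    ((0, 1), (None, None), (0, 1)),  # skipped: mother has no call
]
print(mendelian_violation_rate(sites))
```

A lower violation rate across trios suggests fewer genotyping errors in the merged callset, although true de novo mutations also register as violations under this simple check.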

AVAILABILITY AND IMPLEMENTATION: We publicly release the 1KGP individual-level variant calls and cohort callset to foster additional development and evaluation of cohort merging methods as well as broad studies of genetic variation. Both DeepVariant and GLnexus are open source, and the optimized GLnexus setup discovered in this study is also integrated into GLnexus public releases v1.2.2 and later.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Alternate Journal: Bioinformatics
PubMed ID: 33399819
PubMed Central ID: PMC8023681
Grant List:
U01 HG007301 / HG / NHGRI NIH HHS / United States
U01 HG007417 / HG / NHGRI NIH HHS / United States
UM1 HG008901 / HG / NHGRI NIH HHS / United States