Submitted by ja607 on
Title | Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2. |
Publication Type | Journal Article |
Year of Publication | 2020 |
Authors | Shen, F, Kidd, JM |
Journal | Genes (Basel) |
Volume | 11 |
Issue | 2 |
Date Published | 2020-01-29 |
ISSN | 2073-4425 |
Keywords | Algorithms; Computational Biology; DNA Copy Number Variations; Evolution, Molecular; Gene Duplication; Genome, Human; Humans; Sequence Analysis, DNA |
Abstract | Gene duplication is a major mechanism for the evolution of gene novelty, and copy-number variation makes a major contribution to inter-individual genetic diversity. However, most approaches for studying copy-number variation rely upon uniquely mapping reads to a genome reference and are unable to distinguish among duplicated sequences. Specialized approaches to interrogate specific paralogs are comparatively slow and have a high degree of computational complexity, limiting their effective application to emerging population-scale data sets. We present QuicK-mer2, a self-contained, mapping-free approach that enables the rapid construction of paralog-specific copy-number maps from short-read sequence data. This approach is based on the tabulation of unique k-mer sequences from short-read data sets, and is able to analyze a 20X coverage human genome in approximately 20 min. We applied our approach to newly released sequence data from the 1000 Genomes Project, constructed paralog-specific copy-number maps from 2457 unrelated individuals, and uncovered copy-number variation of paralogous genes. We identify nine genes where none of the analyzed samples have a copy number of two and 92 genes where the majority of samples have a copy number other than two, and we describe rare copy-number variation affecting multiple genes at the APOBEC3 locus. |
DOI | 10.3390/genes11020141 |
Alternate Journal | Genes (Basel) |
PubMed ID | 32013076 |
PubMed Central ID | PMC7073954 |
Grant List | R01 GM103961 / GM / NIGMS NIH HHS / United States; DP5 OD009154 / OD / NIH HHS / United States; UM1 HG008901 / HG / NHGRI NIH HHS / United States |
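
The abstract describes a mapping-free approach based on tabulating unique k-mers from short reads. The Python sketch below illustrates that general idea only; it is not the QuicK-mer2 implementation. The function names, the toy ploidy-scaled median estimator, and the choice to ignore reverse complements and GC correction are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every k-length substring of seq (forward strand only)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def unique_kmers(reference, k):
    """Return the set of k-mers that occur exactly once in the reference."""
    counts = Counter(kmers(reference, k))
    return {kmer for kmer, n in counts.items() if n == 1}


def kmer_depth(reads, targets, k):
    """Count how often each target k-mer appears across all reads."""
    depth = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in targets:
                depth[kmer] += 1
    return depth


def copy_number(depth, region_kmers, control_kmers, ploidy=2):
    """Hypothetical estimator: ploidy * median region depth / median control depth.

    control_kmers are assumed to come from sequence with a known,
    fixed copy number equal to ploidy.
    """
    region = median(depth[kmer] for kmer in region_kmers)
    control = median(depth[kmer] for kmer in control_kmers)
    return ploidy * region / control
```

Because only k-mers that are unique in the reference are counted, depth at a given k-mer can be attributed to one specific paralog without aligning reads, which is the property the abstract refers to as paralog-specific.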