Adaptation from standing genetic variation or recurrent de novo mutation in large populations should commonly generate soft rather than hard selective sweeps. In contrast to a hard selective sweep, in which a single adaptive haplotype rises to high population frequency, in a soft selective sweep multiple adaptive haplotypes sweep through the population simultaneously, producing distinct patterns of genetic variation in the vicinity of the adaptive site. Current statistical methods were expressly designed to detect hard sweeps and most lack power to detect soft sweeps. This is particularly unfortunate for the study of adaptation in species such as Drosophila melanogaster, where all three confirmed cases of recent adaptation resulted in soft selective sweeps and where there is evidence that the effective population size relevant for recent and strong adaptation is large enough to generate soft sweeps even when adaptation requires mutation at a specific single site at a locus. Here, we develop a statistical test based on a measure of haplotype homozygosity (H12) that is capable of detecting both hard and soft sweeps with similar power. We use H12 to identify multiple genomic regions that have undergone recent and strong adaptation in a large population sample of fully sequenced Drosophila melanogaster strains from the Drosophila Genetic Reference Panel (DGRP). Visual inspection of the top 50 candidates reveals that in all cases multiple haplotypes are present at high frequencies, consistent with signatures of soft sweeps. We further develop a second haplotype homozygosity statistic (H2/H1) that, in combination with H12, is capable of differentiating hard from soft sweeps. Surprisingly, we find that the H12 and H2/H1 values for all top 50 peaks are much more easily generated by soft rather than hard sweeps. We discuss the implications of these results for the study of adaptation in Drosophila and in species with large census population sizes.
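The abstract above names the H12 and H2/H1 statistics without giving formulas. The sketch below assumes the usual definitions from the haplotype homozygosity literature: with haplotype frequencies p1 >= p2 >= ... in a window, H1 is the sum of squared frequencies, H12 pools the two most common haplotypes before squaring, and H2 is H1 without the contribution of the most common haplotype. The function name and the input layout (one hashable haplotype per sampled chromosome) are illustrative assumptions, not part of the original work.

```python
from collections import Counter


def haplotype_homozygosity(haplotypes):
    """Return (H1, H12, H2/H1) for the haplotypes observed in one window.

    `haplotypes` is a list with one hashable entry (e.g. a string of
    alleles) per sampled chromosome. This layout is an assumption made
    for illustration.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")

    # Haplotype frequencies, sorted from most to least common.
    freqs = sorted((c / n for c in Counter(haplotypes).values()), reverse=True)
    p1 = freqs[0]
    p2 = freqs[1] if len(freqs) > 1 else 0.0

    h1 = sum(p * p for p in freqs)   # expected haplotype homozygosity
    h12 = h1 + 2.0 * p1 * p2         # equals (p1 + p2)^2 + sum of p_i^2 for i > 2
    h2 = h1 - p1 * p1                # homozygosity excluding the top haplotype
    return h1, h12, h2 / h1


# One dominant haplotype (hard-sweep-like) vs. two common ones (soft-sweep-like).
hard_like = ["A"] * 8 + ["B", "C"]
soft_like = ["A"] * 4 + ["B"] * 4 + ["C", "D"]
print(haplotype_homozygosity(hard_like))
print(haplotype_homozygosity(soft_like))
```

In this toy input both windows have elevated H12, but H2/H1 is small when a single haplotype dominates and larger when two haplotypes share high frequency, which is the contrast the abstract uses to separate hard from soft sweeps.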
We present the Metagenomic Intra-species Diversity Analysis System (MIDAS), which is an integrated computational pipeline for quantifying bacterial species abundance and strain-level genomic variation, including gene content and single-nucleotide polymorphisms (SNPs), from shotgun metagenomes. Our method leverages a database of more than 30,000 bacterial reference genomes that we clustered into species groups. These cover the majority of abundant species in the human microbiome but only a small proportion of microbes in other environments, including soil and seawater. We applied MIDAS to stool metagenomes from 98 Swedish mothers and their infants over one year and used rare SNPs to track strains between hosts. Using this approach, we found that although species compositions of mothers and infants converged over time, strain-level similarity diverged. Specifically, early colonizing bacteria were often transmitted from an infant's mother, while late colonizing bacteria were often transmitted from other sources in the environment and were enriched for spore-formation genes. We also applied MIDAS to 198 globally distributed marine metagenomes and used gene content to show that many prevalent bacterial species have population structure that correlates with geographic location. Strain-level genetic variants present in metagenomes clearly reveal extensive structure and dynamics that are obscured when data are analyzed at a coarser taxonomic resolution. [Supplemental material is available for this article.] Microbial species play important roles in the different environments that they inhabit. However, different strains of the same species can differ significantly in their gene content (Greenblum et al. 2015; Zhu et al. 2015) and single-nucleotide polymorphisms (SNPs) (Schloissnig et al. 2013; Kashtan et al. 2014; Lieberman et al. 2014). These strain-level differences are important for understanding microbial evolution, adaptation, pathogenicity, and transmission.
For example, strain-level differences have shed light on ecological differentiation of closely related bacteria (Shapiro et al. 2012), uncovered the presence of ancient subpopulations of marine bacteria (Kashtan et al. 2014), and highlighted extensive intra-species recombination (Snitkin et al. 2011; Rosen et al. 2015). Strain-level variation is also important for understanding microbial pathogenicity. Differences at the nucleotide level can lead to within-host adaptation of pathogens (Lieberman et al. 2014), and differences in gene content can confer drug resistance, convert a commensal bacterium into a pathogen (Snitkin et al. 2011), or lead to outbreaks of highly virulent strains (Rasko et al. 2011). Metagenomic shotgun sequencing has the potential to shed light onto strain-level heterogeneity among bacterial genomes within and between microbial communities, yielding a genomic resolution not achievable by sequencing the 16S ribosomal RNA gene alone and circumventing the need for culture-based approaches. However, limitations of existing computational methods and ...
Gut microbiota are shaped by a combination of ecological and evolutionary forces. While the ecological dynamics have been extensively studied, much less is known about how species of gut bacteria evolve over time. Here, we introduce a model-based framework for quantifying evolutionary dynamics within and across hosts using a panel of metagenomic samples. We use this approach to study evolution in approximately 40 prevalent species in the human gut. Although the patterns of between-host diversity are consistent with quasi-sexual evolution and purifying selection on long timescales, we identify new genealogical signatures that challenge standard population genetic models of these processes. Within hosts, we find that genetic differences that accumulate over 6-month timescales are only rarely attributable to replacement by distantly related strains. Instead, the resident strains more commonly acquire a smaller number of putative evolutionary changes, in which nucleotide variants or gene gains or losses rapidly sweep to high frequency. By comparing these mutations with the typical between-host differences, we find evidence that some sweeps may be seeded by recombination, in addition to new mutations. However, comparisons of adult twins suggest that replacement eventually overwhelms evolution over multi-decade timescales, hinting at fundamental limits to the extent of local adaptation. Together, our results suggest that gut bacteria can evolve on human-relevant timescales, and they highlight the connections between these short-term evolutionary dynamics and longer-term evolution across hosts.
Asian rice, Oryza sativa, is one of the world's oldest and most important crop species. Rice is believed to have been domesticated ∼9,000 y ago, although its origin remains contentious. A single-origin model suggests that two main subspecies of Asian rice, indica and japonica, were domesticated from the wild rice O. rufipogon. In contrast, the multiple independent domestication model proposes that these two major rice types were domesticated separately and in different parts of the species range of wild rice. This latter view has gained much support from the observation of strong genetic differentiation between indica and japonica as well as several phylogenetic studies of rice domestication. We reexamine the evolutionary history of domesticated rice by resequencing 630 gene fragments on chromosomes 8, 10, and 12 from a diverse set of wild and domesticated rice accessions. Using patterns of SNPs, we identify 20 putative selective sweeps on these chromosomes in cultivated rice. Demographic modeling based on these SNP data and a diffusion-based approach provide the strongest support for a single domestication origin of rice. Bayesian phylogenetic analyses implementing the multispecies coalescent and using previously published phylogenetic sequence datasets also point to a single origin of Asian domesticated rice. Finally, we date the origin of domestication at ∼8,200-13,500 y ago, depending on the molecular clock estimate that is used, which is consistent with known archaeological data that suggest rice was first cultivated at around this time in the Yangtze Valley of China.
Positive natural selection can lead to a decrease in genomic diversity at the selected site and at linked sites, producing a characteristic signature of elevated expected haplotype homozygosity. These selective sweeps can be hard or soft. In the case of a hard selective sweep, a single adaptive haplotype rises to high population frequency, whereas multiple adaptive haplotypes sweep through the population simultaneously in a soft sweep, producing distinct patterns of genetic variation in the vicinity of the selected site. Measures of expected haplotype homozygosity have previously been used to detect sweeps in multiple study systems. However, these methods are formulated for phased haplotype data, typically unavailable for nonmodel organisms, and some may have reduced power to detect soft sweeps due to their increased genetic diversity relative to hard sweeps. To address these limitations, we applied the H12 and H2/H1 statistics proposed in 2015 by Garud et al., which have power to detect both hard and soft sweeps, to unphased multilocus genotypes, denoting them as G12 and G2/G1. G12 (and the more direct expected homozygosity analog to H12, denoted G123) has comparable power to H12 for detecting both hard and soft sweeps. G2/G1 can be used to classify hard and soft sweeps analogously to H2/H1, conditional on a genomic region having high G12 or G123 values. The reason for this power is that, under random mating, the most frequent haplotypes will yield the most frequent multilocus genotypes. Simulations based on parameters compatible with our recent understanding of human demographic history suggest that expected homozygosity methods are best suited for detecting recent sweeps, and increase in power under recent population expansions. Finally, we find candidates for selective sweeps within the 1000 Genomes CEU, YRI, GIH, and CHB populations, which corroborate and complement existing studies.
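The genotype-based statistics described above can be sketched in the same way as their haplotype counterparts. The example below assumes that G12, G123, and G2/G1 are computed from the frequencies q1 >= q2 >= ... of distinct unphased multilocus genotypes (MLGs) in a window, by direct analogy with H12, H123, and H2/H1. The function name and the encoding of an MLG as a tuple of 0/1/2 alternate-allele counts are illustrative assumptions.

```python
from collections import Counter


def genotype_homozygosity(mlgs):
    """Return (G12, G123, G2/G1) for unphased multilocus genotypes.

    `mlgs` holds one hashable multilocus genotype per diploid individual,
    here a tuple of alternate-allele counts (0, 1, or 2) per SNP. This
    encoding is an assumption made for illustration.
    """
    n = len(mlgs)
    if n == 0:
        raise ValueError("need at least one individual")

    # MLG frequencies, most common first, padded so q[0..2] always exist.
    q = sorted((c / n for c in Counter(mlgs).values()), reverse=True)
    q += [0.0] * max(0, 3 - len(q))

    g1 = sum(x * x for x in q)
    g12 = (q[0] + q[1]) ** 2 + sum(x * x for x in q[2:])
    g123 = (q[0] + q[1] + q[2]) ** 2 + sum(x * x for x in q[3:])
    g2 = g1 - q[0] ** 2
    return g12, g123, g2 / g1


# Ten individuals typed at three SNPs (toy data).
sample = [(0, 0, 2)] * 5 + [(1, 0, 1)] * 3 + [(2, 0, 0), (0, 1, 1)]
print(genotype_homozygosity(sample))
```

As the abstract notes, G2/G1 is only meant to be interpreted in windows where G12 or G123 is already high.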
Patterns of nucleotide polymorphism within populations of Drosophila melanogaster suggest that insecticides have been the selective agents driving the strongest recent bouts of positive selection. However, there is a need to explicitly link selective sweeps to the particular insecticide phenotypes that could plausibly account for the drastic selective responses that are observed in these non-target insects. Here, we screen the Drosophila Genetic Reference Panel with two common insecticides: malathion (an organophosphate) and permethrin (a pyrethroid). Genome-wide association studies map survival on malathion to two of the largest sweeps in the D. melanogaster genome: Ace and Cyp6g1. Malathion survivorship also correlates with lines that have high levels of Cyp12d1, Jheh1 and Jheh2 transcript abundance. Permethrin phenotypes map to the largest cluster of P450 genes in the Drosophila genome; however, in contrast to a selective sweep driven by insecticide use, the derived allele seems to be associated with susceptibility. These results underscore previous findings that highlight the importance of structural variation to insecticide phenotypes: Cyp6g1 exhibits copy number variation and transposable element insertions, Cyp12d1 is tandemly duplicated, the Jheh loci are associated with a Bari1 transposable element insertion, and a Cyp6a17 deletion is associated with susceptibility.
The extent to which selection and demography impact patterns of genetic diversity in natural populations of Drosophila melanogaster is yet to be fully understood. We previously observed that linkage disequilibrium (LD) at scales of 10 kb in the Drosophila Genetic Reference Panel (DGRP), consisting of 145 inbred strains from Raleigh, North Carolina, measured both between pairs of sites and as haplotype homozygosity, is elevated above neutral demographic expectations. We also demonstrated that signatures of strong and recent soft sweeps are abundant. However, the extent to which these patterns are specific to this derived and admixed population is unknown. It is also unclear whether these patterns are a consequence of the extensive inbreeding performed to generate the DGRP data. Here we analyze LD statistics in a sample of >100 fully-sequenced strains from Zambia, an ancestral population to the Raleigh population that has experienced little to no admixture and was generated by sequencing haploid embryos rather than inbred strains. We find an elevation in long-range LD and haplotype homozygosity compared to neutral expectations in the Zambian sample, thus showing the elevation in LD is not specific to the DGRP data set. This elevation in LD and haplotype structure remains even after controlling for possible confounders including genomic inversions, admixture, population substructure, close relatedness of individual strains, and recombination rate variation. Furthermore, signatures of partial soft sweeps similar to those found in the DGRP as well as partial hard sweeps are common in Zambia. These results suggest that while the selective forces and sources of adaptive mutations may differ in Zambia and Raleigh, elevated long-range LD and signatures of soft sweeps are generic in D. melanogaster.
KEYWORDS Drosophila melanogaster; demography; linkage disequilibrium; haplotype homozygosity; selection
Disentangling the effects of demography and selection on patterns of genomic variation remains a central challenge in evolutionary biology. Until recently, inference of demography and selection primarily relied on short genomic fragments sampled from a limited number of individuals (e.g., Pritchard et al. 2000; Molina et al. 2001; Li and Stephan 2006; Thornton and Andolfatto 2006; Gutenkunst et al. 2009; Duchen et al. 2013). While these studies provided important insights into the evolutionary forces acting on populations, they were ultimately limited by both sample size and the physical scale in which patterns of polymorphism and linkage could be investigated. The recent availability of whole-genome sequences from multiple individuals within populations in a variety of species (e.g., Abecasis et al. 2010; Cao et al. 2011; Mackay et al. 2012) is enabling us to examine long-range patterns of linkage disequilibrium (LD) (10 kb), measured either as correlations between pairs of sites or as haplotype homozygosity. LD offers powerful insights into selective and demographic processes shaping genetic variation in a natural populatio...
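The pairwise measure of LD mentioned in this excerpt, a correlation between pairs of sites, is commonly summarized as r^2. The sketch below computes r^2 for two biallelic sites from phased haploid data. The 0/1 allele encoding and the function name are assumptions for illustration, and the excerpt does not state which exact pairwise estimator the authors used.

```python
def r_squared(site_a, site_b):
    """Squared allelic correlation (r^2) between two biallelic sites.

    `site_a` and `site_b` are equal-length sequences of 0/1 alleles, one
    entry per sampled haploid genome, in the same sample order. This
    encoding is an assumption made for illustration.
    """
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must be non-empty and of equal length")

    p_a = sum(site_a) / n                                   # allele 1 frequency at site A
    p_b = sum(site_b) / n                                   # allele 1 frequency at site B
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n   # frequency of the 1-1 haplotype

    d = p_ab - p_a * p_b                                    # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    return d * d / denom


print(r_squared([1, 1, 0, 0], [1, 1, 0, 0]))  # identical sites: complete LD
print(r_squared([1, 1, 0, 0], [1, 0, 1, 0]))  # independent sites: no LD
```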