Environmental DNA (eDNA) metabarcoding is a promising method to monitor species and community diversity that is rapid, affordable and non‐invasive. The longstanding needs of the eDNA community are modular informatics tools, comprehensive and customizable reference databases, flexibility across high‐throughput sequencing platforms, fast multilocus metabarcode processing and accurate taxonomic assignment. Improvements in bioinformatics tools make addressing each of these demands within a single toolkit a reality. The new modular metabarcode sequence toolkit Anacapa (https://github.com/limey-bean/Anacapa/) addresses the above needs, allowing users to build comprehensive reference databases and assign taxonomy to raw multilocus metabarcode sequence data. A novel aspect of Anacapa is its database building module, “Creating Reference libraries Using eXisting tools” (CRUX), which generates comprehensive reference databases for specific user‐defined metabarcoding loci. The Quality Control and ASV Parsing module sorts and processes multiple metabarcoding loci and processes merged, unmerged and unpaired reads, maximizing recovered diversity. DADA2 then detects amplicon sequence variants (ASVs) and the Anacapa Classifier module aligns these ASVs to CRUX‐generated reference databases using Bowtie2. Lastly, taxonomy is assigned to ASVs with confidence scores using a Bayesian Lowest Common Ancestor (BLCA) method. The Anacapa Toolkit also includes an R package, ranacapa, for automated results exploration through standard biodiversity statistical analysis. Benchmarking tests verify that the Anacapa Toolkit effectively and efficiently generates comprehensive reference databases that capture taxonomic diversity, and can assign taxonomy to both MiSeq and HiSeq‐length sequence data. We demonstrate the value of the Anacapa Toolkit in assigning taxonomy to seawater eDNA samples collected in southern California.
The Anacapa Toolkit improves the functionality of eDNA and streamlines biodiversity assessment and management by generating metabarcode specific databases, processing multilocus data, retaining a larger proportion of sequencing reads and expanding non‐traditional eDNA targets. All the components of the Anacapa Toolkit are open and available in a virtual container to ease installation.
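The lowest-common-ancestor idea behind the taxonomy step can be sketched in a few lines of Python. This is a minimal illustration only: it computes the plain shared prefix of several taxonomy paths and omits the Bayesian weighting and confidence scores that BLCA adds. The function name and the example lineages are hypothetical and are not taken from Anacapa.

```python
from typing import List, Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest shared prefix of root-to-tip taxonomy paths."""
    if not lineages:
        return []
    shared: List[str] = []
    # zip stops at the shortest lineage, so ranks are compared position by position.
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


# Hypothetical reference hits for one ASV (illustrative lineages, not Anacapa output).
hits = [
    ["Eukaryota", "Chordata", "Actinopteri", "Serranidae",
     "Paralabrax", "Paralabrax clathratus"],
    ["Eukaryota", "Chordata", "Actinopteri", "Serranidae",
     "Paralabrax", "Paralabrax nebulifer"],
]
print(lowest_common_ancestor(hits))
```

When the hits disagree at species level, as here, the assignment falls back to the deepest rank they share (the genus in this example).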
are co-equal second authors. Robert Wayne and Rachel S. Meyer are co-equal senior authors.
The non-human primate reference transcriptome resource (NHPRTR, available online at http://nhprtr.org/) aims to generate comprehensive RNA-seq data from a wide variety of non-human primates (NHPs), from lemurs to hominids. In the 2012 Phase I of the NHPRTR project, 19 billion fragments or 3.8 terabases of transcriptome sequences were collected from pools of ∼20 tissues in 15 species and subspecies. Here we describe a major expansion of NHPRTR by adding 10.1 billion fragments of tissue-specific RNA-seq data. For this effort, we selected 11 of the original 15 NHP species and subspecies and constructed total RNA libraries for the same ∼15 tissues in each. The sequence quality is such that 88% of the reads align to human reference sequences, allowing us to compute the full list of expression abundance across all tissues for each species, using the reads mapped to human genes. This update also includes improved transcript annotations derived from RNA-seq data for rhesus and cynomolgus macaques, two of the most commonly used NHP models, and additional RNA-seq data compiled from related projects. Together, these comprehensive reference transcriptomes from multiple primates serve as a valuable community resource for genome annotation, gene dynamics and comparative functional analysis.
RNA-based next-generation sequencing (RNA-Seq) provides a tremendous amount of new information regarding gene and transcript structure, expression and regulation. This is particularly true for non-coding RNAs, where whole transcriptome analyses have revealed that much of the genome is transcribed and that many non-coding transcripts have widespread functionality. However, uniform resources for raw, cleaned and processed RNA-Seq data are sparse for most organisms, and this is especially true for non-human primates (NHPs). Here, we describe a large-scale RNA-Seq data and analysis infrastructure, the NHP reference transcriptome resource (http://nhprtr.org); it presently hosts data from 12 species of primates, to be expanded to 15 species/subspecies spanning great apes, old world monkeys, new world monkeys and prosimians. Data are collected for each species using pools of RNA from comparable tissues. We provide data access in advance of its deposition at NCBI, as well as browsable tracks of alignments against the human genome using the UCSC genome browser. This resource will continue to host additional RNA-Seq data, alignments and assemblies as they are generated over the coming years and provide a key resource for the annotation of NHP genomes as well as informing primate studies on evolution, reproduction, infection, immunity and pharmacology.
Hormone signaling is often pulsatile, and multi-parameter deconvolution procedures have long been utilized to identify and characterize secretory events. However, the existing programs have serious limitations, including the subjective nature of initial peak selection, lack of statistical verification of presumed bursts, and user-unfriendliness of the application. Here, we describe a novel deconvolution program, AutoDecon, which addresses these concerns. We validate AutoDecon for application to serum luteinizing hormone (LH) concentration time series using synthetic data mimicking real data from normal women and then comparing the performance of AutoDecon to that of the widely employed hormone pulsatility analysis program Cluster. The sensitivity of AutoDecon is higher than that of Cluster: ~96% vs. ~80% (p = 0.001). However, the false-positive detection rate of AutoDecon was higher than that of Cluster: 6% vs. 1% (p = 0.001). Further analysis demonstrated that the pulsatility parameters recovered by AutoDecon were indistinguishable from those characterizing the synthetic data and that sampling at 5- or 10-minute intervals was optimal for maximizing the sensitivity rates for LH. Accordingly, AutoDecon presents a viable non-subjective alternative to previous pulse detection algorithms for the analysis of LH data. It is applicable to other pulsatile hormone-concentration time series and many other pulsatile phenomena. The software is free and downloadable at
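The sensitivity and false-positive comparison in the AutoDecon abstract can be illustrated with a small scoring routine for synthetic data, where the true pulse times are known. This is a sketch under stated assumptions and not the validation procedure of the paper: the matching tolerance, the greedy pairing, and the definition of the false-positive rate (the fraction of detected pulses with no matching true pulse) are all choices made here for illustration, and `score_detection` is a hypothetical name.

```python
from typing import List, Tuple


def score_detection(true_times: List[float], detected_times: List[float],
                    tolerance: float) -> Tuple[float, float]:
    """Return (sensitivity, false_positive_rate) for detected pulse times.

    Assumed definitions: a detection is a true positive if it lies within
    `tolerance` minutes of a not-yet-matched true pulse; sensitivity is the
    fraction of true pulses matched; the false-positive rate is the fraction
    of detected pulses left unmatched.
    """
    unmatched_true = sorted(true_times)
    true_positives = 0
    for t in sorted(detected_times):
        if not unmatched_true:
            break
        # Greedily pair each detection with the closest remaining true pulse.
        nearest = min(unmatched_true, key=lambda s: abs(s - t))
        if abs(nearest - t) <= tolerance:
            unmatched_true.remove(nearest)
            true_positives += 1
    sensitivity = true_positives / len(true_times) if true_times else 0.0
    fp_rate = ((len(detected_times) - true_positives) / len(detected_times)
               if detected_times else 0.0)
    return sensitivity, fp_rate


# Hypothetical pulse times in minutes (not data from the study).
truth = [60.0, 180.0, 300.0, 420.0]
found = [62.0, 175.0, 350.0, 421.0]
print(score_detection(truth, found, tolerance=10.0))
```

With these made-up times, three of the four true pulses are recovered and one detection has no counterpart, which shows how a detector can gain sensitivity at the cost of extra false positives.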
In contrast to infections with human immunodeficiency virus (HIV) in humans and simian immunodeficiency virus (SIV) in macaques, SIV infection of a natural host, sooty mangabeys (Cercocebus atys), is non-pathogenic despite high viraemia¹. Here we sequenced and assembled the genome of a captive sooty mangabey. We conducted genome-wide comparative analyses of transcript assemblies from C. atys and AIDS-susceptible species, such as humans and macaques, to identify candidates for host genetic factors that influence susceptibility. We identified several immune-related genes in the genome of C. atys that show substantial sequence divergence from macaques or humans. One of these sequence divergences, a C-terminal frameshift in the toll-like receptor-4 (TLR4) gene of C. atys, is associated with a blunted in vitro response to TLR-4 ligands. In addition, we found a major structural change in exons 3–4 of the immune-regulatory protein intercellular adhesion molecule 2 (ICAM-2); expression of this variant leads to reduced cell surface expression of ICAM-2. These data provide a resource for comparative genomic studies of HIV and/or SIV pathogenesis and may help to elucidate the mechanisms by which SIV-infected sooty mangabeys avoid AIDS.
Human severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is most closely related, by average genetic distance, to two coronaviruses isolated from bats, RaTG13 and RmYN02. However, there is a segment of high amino acid similarity between human SARS-CoV-2 and a pangolin isolated strain, GD410721, in the receptor binding domain (RBD) of the spike protein, a pattern that can be caused by either recombination or by convergent amino acid evolution driven by natural selection. We perform a detailed analysis of the synonymous divergence, which is less likely to be affected by selection than amino acid divergence, between human SARS-CoV-2 and related strains. We show that the synonymous divergence between the bat derived viruses and SARS-CoV-2 is larger than between GD410721 and SARS-CoV-2 in the RBD, providing strong additional support for the recombination hypothesis. However, the synonymous divergence between the pangolin strain and SARS-CoV-2 is also relatively high, which is not consistent with a recent recombination between them; instead, it suggests a recombination into RaTG13. We also find a 14-fold increase in the dN/dS ratio from the lineage leading to SARS-CoV-2 to the strains of the current pandemic, suggesting that the vast majority of non-synonymous mutations currently segregating within the human strains have a negative impact on viral fitness. Finally, we estimate that the time to the most recent common ancestor of SARS-CoV-2 and RaTG13 or RmYN02, based on synonymous divergence, is 51.71 years (95% C.I., 28.11-75.31) and 37.02 years (95% C.I., 18.19-55.85), respectively.
a coronavirus (Lu, et al. 2020; Zhang and Holmes 2020), Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), an RNA virus with a 29,891 bp genome consisting of four major structural genes (Wu, et al. 2020; Zhou, Yang, et al. 2020). Of particular relevance to this study is the spike protein, which is responsible for binding to the primary receptor for the virus, angiotensin-converting enzyme 2 (ACE2) (Wan, et al. 2020; Wu, et al. 2020; Zhou, Yang, et al. 2020).
Human SARS-CoV-2 is related to a coronavirus (RaTG13) isolated from the bat Rhinolophus affinis from Yunnan province of China (Zhou, Yang, et al. 2020). RaTG13 and the human strain reference sequence (Genbank accession number MN996532) are 96.2% identical, and it was first argued that, throughout the genome, RaTG13 is the closest relative to human SARS-CoV-2 (Zhou, et al. 2020). Zhang, et al. (2020) showed that RaTG13 and SARS-CoV-2 were 91.02% and 90.55% identical, respectively, to coronaviruses isolated from pangolins (Pangolin-CoV), which therefore form a close outgroup to the SARS-CoV-2+RaTG13 clade. Furthermore, five key amino acids in the receptor-binding domain (RBD) of spike were identical between SARS-CoV-2 and Pangolin-CoV, but differed between those two strains and RaTG13. Lam, et al. (2020) independently made similar observations and additionally showed that when analyzing...
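The distinction between synonymous and non-synonymous divergence that this abstract relies on can be made concrete with a short Python sketch. It classifies each differing codon in a pair of aligned, in-frame coding sequences as synonymous (same amino acid) or non-synonymous under the standard genetic code. This is a raw count for illustration only: it is not a dS or dN/dS estimator, since it does not normalize by the number of synonymous and non-synonymous sites or correct for multiple substitutions, and it is not the method used by the authors. The function name and the toy sequences are hypothetical.

```python
from itertools import product
from typing import Tuple

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_codon_differences(seq_a: str, seq_b: str) -> Tuple[int, int]:
    """Return (synonymous, non_synonymous) counts of differing codons."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame and equal length")
    synonymous = non_synonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        # Skip identical codons and codons with gaps or ambiguous bases.
        if (codon_a == codon_b or codon_a not in CODON_TABLE
                or codon_b not in CODON_TABLE):
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous


# Toy sequences: CTT -> CTC keeps leucine, AAA -> AGA changes lysine to arginine.
print(count_codon_differences("ATGCTTAAA", "ATGCTCAGA"))
```

Because synonymous changes leave the protein unchanged, they are expected to be less affected by selection, which is why the abstract uses them to compare strains within the RBD.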
show significant allele frequency differences between tame and aggressive populations (1% FDR), including genes with a role in neural crest cell fate determination.