Solanum pennellii is a wild tomato species endemic to Andean regions in South America, where it has evolved to thrive in arid habitats. Because of its extreme stress tolerance and unusual morphology, it is an important donor of germplasm for the cultivated tomato Solanum lycopersicum¹. Introgression lines (ILs) in which large genomic regions of S. lycopersicum are replaced with the corresponding segments from S. pennellii can show remarkably superior agronomic performance². Here we describe a high-quality genome assembly of the parents of the IL population. By anchoring the S. pennellii genome to the genetic map, we define candidate genes for stress tolerance and provide evidence that transposable elements had a role in the evolution of these traits. Our work paves a path toward further tomato improvement and for deciphering the mechanisms underlying the myriad other agronomic traits that can be improved with S. pennellii germplasm.
Crosses between distantly related plants can lead to substantial improvements in performance. Notably, S. pennellii × S. lycopersicum ILs have been used to define numerous quantitative trait loci (QTLs) for superior yield, chemical composition, morphology, abiotic stress tolerance and extreme heterosis³,⁴. Although genetic studies have proven informative, few genes underlying specific QTLs have been cloned, largely because of the lack of a S. pennellii genome sequence. To support QTL analyses, we sequenced the genome of S. pennellii using Illumina sequencing with ~190-fold coverage (Fig. 1 and Supplementary Tables 1-5). The initial assembly size was 942 Mb, with a scaffold N50 value of 1.7 Mb and N90 value of 0.43 Mb (Table 1 and Supplementary Tables 6 and 7). We estimated the total genome size to be about 1.2 Gb using a k-mer-based analysis (Supplementary Fig. 1 and Supplementary Table 8), in accordance with previous estimations³,⁴.
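The scaffold N50 and N90 values quoted above are standard contiguity statistics: N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least 50% of the assembly, and N90 is the same for 90%. The following is a minimal sketch of that calculation. The function name `nxx` and the toy scaffold lengths are illustrative assumptions and are not taken from the paper.

```python
def nxx(lengths, percent):
    """Return the Nxx statistic for a list of scaffold lengths.

    Scaffolds are sorted from longest to shortest and accumulated until
    they cover at least `percent` of the total assembly length; the
    length of the scaffold that crosses the threshold is returned.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer arithmetic avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length


# Hypothetical scaffold lengths (arbitrary units), total = 300.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
n50 = nxx(toy_scaffolds, 50)  # 80 + 70 = 150 reaches 50% of 300
n90 = nxx(toy_scaffolds, 90)  # 80 + 70 + 50 + 40 + 30 = 270 reaches 90% of 300
print(n50, n90)
```

For the toy input, the 50% threshold is crossed at the scaffold of length 70 and the 90% threshold at the scaffold of length 30, which illustrates why N90 is always less than or equal to N50.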
We anchored 97.1% of the genome assembly to chromosomes using genetic maps and restriction site-associated DNA sequencing (RAD-seq)-based markers from the IL population⁵ (Supplementary Note). Comparison of the assembly to publicly available BAC sequences indicated an accuracy of >99.9%, and a satisfactory accuracy of gap-filled regions was shown by realigning reads (Supplementary Fig. 2 and Supplementary Table 9). Of the 307,350 S. lycopersicum and 7,812 S. pennellii publicly available ESTs, 93% and >96% could be aligned to the genome, respectively (Supplementary Table 10), indicating comprehensive coverage of the gene-rich regions. We predicted 32,273 high-confidence genes and a potential set of 44,966 protein-coding genes and checked these
SUMMARY
We explored genetic variation by sequencing a selection of 84 tomato accessions and related wild species representative of the Lycopersicon, Arcanum, Eriopersicon and Neolycopersicon groups, yielding extensive data on sequence diversity in the tomato clade. Three new reference genomes were reconstructed to support our comparative genome analyses. Comparative sequence alignment revealed group-, species- and accession-specific polymorphisms, explaining characteristic fruit traits and growth habits in the various cultivars. Using gene models from the annotated Heinz 1706 reference genome, we observed differences in the ratio between non-synonymous and synonymous SNPs (dN/dS) in fruit diversification and plant growth genes compared to a random set of genes, indicating positive selection and differences in selection pressure between crop accessions and wild species. In wild species, the number of single-nucleotide polymorphisms (SNPs) exceeds 10 million, i.e. 20-fold higher than found in most of the crop accessions, indicating dramatic genetic erosion of crop and heirloom tomatoes. In addition, the highest levels of heterozygosity were found for allogamous self-incompatible wild species, while facultative and autogamous self-compatible species display a lower heterozygosity level. Using whole-genome SNP information for maximum-likelihood analysis, we achieved complete tree resolution, whereas maximum-likelihood trees based on SNPs from ten fruit and growth genes show incomplete resolution for the crop accessions, partly due to the effect of heterozygous SNPs. Finally, results suggest that phylogenetic relationships are correlated with habitat, indicating the occurrence of geographical races within these groups, which is of practical importance for Solanum genome evolution studies.
Motivation
BioContainers (biocontainers.pro) is an open-source and community-driven framework that provides platform-independent executable environments for bioinformatics software. BioContainers allows labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. BioContainers is based on the popular open-source Docker and rkt frameworks, which allow software to be installed and executed in an isolated and controlled environment. It also provides infrastructure and basic guidelines to create, manage and distribute bioinformatics containers, with a special focus on omics technologies. These containers can be integrated into more comprehensive bioinformatics pipelines and different architectures (local desktop, cloud environments or HPC clusters).
Availability and Implementation
The software is freely available at github.com/BioContainers/.
Summary
Chromosomal inversions can provide windows onto the cytogenetic, molecular, evolutionary and demographic histories of a species. Here we investigate a paracentric 1.17‐Mb inversion on chromosome 4 of Arabidopsis thaliana with nucleotide precision of its borders. The inversion is created by Vandal transposon activity, splitting an F‐box and relocating a pericentric heterochromatin segment in juxtaposition with euchromatin without affecting the epigenetic landscape. Examination of the RegMap panel and the 1001 Arabidopsis genomes revealed more than 170 inversion accessions in Europe and North America. The SNP patterns revealed historical recombinations from which we infer diverse haplotype patterns, ancient introgression events and phylogenetic relationships. We find a robust association between the inversion and fecundity under drought. We also find linkage disequilibrium between the inverted region and the early flowering Col‐FRIGIDA allele. Finally, SNP analysis elucidates the origin of the inversion to South‐Eastern Europe approximately 5000 years ago and the FRI‐Col allele to North‐West Europe, and reveals the spreading of a single haplotype to North America during the 17th to 19th century. The ‘American haplotype’ was identified from several European localities, potentially due to return migration.
We determined the crossover (CO) distribution, frequency and genomic sequences involved in interspecies meiotic recombination by using parent-assigned variants of 52 F recombinant inbred lines obtained from a cross between tomato, Solanum lycopersicum, and its wild relative, Solanum pimpinellifolium. The interspecific CO frequency was 80% lower than reported for intraspecific tomato crosses. We detected regions showing a relatively high and low CO frequency, so-called hot and cold regions. Cold regions coincide to a large extent with the heterochromatin, although we found a limited number of smaller cold regions in the euchromatin. The CO frequency was higher at the distal ends of chromosomes than in pericentromeric regions and higher in short arm euchromatin. Hot regions of CO were detected in euchromatin, and COs were more often located in non-coding regions near the 5' untranslated region of genes than expected by chance. Besides overrepresented CCN repeats, we detected poly-A/T and AT-rich motifs enriched in 1-kb promoter regions flanking the CO sites. The most abundant sequence motifs at CO sites share weak similarity to transcription factor-binding sites, such as for the C2H2 zinc finger factors class and MADS box factors, while InterPro scans detected enrichment for genes possibly involved in the repair of DNA breaks.
SUMMARY
Breeding by introgressive hybridization is a pivotal strategy to broaden the genetic basis of crops. Usually, the desired traits are monitored in consecutive crossing generations by marker-assisted selection, but their analyses fail in chromosome regions where crossover recombinants are rare or not viable. Here, we present the Introgression Browser (IBROWSER), a bioinformatics tool aimed at visualizing introgressions at nucleotide or single-nucleotide polymorphism (SNP) accuracy. The software selects homozygous SNPs from Variant Call Format (VCF) information and filters out heterozygous SNPs, multi-nucleotide polymorphisms (MNPs) and insertion-deletions (InDels). For data analysis IBROWSER makes use of sliding windows, but if needed it can generate any desired fragmentation pattern through General Feature Format (GFF) information. In an example of tomato (Solanum lycopersicum) accessions we visualize SNP patterns and elucidate both position and boundaries of the introgressions. We also show that our tool is capable of identifying alien DNA in a panel of the closely related S. pimpinellifolium by examining phylogenetic relationships of the introgressed segments in tomato. In a third example, we demonstrate the power of IBROWSER in a panel of 597 Arabidopsis accessions, detecting the boundaries of a SNP-free region around a polymorphic 1.17 Mbp inverted segment on the short arm of chromosome 4. The architecture and functionality of IBROWSER make the software appropriate for a broad set of analyses including SNP mining, genome structure analysis, and pedigree analysis. Its functionality, together with the capability to process large data sets and efficient visualization of sequence variation, makes IBROWSER a valuable breeding tool.
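The filtering step described above (keep homozygous SNPs, discard heterozygous SNPs, MNPs and InDels, then summarize per window) can be sketched in a few lines of Python. This is not IBROWSER's own code: the function names, the single-sample VCF layout, the toy records, and the use of fixed, non-overlapping windows in place of the tool's sliding windows are all simplifying assumptions made for illustration.

```python
from collections import Counter


def is_homozygous_snp(ref, alt, genotype):
    """True for a single-base substitution called homozygous for the ALT allele."""
    # Single-base REF and a single single-base ALT exclude InDels, MNPs
    # and multi-allelic sites such as "A,T".
    if len(ref) != 1 or len(alt) != 1 or alt not in "ACGT":
        return False
    # Accept both unphased ("1/1") and phased ("1|1") diploid genotypes.
    alleles = genotype.replace("|", "/").split("/")
    return len(alleles) == 2 and alleles[0] == alleles[1] == "1"


def count_homozygous_snps(vcf_lines, window_size):
    """Count homozygous SNPs per (chromosome, window start) for the first sample."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        genotype = sample_values[format_keys.index("GT")]
        if is_homozygous_snp(ref, alt, genotype):
            # 1-based start of the non-overlapping window containing pos.
            window_start = (pos - 1) // window_size * window_size + 1
            counts[(chrom, window_start)] += 1
    return dict(counts)


# Hypothetical single-sample VCF records, for illustration only.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t1/1",    # homozygous SNP: kept
    "chr1\t250\t.\tC\tT\t50\tPASS\t.\tGT\t0/1",    # heterozygous SNP: dropped
    "chr1\t400\t.\tAT\tA\t50\tPASS\t.\tGT\t1/1",   # deletion: dropped
    "chr1\t900\t.\tAC\tGT\t50\tPASS\t.\tGT\t1/1",  # MNP: dropped
    "chr1\t1200\t.\tG\tA\t50\tPASS\t.\tGT\t1|1",   # homozygous SNP: kept
]
window_counts = count_homozygous_snps(toy_vcf, window_size=1000)
print(window_counts)
```

With a 1000-bp window, the two retained SNPs fall into the windows starting at positions 1 and 1001 on chr1. Per-window counts of this kind are one simple way to expose stretches where an accession departs from the reference, which is the sort of pattern an introgression viewer would display.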
Background
Identification of biological specimens is a requirement for a range of applications. Reference-free methods analyse unprocessed sequencing data without relying on prior knowledge, but generally do not scale to arbitrarily large genomes and arbitrarily large phylogenetic distances.
Results
We present Cnidaria, a practical tool for clustering genomic and transcriptomic data with no limitation on genome size or phylogenetic distances. We successfully simultaneously clustered 169 genomic and transcriptomic datasets from 4 kingdoms, achieving 100% identification accuracy at supra-species level and 78% accuracy at the species level.
Conclusion
CNIDARIA allows for fast, resource-efficient comparison and identification of both raw and assembled genome and transcriptome data. This can help answer both fundamental (e.g. in phylogeny, ecological diversity analysis) and practical questions (e.g. sequencing quality control, primer design).
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0806-7) contains supplementary material, which is available to authorized users.