Reactive species such as free radicals are constantly generated in vivo and DNA is the most important target of oxidative stress. Oxidative DNA damage is used as a predictive biomarker to monitor the risk of development of many diseases. The comet assay is widely used for measuring oxidative DNA damage at a single cell level. The analysis of comet assay output images, however, poses considerable challenges. Commercial software is costly and restrictive, while free software generally requires laborious manual tagging of cells. This paper presents OpenComet, an open-source software tool providing automated analysis of comet assay images. It uses a novel and robust method for finding comets based on geometric shape attributes and segmenting the comet heads through image intensity profile analysis. Due to automation, OpenComet is more accurate, less prone to human bias, and faster than manual analysis. A live analysis functionality also allows users to analyze images captured directly from a microscope. We have validated OpenComet on both alkaline and neutral comet assay images as well as sample images from existing software packages. Our results show that OpenComet achieves high accuracy with significantly reduced analysis time.
Word models (natural language descriptions of molecular mechanisms) are a common currency in spoken and written communication in biomedicine but are of limited use in predicting the behavior of complex biological networks. We present an approach to building computational models directly from natural language using automated assembly. Molecular mechanisms described in simple English are read by natural language processing algorithms, converted into an intermediate representation, and assembled into executable or network models. We have implemented this approach in the Integrated Network and Dynamical Reasoning Assembler (INDRA), which draws on existing natural language processing systems as well as pathway information in Pathway Commons and other online resources. We demonstrate the use of INDRA and natural language to model three biological processes of increasing scope: (i) p53 dynamics in response to DNA damage, (ii) adaptive drug resistance in BRAF‐V600E‐mutant melanomas, and (iii) the RAS signaling pathway. The use of natural language makes the task of developing a model more efficient and it increases model transparency, thereby promoting collaboration with the broader biology community.
Word models (natural language descriptions of molecular mechanisms) are a common currency in spoken and written communication in biomedicine but are of limited use in predicting the behavior of complex biological networks. We present an approach to building computational models directly from natural language using automated assembly. Molecular mechanisms described in simple English are read by natural language processing algorithms, converted into an intermediate representation and assembled into executable or network models. We have implemented this approach in the Integrated Network and Dynamical Reasoning Assembler (INDRA), which draws on existing natural language processing systems as well as pathway information in Pathway Commons and other online resources.We demonstrate the use of INDRA and natural language to model three biological processes of increasing scope: (i) p53 dynamics in response to DNA damage; (ii) adaptive drug resistance in BRAF-V600E mutant melanomas; and (iii) the RAS signaling pathway. The use of natural language for modeling makes routine tasks more efficient for modeling practitioners and increases the accessibility and transparency of models for the broader biology community. Keywords: computational modeling, natural language processing, signaling pathwaysRunning title: From word models to executable models Standfirst text: INDRA uses natural language processing systems to read descriptions of molecular mechanisms and assembles them into executable models. Highlights:• INDRA decouples the curation of knowledge as word models from model implementation • INDRA is connected to multiple natural language processing systems and can draw on information from curated databases • INDRA can assemble dynamical models in rule-based and reaction network formalisms, as well as Boolean networks and visualization formats • We used INDRA to build models of p53 dynamics, resistance to targeted inhibitors of BRAF in melanoma, and the Ras signaling pathway from natural language . CC-BY-NC 4.0 International license peer-reviewed) is the author/funder. It is made available under a
Extracellular growth factors signal to transcription factors via a limited number of cytoplasmic kinase cascades. It remains unclear how such cascades encode ligand identities and concentrations. In this paper, we use live-cell imaging and statistical modeling to study FOXO3, a transcription factor regulating diverse aspects of cellular physiology that is under combinatorial control. We show that FOXO3 nuclear-to-cytosolic translocation has two temporally distinct phases varying in magnitude with growth factor identity and cell type. These phases comprise synchronous translocation soon after ligand addition followed by an extended back-and-forth shuttling; this shuttling is pulsatile and does not have a characteristic frequency, unlike a simple oscillator. Early and late dynamics are differentially regulated by Akt and ERK and have low mutual information, potentially allowing the two phases to encode different information. In cancer cells in which ERK and Akt are dysregulated by oncogenic mutation, the diversity of states is lower.
We need to effectively combine the knowledge from surging literature with complex datasets to propose mechanistic models of SARS-CoV-2 infection, improving data interpretation and predicting key targets of intervention. Here, we describe a large-scale community effort to build an open access, interoperable and computable repository of COVID-19 molecular mechanisms. The COVID-19 Disease Map (C19DMap) is a graphical, interactive representation of disease-relevant molecular mechanisms linking many knowledge sources. Notably, it is a computational resource for graph-based analyses and disease modelling. To this end, we established a framework of tools, platforms and guidelines necessary for a multifaceted community of biocurators, domain experts, bioinformaticians and computational biologists. The diagrams of the C19DMap, curated from the literature, are integrated with relevant interaction and text mining databases. We demonstrate the application of network analysis and modelling approaches by concrete examples to highlight new testable hypotheses. This framework helps to find signatures of SARS-CoV-2 predisposition, treatment response or prioritisation of drug candidates. Such an approach may help deal with new waves of COVID-19 or similar pandemics in the long-term perspective.
BackgroundFor automated reading of scientific publications to extract useful information about molecular mechanisms it is critical that genes, proteins and other entities be correctly associated with uniform identifiers, a process known as named entity linking or “grounding.” Correct grounding is essential for resolving relationships among mined information, curated interaction databases, and biological datasets. The accuracy of this process is largely dependent on the availability of machine-readable resources associating synonyms and abbreviations commonly found in biomedical literature with uniform identifiers.ResultsIn a task involving automated reading of ∼215,000 articles using the REACH event extraction software we found that grounding was disproportionately inaccurate for multi-protein families (e.g., “AKT”) and complexes with multiple subunits (e.g.“NF- κB”). To address this problem we constructed FamPlex, a manually curated resource defining protein families and complexes as they are commonly encountered in biomedical text. In FamPlex the gene-level constituents of families and complexes are defined in a flexible format allowing for multi-level, hierarchical membership. To create FamPlex, text strings corresponding to entities were identified empirically from literature and linked manually to uniform identifiers; these identifiers were also mapped to equivalent entries in multiple related databases. FamPlex also includes curated prefix and suffix patterns that improve named entity recognition and event extraction. Evaluation of REACH extractions on a test corpus of ∼54,000 articles showed that FamPlex significantly increased grounding accuracy for families and complexes (from 15 to 71%). The hierarchical organization of entities in FamPlex also made it possible to integrate otherwise unconnected mechanistic information across families, subfamilies, and individual proteins. Applications of FamPlex to the TRIPS/DRUM reading system and the Biocreative VI Bioentity Normalization Task dataset demonstrated the utility of FamPlex in other settings.ConclusionFamPlex is an effective resource for improving named entity recognition, grounding, and relationship resolution in automated reading of biomedical text. The content in FamPlex is available in both tabular and Open Biomedical Ontology formats at https://github.com/sorgerlab/famplex under the Creative Commons CC0 license and has been integrated into the TRIPS/DRUM and REACH reading systems.
Hybrid systems whose mode dynamics are governed by nonlinear ordinary differential equations (ODEs) are often a natural model for biological processes. However such models are difficult to analyze. To address this, we develop a probabilistic analysis method by approximating the mode transitions as stochastic events. We assume that the probability of making a mode transition is proportional to the measure of the set of pairs of time points and value states at which the mode transition is enabled. To ensure a sound mathematical basis, we impose a natural continuity property on the non-linear ODEs. We also assume that the states of the system are observed at discrete time points but that the mode transitions may take place at any time between two successive discrete time points. This leads to a discrete time Markov chain as a probabilistic approximation of the hybrid system. We then show that for BLTL (bounded linear time temporal logic) specifications the hybrid system meets a specification iff its Markov chain approximation meets the same specification with probability 1. Based on this, we formulate a sequential hypothesis testing procedure for verifying -approximatelythat the Markov chain meets a BLTL specification with high probability. Our case studies on cardiac cell dynamics and the circadian rhythm indicate that our scheme can be applied in a number of realistic settings.
Background: For automated reading of scientific publications to extract useful information about molecular mechanisms it is critical that genes, proteins and other entities be correctly associated with uniform identifiers, a process known as named entity linking or "grounding." Correct entity identification is essential for resolving relationships between mined information, curated interaction databases, and biological datasets. The accuracy of this process is largely dependent on the availability of resources allowing computers to link commonly-used abbreviations and synonyms found in literature to uniform identifiers.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.