2015
DOI: 10.1016/j.ymeth.2014.11.020

DISEASES: Text mining and data integration of disease–gene associations

Abstract: Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease-gene associations from biomedical abstracts. The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences both within and between sentences. We show that this approach is able to extract half of all manually curated assoc…
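
The abstract describes two stages: dictionary-based tagging of genes and diseases, then scoring of co-mentions within and between sentences. The following is a toy sketch of that pipeline under stated assumptions only. The dictionaries, identifiers, naive sentence splitter and the weights W_DOCUMENT and W_SENTENCE are all hypothetical, and the published DISEASES scoring scheme (including any normalization step) is not reproduced here.

```python
import re
from collections import defaultdict
from itertools import product

# Hypothetical toy dictionaries mapping lower-cased names to identifiers.
GENE_NAMES = {"brca1": "BRCA1", "tp53": "TP53", "p53": "TP53"}
DISEASE_NAMES = {"breast cancer": "BREAST_CANCER",
                 "li-fraumeni syndrome": "LI_FRAUMENI"}

# Illustrative weights (assumptions, not the published DISEASES parameters).
W_DOCUMENT = 1.0  # pair co-mentioned anywhere in the same abstract
W_SENTENCE = 0.2  # extra credit when the pair also shares a sentence
MAX_NGRAM = 3     # longest dictionary name, in tokens


def tag(sentence, dictionary):
    """Greedy longest-match dictionary lookup; returns the set of matched IDs."""
    tokens = re.findall(r"[\w-]+", sentence.lower())
    found, i = set(), 0
    while i < len(tokens):
        for n in range(min(MAX_NGRAM, len(tokens) - i), 0, -1):
            name = " ".join(tokens[i:i + n])
            if name in dictionary:
                found.add(dictionary[name])
                i += n
                break
        else:  # no dictionary name starts at token i
            i += 1
    return found


def split_sentences(text):
    """Naive splitter: breaks after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def score_pairs(abstracts):
    """Sum, over abstracts, a document-level weight per co-mentioned
    gene-disease pair plus a bonus if the pair also shares a sentence."""
    scores = defaultdict(float)
    for abstract in abstracts:
        genes, diseases, same_sentence = set(), set(), set()
        for sentence in split_sentences(abstract):
            g, d = tag(sentence, GENE_NAMES), tag(sentence, DISEASE_NAMES)
            genes |= g
            diseases |= d
            same_sentence |= set(product(g, d))
        for pair in product(genes, diseases):
            scores[pair] += W_DOCUMENT
            if pair in same_sentence:
                scores[pair] += W_SENTENCE
    return dict(scores)


abstracts = [
    "BRCA1 mutations cause breast cancer. TP53 is also studied.",
    "Germline p53 variants underlie Li-Fraumeni syndrome. "
    "Breast cancer is common in carriers.",
]
for (gene, disease), s in sorted(score_pairs(abstracts).items(),
                                 key=lambda kv: -kv[1]):
    print(gene, disease, round(s, 2))
```

With these toy weights, TP53 and breast cancer get the highest score (2.0) because they are co-mentioned in two abstracts, even though they never share a sentence; each single same-sentence pair scores 1.2. How document-level and sentence-level evidence trade off is entirely a consequence of the assumed weights.
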

Cited by 508 publications (466 citation statements)
References: 48 publications

“…To this end, we used a single, monotonic calibration function for all datasets from all organisms, which we calibrated on the text-mining results for human gene–tissue associations (Supplementary Figure S3). We chose to use this particular set of associations, because it is large, facilitating robust results, and because text-mining is also available for other types of associations, allowing unified confidence scores across TISSUES and the related databases COMPARTMENTS (27) and DISEASES (49). The calibrated functions used for transforming raw expression values into final confidence star scores are available in Supplementary Table S4.…”
Section: Methods (mentioning)
confidence: 99%

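
The quoted Methods passage describes a single monotonic calibration function that turns raw scores into confidence star scores. The functional form the authors actually fitted is given in their supplementary material and is not reproduced here. As a stand-in, this sketch uses isotonic regression (the pool-adjacent-violators algorithm), a generic way to learn a nondecreasing mapping. The training data, the step-function lookup and the clipping range of 0 to max_stars are all assumptions.

```python
import bisect


def pool_adjacent_violators(values):
    """Least-squares nondecreasing fit to `values` (unit weights)."""
    blocks = []  # each block is [mean, size]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the sequence of block means decreases.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    fitted = []
    for mean, size in blocks:
        fitted.extend([mean] * size)
    return fitted


def fit_monotonic_calibration(raw_scores, targets, max_stars=5.0):
    """Return a step function mapping a raw score to a confidence value,
    nondecreasing in the raw score and clipped to [0, max_stars]."""
    pairs = sorted(zip(raw_scores, targets))
    xs = [x for x, _ in pairs]
    ys = pool_adjacent_violators([y for _, y in pairs])

    def calibrate(x):
        i = bisect.bisect_right(xs, x) - 1  # last training point <= x
        return min(max(ys[max(i, 0)], 0.0), max_stars)

    return calibrate


# Hypothetical training data: raw scores paired with reference confidences.
calibrate = fit_monotonic_calibration([1, 2, 3, 4, 5], [0.5, 1.5, 1.0, 3.0, 4.5])
print([calibrate(x) for x in (0, 2.5, 10)])  # -> [0.5, 1.25, 4.5]
```

Monotonicity is the property that matters here: a higher raw score never receives a lower confidence, so calibration changes the scale without reversing the order of any two associations.
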
“…Because prior depression is a risk factor for PPD (Miller, 2002) and depression per se is much more studied than PPD, many depression genes could be relevant to PPD. We also found enrichment with PPD genes that were produced by combining two small lists from DISEASES (Copenhagen) (Pletscher-Frankild et al, 2015) and MalaCards (Rappaport et al, 2013). By combining matches of top 700 maternal genes with either depression (2 or more databases) or PPD, we identify a subset of genes that may be new high priority PPD genes (Fig.…”
Section: Depression and Postpartum Depression (mentioning)
confidence: 70%

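
The selection rule in this passage is a combination of set operations. This is a minimal sketch, assuming each source is available as a set of gene identifiers. The function name prioritize and the toy gene identifiers are hypothetical; the defaults top_n=700 and min_dbs=2 follow the thresholds quoted in the passage.

```python
from collections import Counter


def prioritize(ranked_genes, depression_dbs, ppd_lists, top_n=700, min_dbs=2):
    """Genes in the top `top_n` of a ranked list that are either listed in at
    least `min_dbs` depression databases or in any PPD gene list."""
    top = set(ranked_genes[:top_n])
    # Count each database at most once per gene.
    db_counts = Counter(g for db in depression_dbs for g in set(db))
    depression = {g for g, n in db_counts.items() if n >= min_dbs}
    ppd = set().union(*ppd_lists)  # e.g. a DISEASES list and a MalaCards list
    return top & (depression | ppd)


# Toy, hypothetical gene identifiers.
ranked = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
depression_dbs = [{"GENE_A", "GENE_B", "GENE_E"}, {"GENE_A", "GENE_C"}, {"GENE_B"}]
ppd_lists = [{"GENE_D"}, {"GENE_E"}]
print(sorted(prioritize(ranked, depression_dbs, ppd_lists, top_n=4)))
```

In this toy run GENE_C is dropped because only one depression database lists it, and GENE_E is dropped because it falls outside the top 4, even though it appears in a PPD list.
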
“…Counted at the level of individual mentions, the SPECIES and ENVIRONMENTS taggers showed precision of 83.9 and 87.8%, recall of 72.6 and 77.0%, and F1 scores of 78.8 and 82.0%, respectively. The quality of the NER of tissues and diseases has not been benchmarked directly; however, these NER components have been shown to give good results when used for co-mentioning-based extraction of protein–tissue and protein–disease associations (31, 32). In terms of perception metrics, the evaluators generally found the system to be intuitive, useful, well documented and sufficiently accurate to be helpful in spotting relevant text passages and extracting organism and environment terms (Figure 3 and Table 7).…”
Section: Results (mentioning)
confidence: 99%
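
F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Applying it to the mention-level values in the quote reproduces 82.0% for ENVIRONMENTS, but gives about 77.8% rather than 78.8% for SPECIES. So one of the quoted SPECIES figures may be mistyped or computed differently (for example, averaged per document rather than per mention). The quotation is left as published. The function name f1_score is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in and out)."""
    return 2 * precision * recall / (precision + recall)


# Mention-level precision and recall (%) as quoted in the passage.
for tagger, p, r in [("SPECIES", 83.9, 72.6), ("ENVIRONMENTS", 87.8, 77.0)]:
    print(f"{tagger}: F1 = {f1_score(p, r):.1f}%")
```
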