2018
DOI: 10.1186/s12859-018-2211-5

FamPlex: a resource for entity recognition and relationship resolution of human protein families and complexes in biomedical text mining

Abstract: Background: For automated reading of scientific publications to extract useful information about molecular mechanisms it is critical that genes, proteins and other entities be correctly associated with uniform identifiers, a process known as named entity linking or “grounding.” Correct grounding is essential for resolving relationships among mined information, curated interaction databases, and biological datasets. The accuracy of this process is largely dependent on the availability of machine-readable resource…

Cited by 31 publications (44 citation statements)
References 30 publications (32 reference statements)
“…The GeneWalk applications in this study used the INDRA 15,16 and Pathway Commons 14 knowledge bases which enable automated assembly of a GeneWalk network. Although these databases are optimized for human genes, we show that when mouse genes can be mapped unambiguously to their human orthologues, a network can still be assembled.…”
Section: Discussion (mentioning)
confidence: 99%
“…Here, represent the input weights of w_I, which constitute the vector representations used for our GeneWalk analysis, and the output weights for w_O. For the vector dimensionality d, we tested different values (2, 3, 4, 6, 8, 12, 16, 32, 50, 500), and found that d=8 was optimal because the variance of the resulting cosine similarity distributions was largest, indicating the highest sensitivity of detection of similarity between node pairs. Lower dimensionality generally resulted in high similarity between all nodes, whereas higher dimensionality lowered all similarity values; both cases resulted in a reduced variability.…”
Section: Network Representation Learning Using Random Walks (mentioning)
confidence: 99%
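
The selection criterion described in the statement above (pick the dimensionality whose pairwise cosine similarities have the largest variance) can be illustrated with a minimal sketch. This is not the citing authors' code: the function names, the `embeddings_by_dim` mapping, and the toy vectors are all hypothetical, and the sketch assumes every vector is non-zero.

```python
import math
from itertools import combinations
from statistics import pvariance


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_variance(vectors):
    """Population variance of cosine similarity over all node pairs."""
    sims = [cosine_similarity(u, v) for u, v in combinations(vectors, 2)]
    return pvariance(sims)


def pick_dimensionality(embeddings_by_dim):
    """Return the dimensionality whose similarity distribution varies most.

    embeddings_by_dim (hypothetical) maps each candidate d to the list of
    node vectors learned at that dimensionality.
    """
    return max(
        embeddings_by_dim,
        key=lambda d: similarity_variance(embeddings_by_dim[d]),
    )


# Toy data, invented for illustration only.
toy_embeddings = {
    2: [[1.0, 0.0], [1.0, 0.01], [1.0, -0.01]],           # all nodes look alike
    3: [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.1, 0.0]],  # similarities spread out
}
best_d = pick_dimensionality(toy_embeddings)
```

In the toy data, the two-dimensional vectors are nearly parallel, so their similarities cluster near 1 and the variance is close to zero; the three-dimensional vectors give a wider spread, so that dimensionality is preferred under this criterion.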
“…Text mining extracts information not only about kinases that directly phosphorylate a target site, but also kinases that lie further upstream as well as non-kinase regulators (e.g., growth factors). REACH and Sparser also use FamPlex identifiers, a taxonomy of protein families and complexes for text mining, to extract and normalize phosphorylation events expressed in terms of kinase classes, e.g., "ERK", "AKT", "AMPK" [20]. For an equal comparison between text mining and databases, we therefore repeated the comparison by restricting the regulators to specific human kinase proteins and found that text mining still yielded a substantial body of new information, with 3,792 unique human kinase-site pairs reported by machine readers and not PhosphoSitePlus, and 3,118 that did not appear in any curated database (Fig 3C).…”
Section: Text Mining Tools Identify Many Uncurated Regulators Of Phos… (mentioning)
confidence: 99%