An accurate feature‐based method for identifying DNA‐binding residues on protein surfaces

Xiong, Yi; Liu, Juan; Wei, Dong‐Qing

doi:10.1002/prot.22898

Cited by 67 publications

(64 citation statements)

References 35 publications

“…Our previous work [7] also confirms the role of physicochemical properties in characterizing DNA-binding residues. However, to the best of our knowledge, no related work has incorporated physicochemical and biological properties from the Amino Acid Index (AAindex) database [8] to analyze and predict heme binding residues.…”

Section: Introduction (supporting)

confidence: 62%

Prediction of Heme Binding Sites in Heme Proteins Using an Integrative Sequence Profile Coupling Evolutionary Information with Physicochemical Properties

Xiong, Zhang, Zeng, et al. 2011

2011 IEEE International Conference on Bioinformatics and Biomedicine

Self Cite


Heme-protein interactions are essential for various biological processes such as electron transfer, catalysis, signal transduction and the control of gene expression. The knowledge of heme binding residues can provide crucial clues to understand the mechanism of heme-protein interactions and aid in functional annotation. In the present work, we propose a sequence-based approach for the accurate prediction of heme binding residues by a novel integrative sequence profile coupling position specific scoring matrices with heme specific physicochemical properties. Particularly, we design an intuitive feature selection scheme for informative physicochemical properties. As shown in the primary results, our integrative sequence profile approach for prediction of heme binding residues outperforms the conventional methods using amino acid and evolutionary information on the 5-fold cross validation and the independent test.

“…These examples were derived from the articles of both Ozbek and Xiong (11,19). Ozbek compiled 54 pairs of structures, which included both protein–DNA complexes (HOLO) and unbound proteins (APO).…”

Section: Methods (mentioning)

confidence: 99%

“…Some of these are based on the primary sequence of a protein (4,7,9,10,12,15–18,21), whereas others are built using structure-based information (1,2,5,6,8,11,13,14,19,20,22). Machine-learning methods such as support vector machine (SVM) classifiers (15,19), neural networks (1,13) and random forest-based approaches (16,18) have been used for training feature-based models to identify DNA-binding sites.…”

Section: Introduction (mentioning)

confidence: 99%

DBSI: DNA-binding site identifier

Zhu, Ericksen, Mitchell 2013

Nucleic Acids Research


In this study, we present the DNA-Binding Site Identifier (DBSI), a new structure-based method for predicting protein interaction sites for DNA binding. DBSI was trained and validated on a data set of 263 proteins (TRAIN-263), tested on an independent set of protein-DNA complexes (TEST-206) and data sets of 29 unbound (APO-29) and 30 bound (HOLO-30) protein structures distinct from the training data. We computed 480 candidate features for identifying protein residues that bind DNA, including new features that capture the electrostatic microenvironment within shells near the protein surface. Our iterative feature selection process identified features important in other models, as well as features unique to the DBSI model, such as a banded electrostatic feature with spatial separation comparable with the canonical width of the DNA minor groove. Validations and comparisons with established methods using a range of performance metrics clearly demonstrate the predictive advantage of DBSI, and its comparable performance on unbound (APO-29) and bound (HOLO-30) conformations demonstrates robustness to binding-induced protein conformational changes. Finally, we offer our feature data table to others for integration into their own models or for testing improved feature selection and model training strategies based on DBSI.


“…Thus each element of this matrix represents the probability of a type of amino acid to occur at a specific site, from which the residue conservation in a given protein could be mapped in detail. PSSM shows great power in many prediction studies such as protein-DNA interface residue (Xiong et al, 2011a; Xiong et al, 2011b), and transcription factor binding sites (Pairo et al, 2012). In this work, PSSM was applied to improve the models trained from the four base combinations into those in the second category.…”

Section: Improving the Performance of SNP Prediction by Features Base (mentioning)

confidence: 99%

Improved feature-based prediction of SNPs in human cytochrome P450 enzymes

Xiong, Zhang, et al. 2015

Interdiscip Sci Comput Life Sci

Self Cite


Single nucleotide polymorphisms (SNPs) make up the most common form of mutations in human cytochrome P450 enzymes family, and have the potential to bring with different drug responses or specific diseases in individual patients. Here, based on machine learning technology, we aim to explore an effective set of sequence-based features for improving prediction of SNPs by using support vector machine algorithms. The features are derived from the target residues and flanking protein sequences, such as amino acid types, sequences composition, physicochemical properties, position-specific scoring matrix, phylogenetic entropy and the number of possible codons of target residues. In order to deal with the imbalance data with a majority of non-SNPs and a minority of SNPs, a preprocessing strategy based on fuzzy set theory was applied to the datasets. Our final model achieves the performance of 93.8% in sensitivity, 88.8% in specificity, 91.3% in accuracy and 0.971 of AUC value, which is significantly higher than the previous DNA sequence-based or protein sequence-based methods. Furthermore, our study also suggested the roles of individual features for prediction of SNPs. The most important features consist of the amino acid type, the number of available codons, position-specific scoring matrix and phylogenetic entropy. The improved model will be a promising tool for SNP predictions, and assist in the research of genome mutation and personalized prescriptions.
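The citation statement for this entry describes a position-specific scoring matrix (PSSM) as a matrix whose elements reflect how likely each amino acid type is at each site. As a rough illustration only, the sketch below builds such a matrix from a toy alignment. The function names, the pseudocount of 1, the uniform background and the toy sequences are all assumptions made for this example and are not taken from any of the cited papers. Profile-building tools such as PSI-BLAST additionally apply sequence weighting and empirical background frequencies, which this sketch omits.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_probabilities(alignment, pseudocount=1.0):
    """Per-column amino acid probabilities for equal-length aligned sequences.

    `pseudocount` is an illustrative smoothing choice so that unseen
    residues do not get probability zero.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    matrix = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in alignment:
            residue = seq[col]
            if residue in counts:  # gaps and unknown symbols are skipped
                counts[residue] += 1
        total = sum(counts.values())
        matrix.append({aa: counts[aa] / total for aa in AMINO_ACIDS})
    return matrix


def log_odds(matrix, background=None):
    """Convert probabilities to log2 odds against a background distribution.

    A uniform background is assumed here for simplicity.
    """
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    return [
        {aa: math.log2(column[aa] / background[aa]) for aa in AMINO_ACIDS}
        for column in matrix
    ]


# Hypothetical toy alignment: four sequences, three aligned positions.
toy_alignment = ["KRA", "KRG", "KHA", "RRA"]
probs = position_probabilities(toy_alignment)
scores = log_odds(probs)
```

In this toy case, lysine (K) dominates the first column and therefore receives a positive score there, while residues never observed in that column score below zero. A row of such scores per residue, often taken over a sliding window of neighbouring positions, is the kind of evolutionary feature that the abstracts on this page combine with physicochemical properties.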

