DiANNA (DiAminoacid Neural Network Application)
is a web server that provides two services:
- cysteine classification prediction
- disulfide connectivity prediction
DiANNA 1.1 determines the cysteine species (free cysteine,
half-cystine or ligand-bound) by using a support vector machine (SVM)
with a degree-2 polynomial kernel applied to the spectrum representation.
Additionally, if a cysteine is predicted to be ligand-bound, then
the most likely of the four most common ligands (iron, zinc, cadmium,
carbon) is proposed.
DiANNA 1.1 predicts the disulfide connectivity using
a state-of-the-art method based on a neural network with a novel architecture.
By disulfide connectivity, we mean the pairing of half-cystines into bonds:
for example, in the case of four half-cystines, determining whether the
disulfide bonds are (1,2) and (3,4), or (1,3) and (2,4), or (1,4) and (2,3).
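To make the notion of a connectivity pattern concrete, the following minimal sketch enumerates every way of pairing an even number of half-cystines. The function name `connectivity_patterns` is hypothetical and is not part of DiANNA; it only illustrates the size of the search space.

```python
def connectivity_patterns(cysteines):
    """Yield every way to pair up an even-length list of half-cystines."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    # Pair the first half-cystine with each possible partner in turn,
    # then pair up whatever remains.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for pattern in connectivity_patterns(remaining):
            yield [(first, partner)] + pattern


patterns = list(connectivity_patterns([1, 2, 3, 4]))
for pattern in patterns:
    print(pattern)
```

For four half-cystines this gives three patterns; for ten half-cystines (five bonds) it gives 945.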
- Cysteine classification prediction
A ternary classification is attempted by applying a support vector machine
(SVM) with spectrum kernel to determine whether a cysteine is reduced (free in sulfhydryl state),
half-cystine (involved in a disulfide bond) or bound to a metallic ligand.
In the last case, DiANNA predicts the ligand among
iron, zinc, cadmium and carbon.
The SVMs are trained and tested on a non-redundant list of proteins in which
each of the three classes is well represented (the complete list is available here).
To apply SVMs to the ternary cysteine classification problem, we must
encode amino acid sequences (the contents of windows of size w) as vectors
with real coordinates. To that end, we use the spectrum representation of
Leslie et al., which proved more effective for amino acid sequences than
the more sophisticated mismatch (Leslie et al.) and profile (Kuang et al.)
representations.
We then use libSVM with a polynomial kernel (degree 2) to train a
ternary predictor. A similar approach is used for binary classification for
each pair of cysteine classes, i.e. ligand-bound vs. half-cystines,
ligand-bound vs. free cysteines, half-cystines vs. free cysteines. All three
binary classifiers are available in addition to the ternary classifier.
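As an illustration of the encoding step, here is a minimal sketch of a k-spectrum representation combined with a degree-2 polynomial kernel of the form (gamma * <a, b> + coef0)^2. The k-mer length `k`, the values of `gamma` and `coef0`, and the example windows are assumptions chosen for illustration; the values used by DiANNA are not stated here.

```python
from collections import Counter


def spectrum(window, k=2):
    """Count every length-k substring (k-mer) of an amino acid window."""
    return Counter(window[i:i + k] for i in range(len(window) - k + 1))


def poly2_kernel(a, b, gamma=1.0, coef0=1.0):
    """Degree-2 polynomial kernel on two sparse k-mer count vectors."""
    # Dot product over the k-mers present in `a`; a Counter returns 0
    # for k-mers that are absent from `b`.
    dot = sum(count * b[kmer] for kmer, count in a.items())
    return (gamma * dot + coef0) ** 2


x = spectrum("ACDCA")  # k-mers: AC, CD, DC, CA
y = spectrum("ACACA")  # k-mers: AC (twice), CA (twice)
print(poly2_kernel(x, y))
```

The sparse counts stand in for the real-valued vectors mentioned above: each coordinate corresponds to one possible k-mer, and most coordinates are zero.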
- Disulfide bonds prediction
A diresidue neural network (Ferre and Clote) is trained to recognize pairs of bonded
half-cystines, given as input the symmetric flanking regions of the two half-cystines.
The network is trained using disulfide bond information derived from high-quality protein structures
(the complete list is available here,
and is derived from the papers of Vullo and
Frasconi and of Fariselli et al.). The neural network input includes evolutionary as well
as secondary structure information.
Given two windows of size w, one centered at the N-terminal and the other at
the C-terminal putative half-cystine of a candidate pair, we run PSIPRED on
the whole input sequence to predict the secondary structure (helix, coil,
sheet) of each of the 2w residues; we then use the PSI-BLAST run performed
by PSIPRED to produce the profile of each position 1 ≤ i ≤ 2w.
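The window extraction can be sketched as follows. This is only an illustration under stated assumptions: the padding symbol `X`, the window size `w=5`, and the helper names `window` and `candidate_pairs` are hypothetical, and the PSIPRED and PSI-BLAST features are left out entirely.

```python
from itertools import combinations


def window(sequence, center, w, pad="X"):
    """Return the w residues centered on 0-based index `center` (w odd)."""
    half = w // 2
    # Pad both ends so that windows near the termini keep length w.
    padded = pad * half + sequence + pad * half
    return padded[center:center + w]


def candidate_pairs(sequence, w=5):
    """Pair every two cysteines and concatenate their flanking windows."""
    cys = [i for i, aa in enumerate(sequence) if aa == "C"]
    return [((i, j), window(sequence, i, w) + window(sequence, j, w))
            for i, j in combinations(cys, 2)]


pairs = candidate_pairs("MKCAAC", w=5)
print(pairs)
```

Each candidate pair yields 2w residues, which is the span over which the secondary structure and profile information are computed.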
Trained and tested on disulfide bonds extracted from a list of proteins
having at least two and at most five bonds, the software achieves
81% accuracy and a Matthews correlation coefficient of 43%
(see Ferre and Clote).
The connectivity prediction (i.e. the prediction of disulfide bond partners) is obtained with
Ed Rothberg's implementation of the Edmonds-Gabow maximum weight
matching algorithm (wmatch). This algorithm is applied to the graph
whose nodes are the putative half-cystines and whose edges are pairs of
half-cystines weighted by the diresidue neural network in the
disulfide bonds prediction module.
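The matching step can be illustrated with a small brute-force search. This is not the Edmonds-Gabow algorithm and not wmatch: it is a simplified stand-in that tries every complete pairing, which is feasible only for the small numbers of half-cystines considered here, and it assumes an even number of half-cystines that are all paired. The function name `best_pairing` and the edge weights are made up for the example.

```python
def best_pairing(nodes, weight):
    """Return (total_weight, bonds) of the heaviest complete pairing."""
    if not nodes:
        return 0.0, []
    first, rest = nodes[0], nodes[1:]
    best = (float("-inf"), [])
    for i, partner in enumerate(rest):
        # Pair `first` with `partner`, then solve the smaller problem.
        sub_total, sub_bonds = best_pairing(rest[:i] + rest[i + 1:], weight)
        total = weight[(first, partner)] + sub_total
        if total > best[0]:
            best = (total, [(first, partner)] + sub_bonds)
    return best


# Hypothetical edge weights, standing in for neural network scores.
weights = {(1, 2): 0.9, (1, 3): 0.2, (1, 4): 0.1,
           (2, 3): 0.3, (2, 4): 0.4, (3, 4): 0.8}
total, bonds = best_pairing([1, 2, 3, 4], weights)
print(bonds, total)
```

A dedicated matching algorithm reaches the same optimum in polynomial time, which matters as the number of putative half-cystines grows.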
After training and testing on a list of proteins
having at least two and at most five bonds, the connectivity prediction achieves a rate
Qp of 49% for perfect predictions (i.e.
the fraction of proteins for which no false positive or false
negative predictions are made), 86% accuracy and a Matthews correlation
coefficient of 51% (see Ferre and Clote).
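For readers unfamiliar with these measures, the sketch below computes accuracy and the Matthews correlation coefficient from confusion counts, and Qp from per-protein bond sets. The function names and all numbers in the example are invented for illustration and are unrelated to the results reported above.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    """Fraction of candidate pairs classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def qp(predicted, actual):
    """Fraction of proteins whose predicted bond set is exactly right."""
    perfect = sum(1 for p, a in zip(predicted, actual) if set(p) == set(a))
    return perfect / len(actual)


# Invented example: two proteins, one predicted perfectly.
predicted = [{(1, 2), (3, 4)}, {(1, 3), (2, 4)}]
actual = [{(1, 2), (3, 4)}, {(1, 4), (2, 3)}]
print(qp(predicted, actual), accuracy(6, 6, 2, 2), mcc(6, 6, 2, 2))
```

Qp is the strictest of the three: a single wrong bond makes the whole protein count as a failure.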
References:
Fariselli P, Riccobelli P, Casadio R (1999).
Role of evolutionary information in predicting the disulfide-bonding state
of cysteine in proteins. Proteins 36(3):340-6.
Fariselli P, Casadio R (2001). Prediction of disulfide connectivity
in proteins. Bioinformatics 17(10):957-64.
Ferre F, Clote P (2005).
Disulfide connectivity prediction using
secondary structure information and diresidue
frequencies. Bioinformatics.
Kuang R, Ie E, Wang K, Wang K, Siddiqi M, Freund Y, Leslie C (2004).
Profile-based string kernels for remote homology detection and motif
extraction.
Proc IEEE Comput Syst Bioinform Conf.
Jones DT (1999). Protein secondary structure
prediction based on position-specific scoring matrices. J Mol Biol
292(2):195-202.
Leslie C, Eskin E, Noble WS (2002).
The spectrum kernel: a string kernel for SVM protein classification.
Pac Symp Biocomput.
Leslie CS, Eskin E, Cohen A, Weston J, Noble WS (2004).
Mismatch string kernels for discriminative protein classification.
Bioinformatics 20(4):467-76.
libSVM. Web site.
Ed Rothberg's wmatch. Web site.
Vullo A, Frasconi P (2004). Disulfide connectivity prediction using
recursive neural networks and evolutionary information. Bioinformatics
20(5):653-9.