How to run DiANNA?
|
DiANNA provides two services:
- Cysteine classification
- Disulfide connectivity prediction
|
Both services require only an input protein sequence, which must be pasted
in a text box. The sequence must be in FASTA format (click here for an
example).
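As a rough illustration of what a FASTA-formatted input looks like, the sketch below parses a single record and lists its cysteine positions. The sequence is made up for illustration, and the helper names (read_single_fasta, cysteine_positions) are hypothetical; they are not part of DiANNA.

```python
from typing import List, Tuple


def read_single_fasta(text: str) -> Tuple[str, str]:
    """Parse one FASTA record: a '>' header line followed by sequence lines."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    header = lines[0][1:].strip()
    # Sequence lines may be wrapped; join them into one string.
    sequence = "".join(lines[1:]).upper()
    if not sequence or not sequence.isalpha():
        raise ValueError("sequence must contain only letters")
    return header, sequence


def cysteine_positions(sequence: str) -> List[int]:
    """Return the 1-based positions of every cysteine (C) in the sequence."""
    return [i + 1 for i, residue in enumerate(sequence) if residue == "C"]


# Made-up example record, for illustration only.
example = """>example_protein hypothetical sequence
MKCLLACG
TTCPQWCA
"""

header, sequence = read_single_fasta(example)
positions = cysteine_positions(sequence)
print(header)
print(sequence)
print(positions)
```

Each position reported here corresponds to one cysteine that would be classified, and each pair of positions to one candidate disulfide bond.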
|
Output interpretation
|
Cysteine classification:
The user can choose either a ternary prediction (i.e. for each cysteine in the
sequence we predict whether it is a half-cystine, a free cysteine or
ligand-bound) or one of the three binary predictions (half-cystine vs. free
cysteine, ligand-bound vs. half-cystine, ligand-bound vs. free cysteine).
In the case of the ternary classification, for each cysteine we report the
probability of being ligand-bound, half-cystine or free cysteine, as computed
by a three-class support vector machine. Then,
for each cysteine predicted as ligand-bound, we predict to which atom-type
it may be bound, out of four possible ligands (Fe, Cd, C, Zn) using a
winner-takes-all decision (i.e. four different support vector machines, each
one trained to recognize cysteines bonded to a specific ligand, are tested,
and we assign the prediction to the one that produces the maximum score).
Similarly, in the case of binary classification, we report the probability of
being in one of two mutually exclusive states.
For more details about the method employed and the results of binary
classification experiments, have a look at the web supplement.
|
Disulfide connectivity:
For each pair of cysteines in the input sequence, a neural network
trained to recognize disulfide bonds produces a score ranging from 0 to 1
(the higher the score, the more reliable the prediction). These scores are
used to obtain the final prediction using a maximum weight matching. As a
consequence, bonds which have a high score may or may not be in the final
prediction. For example, consider a protein having 4 cysteines A, B, C and
D. Let's assume that the neural network predicts the following scores:
Cys 1 | Cys 2 | Score |
A | B | 0.9 |
A | C | 0.1 |
A | D | 0.8 |
B | C | 0.8 |
B | D | 0.1 |
C | D | 0.5 |
Here, the bond A-B has the maximum score (0.9). Nevertheless, if you
consider the bond A-B correct, then you have only one choice for the second
bond, i.e. C-D (score 0.5), for a total of 1.4. The pair of bonds that
maximizes the sum of the scores is instead A-D (0.8) and B-C (0.8), for a
total of 1.6. Therefore, in this case the bond with the maximum score is not
in the optimal solution.
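The example above can be checked with a small brute-force search over all complete pairings of the cysteines. This is only a sketch for small inputs: it assumes an even number of cysteines that are all paired, uses a hypothetical function name (best_pairing), and is not the matching algorithm used by the server.

```python
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[str, str]


def best_pairing(
    cysteines: List[str], scores: Dict[FrozenSet[str], float]
) -> Tuple[float, List[Pair]]:
    """Return (total score, bonds) for the highest-scoring complete pairing."""
    if len(cysteines) % 2 != 0:
        raise ValueError("this sketch assumes an even number of cysteines")
    if not cysteines:
        return 0.0, []
    first, rest = cysteines[0], cysteines[1:]
    best_total, best_bonds = float("-inf"), []
    # Try every possible partner for the first cysteine, then solve the rest.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        sub_total, sub_bonds = best_pairing(remaining, scores)
        total = scores[frozenset((first, partner))] + sub_total
        if total > best_total:
            best_total, best_bonds = total, [(first, partner)] + sub_bonds
    return best_total, best_bonds


# Pair scores from the example above.
pair_scores = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.1,
    frozenset(("A", "D")): 0.8,
    frozenset(("B", "C")): 0.8,
    frozenset(("B", "D")): 0.1,
    frozenset(("C", "D")): 0.5,
}

total, bonds = best_pairing(["A", "B", "C", "D"], pair_scores)
print(bonds, round(total, 2))
```

The search selects A-D and B-C (total 1.6) over A-B and C-D (total 1.4). Brute force grows very quickly with the number of cysteines, so it is only practical for checking small examples like this one.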
|