How to run transFold?

Nothing could be simpler, at least with the basic submission form. Follow the 3 steps below:

The advanced submission form is a little more complex, and you will probably need to read the transFold paper (see references) to feel comfortable with it. However, its parameters are quite natural. Transmembrane β-barrel structures can be constrained by setting several biologically motivated parameters, listed below:

  1. number of TM β-strands in the barrel,
  2. length of TM β-strands,
  3. strand inclination with respect to membrane plane (shear number),
  4. size of periplasmic and extra-cellular loops,
  5. hydrophobic profile of TM β-strands.

In addition, users can upload their own contact potential tables and thus use their own energy model for the folding.

How to read transFold's predictions?

Four types of predictions are made by transFold:

Two kinds of output are available for these predictions. We describe these below.

Standard output:

The prediction is displayed in three lines. The first line contains the amino acid sequence input by the user (since transFold applies a lexer, all characters other than valid IUPAC single-letter amino acid codes are removed). The second and third lines contain the structure predictions made by transFold: secondary structure, topology and inter-residue contacts. Both of these lines describe the same secondary structure. Residues denoted as E, C or M are predicted to belong to a transmembrane β-strand (C denotes "cavity", i.e. facing the cavity of the β-barrel protein, while M denotes "membrane", i.e. facing the outer membrane bilayer). All other positions are predicted to be loop positions, either in an extra-cellular loop or in a periplasmic loop region.
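The sequence clean-up described above can be sketched in a few lines of Python. This is only an illustration, not transFold's actual lexer: the function name clean_sequence is hypothetical, and it assumes that the "valid" codes are the 20 standard upper-case amino acid letters and that matching is case-sensitive.

```python
# Assumption: only the 20 standard upper-case one-letter codes are kept.
# transFold's real lexer may accept a different alphabet.
VALID_CODES = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Return raw with every character outside VALID_CODES removed."""
    return "".join(ch for ch in raw if ch in VALID_CODES)


if __name__ == "__main__":
    # Spaces, dashes, digits and line breaks are all dropped.
    print(clean_sequence("MKK TAIA-IAV1\n"))
```

Because residues are numbered after this clean-up, indices in the prediction refer to positions in the filtered sequence, not in the raw input.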

The topology prediction (i.e. the orientation of the transmembrane β-strands through the membrane) is given by the notation used for the amino acids located in turn regions. Residues predicted to lie in the periplasm are marked with i, and those exposed to the extra-cellular environment are marked with o. The label E is used to mark strand extensions.

Residue contacts are denoted by paired residues of the same type (C or M) between the second and third lines. Pairings are articulated around a turn (denoted i or o). The first residue on the left of the third line is paired with the first residue on the right of the second line, the second residue on the left of the third line is paired with the second residue on the right of the second line, and so on. The notation is inverted for the closing pair. The figure below illustrates this situation.

Finally, the folding pseudo-energy is computed as the sum of the contact potentials, as explained in the article:

File output:

A file summarizing the prediction in a more traditional format can be downloaded as a 5-column, tab-delimited text file. Each row of this file corresponds to a residue. The first column contains the index of the current amino acid; the second column contains the single-letter amino acid code associated with this residue (all characters that are not IUPAC single-letter amino acid codes are stripped). The third column contains the secondary structure assignment of the residue as well as the side-chain orientation of TM β-strand residues. Hence, a residue marked as M or C is predicted to belong to a TM β-strand, where C denotes “channel” (i.e. facing the cavity of the β-barrel protein) and M denotes “membrane” (i.e. facing the outer membrane bilayer). A residue marked as i is predicted to be in the periplasm, a residue marked as o to be extra-cellular, and a residue marked as “.” to be exterior to the membrane, but not in a turn. The last two columns are only used for TM β-strand residues and give the indices of the amino acids interacting with the current one (including the interaction of the closing strand pairing).
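As an illustration of the layout just described, here is a minimal Python sketch that reads such a file into records. The names Residue and parse_prediction are hypothetical, and the sample rows in the demo are invented for the example. The description does not say what the last two columns hold for non-strand residues, so the sketch assumes that anything that is not an integer there (an empty field or a placeholder) means "no partner".

```python
from typing import List, NamedTuple, Optional, Tuple


class Residue(NamedTuple):
    index: int                 # column 1: residue index
    aa: str                    # column 2: single-letter amino acid code
    label: str                 # column 3: M, C, i, o or "."
    partners: Tuple[int, ...]  # columns 4-5: indices of interacting residues


def _to_int(field: str) -> Optional[int]:
    """Convert a field to int, or return None when it is not a number."""
    try:
        return int(field)
    except ValueError:
        return None


def parse_prediction(text: str) -> List[Residue]:
    """Parse the 5-column, tab-delimited prediction text, one row per residue."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = [f.strip() for f in line.split("\t")]
        partners = tuple(
            p for p in map(_to_int, fields[3:5]) if p is not None
        )
        rows.append(Residue(int(fields[0]), fields[1], fields[2], partners))
    return rows


if __name__ == "__main__":
    # Invented rows, for illustration only.
    sample = "1\tA\ti\t\t\n2\tV\tM\t9\t15\n3\tT\tC\t8\t14\n"
    for residue in parse_prediction(sample):
        print(residue)
```

From these records, the strand residues can be selected with a simple filter such as [r for r in rows if r.label in ("M", "C")].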

Which format should uploaded hydrophobicity scales and contact potential tables use? (advanced submission form)

Hydrophobicity scales use a simple 2-column format. Each row corresponds to a particular amino acid. The first field contains the single-letter code of the amino acid, and the second field gives its hydrophobicity value. The following example illustrates this:

A 2.18
R -4.07
...
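Before uploading a scale, it can be useful to check it locally. The following Python sketch reads the 2-column format shown above into a dictionary. The function name parse_hydrophobicity is hypothetical, and the sketch assumes the two fields are separated by whitespace; the server's own parser may be stricter or more lenient.

```python
from typing import Dict


def parse_hydrophobicity(text: str) -> Dict[str, float]:
    """Read 'code value' rows into a {code: value} dictionary."""
    scale: Dict[str, float] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 2 fields, got {len(parts)}")
        code, value = parts
        if len(code) != 1 or not code.isalpha() or not code.isupper():
            raise ValueError(f"line {lineno}: {code!r} is not an upper-case letter")
        if code in scale:
            raise ValueError(f"line {lineno}: duplicate entry for {code!r}")
        scale[code] = float(value)  # raises ValueError on a non-numeric value
    return scale


if __name__ == "__main__":
    print(parse_hydrophobicity("A 2.18\nR -4.07\n"))
```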

The contact potential tables follow a similar 2-column format. Each row corresponds to a given pairwise interaction. The first field contains a 2-letter code for the amino acids brought into contact: the first letter is the residue on the second strand, and the second letter is the residue on the given strand. (This is consistent with the notation P(V|A) = probability of V given A.) The second field is the value associated with this oriented interaction. Here is an example:

AA -2.84366275993
AC -2.8903717579
...

Please always use upper-case characters for amino acid single-letter codes, and avoid duplicate entries.
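The two rules above (upper-case codes, no duplicates) and the orientation of the 2-letter code can be checked with a short Python sketch. The names parse_contact_table and contact_value are hypothetical, and the sketch assumes whitespace-separated fields. The lookup helper builds the key in the order described above: partner residue first, given residue second.

```python
from typing import Dict


def parse_contact_table(text: str) -> Dict[str, float]:
    """Read 'XY value' rows into a {pair: value} dictionary, with checks."""
    table: Dict[str, float] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 2 fields, got {len(parts)}")
        pair, value = parts
        if len(pair) != 2 or not pair.isalpha() or not pair.isupper():
            raise ValueError(f"line {lineno}: {pair!r} is not an upper-case pair")
        if pair in table:
            raise ValueError(f"line {lineno}: duplicate entry for {pair!r}")
        table[pair] = float(value)
    return table


def contact_value(table: Dict[str, float], given: str, partner: str) -> float:
    """Look up the oriented interaction: partner letter first, given letter second."""
    return table[partner + given]


if __name__ == "__main__":
    table = parse_contact_table("AA -2.84366275993\nAC -2.8903717579\n")
    # Residue C on the given strand, in contact with A on the second strand.
    print(contact_value(table, given="C", partner="A"))
```

Since the interaction is oriented, an entry for AC does not imply an entry for CA; a table meant to be symmetric needs both rows.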