To the best of my knowledge, the partition function algorithm for an alignment was first introduced in 1995 by S. Miyazawa, in . The same algorithm was re-discovered and employed in 2002 by Mückstein, Hofacker and Stadler in "Stochastic Pairwise Alignments" in order to produce stochastic near-optimal alignments. The same algorithm was re-discovered by P. Clote in 2003, who computed pair probabilities of aligned residues in the unpublished manuscript "Biologically significant sequence alignments using Boltzmann probabilities", and later published related algorithms for time warping Symmetric time warping, Boltzmann pair probabilities and functional genomics, and BTW: a web server for Boltzmann time warping of gene expression time series .
Web engine for a dynamic programming algorithm to compute the forward and backward partition function, which thus allows the computation of Boltzmann probability that two characters (amino acids or nucleotides) are aligned. This probability gives a measure of how often the designated two characters are aligned over all pairwise alignments, where an alignment A is weighted by exp(score(A)), where score is the PAM250, Blosum62, etc. score of the alignment. Using the Boltzmann probabilities we thus have a natural and efficiently computable notion of biological significance to the alignment of two characters. This approach can be confirmed using FSSP, 3d_ali and other biologically significant structural alignments.
Our current implementation supports both Needleman-Wunsch global alignment and Smith-Waterman local alignment, both using linear and affine gap penalties. However, the web server currently only supports the Smith-Waterman local alignment with linear gap penalty. (A couple of checkboxes and simple modifications of the cgi parser will yield full functionality. The final web server will have a color indicator representing the strength of the probability.)
Our current implementation allows any similarity matrix, though we use the default of PAM250 amino acid similarity matrix. Again, the final web server will incorporate several choices of similarity matrices.
For large sequences, such as two well-known yeast proteins Cdc25 and Ste6, due to floating point overflow caused by the addition of numbers exponential in alignment scores, we plan to use a centered window of default size (say) 50 to compute the Boltzmann probabilities on the alignment appearing withing the centered window. Currently this is not implemented, and probabilities are computed on the entire (local) alignment.
WARNING: Current implementation will crash when there is NO local alignment between 2 sequences; for instance on AAA and IIII, since PAM250 similarity of A with I is negative. This is an easy thing to fix, by testing if the alignment is empty, and will be rectified in the final web server.
Our implementation requires 4 command line arguments: peptide1, peptide2, similarityMatrix, gapPenalty. Here, peptide1 and peptide2 are FASTA format files for 2 amino acid sequences; alternatively, short sequences can be pasted into the form. The file similarityMatrix is in the format, as in PAM250 matrix, and you may give an arbitrary gap penalty (e.g. -10 for strong gap penalty, 0 for no gap penalty). With these parameters, click here to see the output for alignment on these 2 files. To see the cgi code, click here. With this output, one can then use gnuplot for graphical alignment output.
Insert 2 amino acid sequences amino acid sequences AND a gap penalty (e.g. -10 for strong gap penalty, and 0 for no gap penalty).