Welcome to the Clote Lab Inverse Folding website!



The RNA inverse folding problem is the following: given a target secondary structure in dot-bracket notation, determine one or more RNA sequences whose minimum free energy (MFE) structure is the target structure. Here, the MFE structure is computed using RNAfold from the Vienna RNA Package. In addition, the user may provide sequence constraints, stipulating that certain positions be occupied by specific nucleotides, or that (for instance) the GC-content of the solution sequence lie within a user-specified range. This website provides access to two algorithms for the inverse folding problem:

  • RNA-CPdesign. Given a target structure and optional sequence constraints, CPdesign uses Constraint Programming (CP) to determine one or more RNA sequences that fold into the given target structure. CP performs a complete exploration of the search space and can therefore also prove that no sequence folding into the target structure exists. Since computation time may be exorbitant, such a proof is only feasible for sufficiently small structures.

  • RNA-LNSdesign. Given a target structure and optional sequence constraints, LNSdesign uses Large Neighborhood Search (LNS) to determine one or more RNA sequences that fold into the target structure. LNS is a heuristic that calls CP as a subroutine and is suitable for larger structures. Since LNSdesign is a heuristic algorithm, it cannot prove the nonexistence of a solution to an inverse folding problem.
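
The sequence constraints described above can be checked without any folding software. The following is a minimal Python sketch and is not taken from RNA-CPdesign or RNA-LNSdesign: the helper names (`pair_table`, `is_compatible`, `gc_content`, `satisfies_constraints`) are hypothetical. It tests only a necessary condition, namely that every paired position holds a Watson-Crick or GU wobble pair; confirming that the MFE structure of a sequence equals the target would still require RNAfold.

```python
# Illustrative sketch only; these helpers are hypothetical and are not
# part of RNA-CPdesign, RNA-LNSdesign, or the Vienna RNA Package.

CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def pair_table(structure):
    """Return the sorted list of 0-based (i, j) base pairs of a dot-bracket string."""
    stack, pairs = [], []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def is_compatible(sequence, structure):
    """True if every paired position holds a Watson-Crick or GU wobble pair."""
    if len(sequence) != len(structure):
        return False
    return all(sequence[i] + sequence[j] in CANONICAL
               for i, j in pair_table(structure))


def gc_content(sequence):
    """Fraction of G and C nucleotides (0.0 for an empty sequence)."""
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence) / len(sequence)


def satisfies_constraints(sequence, structure, fixed=None, gc_range=(0.0, 1.0)):
    """Check compatibility, fixed positions ({index: nucleotide}), and a GC range."""
    fixed = fixed or {}
    low, high = gc_range
    return (is_compatible(sequence, structure)
            and all(sequence[i] == base for i, base in fixed.items())
            and low <= gc_content(sequence) <= high)
```

A candidate that fails this check cannot be a solution, so a filter of this kind is a cheap first step before any call to a folding program.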

Here is an example of the RNA inverse folding software applied to a known RNA:

The target is the minimum free energy secondary structure of an Oryza sativa RNA, proposed as inverse folding problem "Oryza Sativa 4" on the EteRNA web site.

The sequence must have at most 40 GC pairs and at least 10 GU pairs.

((((((((.(((((.((((((((((((((((((((((((.(((((((((((((.((((((((((................))))))))))..(((((((((((((......))))))))))))))))))))))))))).))))))))))))))))))))))))))))).))))))))

Using RNA-CPdesign, within 330 ms, the following sequence was found to fold into the given target structure:

AUUAAUAAAGUUGAUGGUGAGAGGAUAGUUAGAUAGGGGAGGGGGGGGGGGGGAGGGGGGGGGCAAAAAAAAAAAAAAAAGCCCCCCCCCAAGGGGGGGCGGGGCAAAAAAGCCCCGCCCCCCCCCCCCCCCCCCCCACCCCUAUUUAAUUAUUUUUUUAUUUUAAUAUUAUUAAU

Graphical representation of the minimum free energy (MFE) secondary structure using VARNA.
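
The base-pair constraint in this example (at most 40 GC pairs, at least 10 GU pairs) can be checked by counting the pair types that a structure imposes on a sequence. The sketch below is a hedged illustration, not code from RNA-CPdesign; `count_pair_types` and `meets_pair_limits` are hypothetical names, and the default limits are simply the values quoted above.

```python
from collections import Counter


def count_pair_types(sequence, structure):
    """Count base pairs by type for a sequence laid onto a dot-bracket structure.

    Keys are the two nucleotides in alphabetical order, so GC and CG pairs
    are both counted under "CG", and GU and UG pairs under "GU".
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    stack = []
    counts = Counter()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            i = stack.pop()
            counts["".join(sorted(sequence[i] + sequence[j]))] += 1
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return counts


def meets_pair_limits(sequence, structure, max_gc=40, min_gu=10):
    """True if the number of GC pairs is at most max_gc and GU pairs at least min_gu."""
    counts = count_pair_types(sequence, structure)
    return counts["CG"] <= max_gc and counts["GU"] >= min_gu
```

Passing the target structure and the sequence shown above to `meets_pair_limits` would test the stated constraint on that solution.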

Both servers include a co-folding option: given a hybridization structure, represented in dot-bracket notation with an ampersand sign '&' between the first and second hybridized portions, the servers can determine two RNA sequences, separated by an ampersand sign '&', whose MFE hybridization is the input structure. Here the MFE hybridization is computed using RNAcofold from the Vienna RNA Package. For instance, the MFE hybridization of the sequences 5'-GGGGGAACCCCGGGGGGGGG-3' and 5'-CCCCCCCCCC-3', represented by concatenated sequences with separating ampersand, 'GGGGGAACCCCGGGGGGGGG&CCCCCCCCCC', is

(((....))).(((((((((&))))))))).

corresponding to the hybridization

5'-GGGGGAACCCCGGGGGGGGG-3'
   (((....))).|||||||||
          3'-CCCCCCCCCC-5'

which includes intra-molecular structure of the first sequence, along with hybridization between first and second sequences. Using CPdesign, with input

(((....))).(((((((((&))))))))).

we obtain the following sequences, separated by ampersand, whose MFE hybridization is the target structure:

GGCAAAAGCCAGGCGCGGGC&GCCCGCGCCA
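
The ampersand convention can be made concrete with a short sketch that splits a hybridization structure at '&', separates intra-molecular from inter-molecular base pairs, and checks that a pair of sequences is compatible with it. This is an illustration under stated assumptions, not the servers' implementation: the function names are hypothetical, and compatibility is only a necessary condition, since the MFE hybridization itself would have to be computed with RNAcofold.

```python
# Illustrative sketch only; function names are hypothetical.

CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def split_cofold(text):
    """Split 'first&second' into its two parts; exactly one '&' is required."""
    first, sep, second = text.partition("&")
    if not sep or "&" in second:
        raise ValueError("expected exactly one '&' separator")
    return first, second


def cofold_pairs(structure):
    """Return (intra, inter) lists of 0-based pairs, indexed on the joined strands."""
    first, second = split_cofold(structure)
    cut = len(first)  # index of the first nucleotide of the second strand
    stack, intra, inter = [], [], []
    for j, symbol in enumerate(first + second):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            i = stack.pop()
            # A pair is inter-molecular when it spans the strand boundary.
            (inter if i < cut <= j else intra).append((i, j))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return intra, inter


def check_cofold(sequences, structure):
    """True if both strands match in length and all pairs are canonical."""
    s1, s2 = split_cofold(sequences)
    t1, t2 = split_cofold(structure)
    if (len(s1), len(s2)) != (len(t1), len(t2)):
        return False
    joined = s1 + s2
    intra, inter = cofold_pairs(structure)
    return all(joined[i] + joined[j] in CANONICAL for i, j in intra + inter)


target = "(((....))).(((((((((&)))))))))."
intra, inter = cofold_pairs(target)
print(len(intra), len(inter))  # 3 intra-molecular and 9 inter-molecular pairs
print(check_cofold("GGCAAAAGCCAGGCGCGGGC&GCCCGCGCCA", target))  # True
```

Applied to the example above, the three intra-molecular pairs form the hairpin in the first strand and the nine inter-molecular pairs form the duplex with the second strand.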

Modena structures


Modena tarball (modena.tgz)
RF00001.121  RF00002.2  RF00003.94  RF00004.126  RF00005.1  RF00006.1
RF00007.20  RF00008.11  RF00009.115  RF00010.253  RF00011.18  RF00012.15
RF00013.139  RF00014.2  RF00015.101  RF00016.15  RF00017.90  RF00018.2
RF00019.115  RF00020.107  RF00021.10  RF00022.1  RF00024.16  RF00025.12
RF00026.1  RF00027.7  RF00028.1  RF00029.107  RF00030.30


RNA-SSD structures


RNA-SSD 1 tarball (rna-ssd1.tgz)
Z83250  L11935  LIU92530  U84629  AF107506  AF106618
AJ011149  S70838  U63350  AF141485  U81771  AJ130779
AF096836  X61771  AJ236455  AJ132572  AB015827  D38777
AF029195  X81949  AJ133622  AF056938  X99676  L77117


RNA-SSD Test Set C


RNA-SSD 2 tarball (rna-ssd2.tgz)
Minimal catalytic domains of the hairpin ribozyme satellite RNA of the tobacco ringspot virus (Figure 1a) (Fedor, 2000)
U3 snoRNA 5'-domain from Chlamydomonas reinhardtii, in vivo probing (Figure 6B) (Antal et al., 2000)
H. marismortui 5S rRNA (Figure 2) (Szymanski et al., 2002)
VS Ribozyme from Neurospora mitochondria (Figure 1A) (Lafontaine et al., 2001)
R180 ribozyme (Figure 2B) (Sun et al., 2002)
XS1 ribozyme, Bacillus subtilis P RNA-based ribozyme (Figure 2A) (Mobley and Pan, 1999)
Homo sapiens RNase P RNA (Figure 4) (Pitulle et al., 1998)
S20 mRNA from E. coli (Figure 2) (Mackie, 1992)
Halobacterium cutirubrum RNase P RNA (Figure 2) (Haas et al., 1990)
Group II intron ribozyme D135 from ai5g (Figure 5) (Swisher et al., 2001)


EteRNA sequences


EteRNA tarball (EteRNA.tgz)
Prion Pseudoknot  Human astrovirus  Homo Sapiens 1 Series
HIV Primer Binding Site  Homo Sapiens 3  Other Ribosomal RNA  Bacilus Subtilis sRNA
5s Ribosomal RNA  Tribolium Castaneum  Oryza sativa 4  Symbiotic plasmid  Telomerase RNA

Downloads



Source code is available for download.

Software is developed in COMET, which is required for running all RNAiFold programs.
Modified Vienna Package libraries are compiled for Linux-i686. Libraries for Linux-amd64 are also included (see instructions).

References



If you use RNAiFold in your research, please cite the following:

  1. Juan Antonio Garcia-Martin, Peter Clote, Ivan Dotu.
    RNAiFold: A constraint programming algorithm for RNA inverse folding and molecular design.
    J Bioinform Comput Biol 11(2): 1350001, 2013.

For comparison, there are related publications by other groups:

  • RNAinverse software: I. Hofacker, W. Fontana, P. Stadler, L. Bonhoeffer, M. Tacker and P. Schuster. Fast folding and comparison of RNA secondary structures. Monatsh Chem. 125:167-188 (1994).
  • RNA-SSD software: M. Andronescu, A.P. Fejes, F. Hutter, H.H. Hoos and A. Condon. A new algorithm for RNA secondary structure design. J Mol Biol. 336:607-624 (2004).
  • InfoRNA software: A. Busch and R. Backofen. INFO-RNA - a fast approach to inverse RNA folding. Bioinformatics 22(15):1823-1831 (2006).
  • Modena software: A. Taneda. MODENA: a multi-objective RNA inverse folding. Advances and Applications in Bioinformatics and Chemistry 4:1-12 (2011).
  • NUPACK software: J.N. Zadeh, B.R. Wolfe and N.A. Pierce. Nucleic acid sequence design via efficient ensemble defect optimization. J Comput Chem. 32:439-452 (2011).