Welcome to the Clote Lab Inverse Folding website!
The RNA inverse folding problem is the following: given a target secondary structure in dot-bracket notation, determine one or more RNA sequences whose minimum free energy (MFE) structure is the target structure. Here, the MFE structure is computed using RNAfold from the Vienna RNA Package. In addition, the user may provide sequence constraints, stipulating that certain positions be occupied by specific nucleotides, or that (for instance) the solution sequence have a GC-content within a user-specified range. This website provides access to two algorithms for the inverse folding problem:
RNA-CPdesign. Given a target structure and optional sequence constraints,
CPdesign uses Constraint Programming (CP) to determine
one or more RNA sequences that fold into the given target structure.
CP performs a complete exploration of the search space and thus can
also prove that no sequence exists that folds into the target structure.
Since computation time may be exorbitant, such a proof is only feasible
for sufficiently small structures.
RNA-LNSdesign. Given a target structure and optional sequence constraints,
LNSdesign uses Large Neighborhood Search (LNS) to determine
one or more RNA sequences that fold into the target structure.
LNS is a heuristic that calls CP as a subroutine and is
suitable for larger structures. Since LNSdesign is a heuristic algorithm,
it cannot prove the nonexistence of a solution to an inverse folding problem.
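To make the difference between a complete search and a heuristic concrete, here is a minimal Python sketch. It is not the constraint programming model used by CPdesign: it is a brute-force enumeration that only checks base-pair compatibility and the sequence constraints, and it does not call RNAfold, so it does not verify the MFE structure. The function names (`pair_table`, `gc_content`, `enumerate_candidates`) and the 0-based `fixed` dictionary are assumptions made for illustration.

```python
from itertools import product

# Watson-Crick and GU wobble pairs.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}

def pair_table(structure):
    """Map each paired position to its partner (0-based indices)."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs

def gc_content(seq):
    """Fraction of G and C nucleotides (0.0 for an empty sequence)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def enumerate_candidates(structure, fixed=None, gc_range=(0.0, 1.0)):
    """Exhaustively list every sequence that satisfies the constraints.

    An empty result is a proof that no sequence satisfies them, which a
    heuristic search cannot provide. The cost grows as 4**len(structure).
    """
    fixed = fixed or {}
    pairs = pair_table(structure)
    lo, hi = gc_range
    hits = []
    for letters in product("ACGU", repeat=len(structure)):
        seq = "".join(letters)
        if any(seq[i] != nt for i, nt in fixed.items()):
            continue
        if any((seq[i], seq[j]) not in CANONICAL
               for i, j in pairs.items() if i < j):
            continue
        if lo <= gc_content(seq) <= hi:
            hits.append(seq)
    return hits

print(len(enumerate_candidates("((...))")))                     # 2304
print(enumerate_candidates("((...))", fixed={0: "A", 6: "C"}))  # []
```

In the second call, positions 0 and 6 must pair but are fixed to A and C, so the exhaustive search returns an empty list, showing that no sequence exists. The exponential cost of the loop is also why such proofs are practical only for small structures.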
Here is an example of applying the inverse folding software to a known RNA:
This is the minimum free energy secondary structure of an Oryza sativa RNA, proposed as the inverse folding problem "Oryza Sativa 4" on the EteRNA web site.
The target sequence must have at most 40 GC pairs and at least 10 GU pairs.
((((((((.(((((.((((((((((((((((((((((((.(((((((((((((.((((( (((((................))))))))))..(((((((((((((......))))))) ))))))))))))))))))).))))))))))))))))))))))))))))).))))))))
Using RNA-CPdesign, within 330 ms, the following sequence was found to fold into the given target structure:
AUUAAUAAAGUUGAUGGUGAGAGGAUAGUUAGAUAGGGGAGGGGGGGGGGGGGAGGGGG GGGGCAAAAAAAAAAAAAAAAGCCCCCCCCCAAGGGGGGGCGGGGCAAAAAAGCCCCGC CCCCCCCCCCCCCCCCCCCACCCCUAUUUAAUUAUUUUUUUAUUUUAAUAUUAUUAAU
Graphical representation of the minimum free energy (MFE) secondary structure using VARNA.
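The base-pair limits stated for this puzzle (at most 40 GC pairs and at least 10 GU pairs) can be checked mechanically for any candidate sequence. The sketch below is an illustration only: the function names and the toy sequence and structure are hypothetical, and the structure is assumed to be balanced dot-bracket without pseudoknots.

```python
from collections import Counter

def base_pairs(structure):
    """Return (i, j) index pairs for a balanced dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs

def pair_type_counts(sequence, structure):
    """Count base pairs by type; keys are sorted letters, e.g. 'CG', 'GU', 'AU'."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    counts = Counter()
    for i, j in base_pairs(structure):
        counts["".join(sorted(sequence[i] + sequence[j]))] += 1
    return counts

def meets_pair_limits(sequence, structure, max_gc=40, min_gu=10):
    """True if the sequence has at most max_gc GC pairs and at least min_gu GU pairs."""
    counts = pair_type_counts(sequence, structure)
    return counts["CG"] <= max_gc and counts["GU"] >= min_gu

# Toy data, not taken from the puzzle above.
toy_seq, toy_struct = "GGGAAAUCU", "(((...)))"
print(pair_type_counts(toy_seq, toy_struct))
print(meets_pair_limits(toy_seq, toy_struct, max_gc=1, min_gu=2))
```

The toy hairpin has one GC pair and two GU pairs, so it passes the relaxed limits used in the last line but would fail the puzzle's default requirement of at least 10 GU pairs.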
Both servers include a co-folding option: given a hybridization structure, represented in dot-bracket notation with an ampersand sign '&' between the first and second hybridized portions, the servers can determine two RNA sequences, separated by an ampersand sign '&', whose MFE hybridization is the input structure. Here the MFE hybridization is computed using RNAcofold from the Vienna RNA Package. For instance, the MFE hybridization of the sequences 5'-GGGGGAACCCCGGGGGGGGG-3' and 5'-CCCCCCCCCC-3', represented by concatenated sequences with separating ampersand, 'GGGGGAACCCCGGGGGGGGG&CCCCCCCCCC', is
corresponding to the hybridization
5'-GGGGGAACCCCGGGGGGGGG-3'
   (((....))).|||||||||
3'-CCCCCCCCCC-5'
which includes intra-molecular structure of the first sequence, along with hybridization between first and second sequences. Using CPdesign, with input
we obtain the following sequences, separated by ampersand, whose MFE hybridization is the target structure:
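As a side note on the ampersand notation used for co-folding, the following Python sketch separates the base pairs of an '&'-joined dot-bracket string into intra-molecular pairs and inter-molecular (hybridization) pairs. It is only an illustration of the notation, not part of CPdesign or RNAcofold; the function name and the toy structure are hypothetical.

```python
def split_cofold_pairs(structure):
    """Classify base pairs of an '&'-joined dot-bracket string.

    Returns (intra, inter): lists of 0-based (i, j) pairs, indexed on the
    concatenated strands with the '&' removed.
    """
    if structure.count("&") != 1:
        raise ValueError("expected exactly one '&' separator")
    cut = structure.index("&")          # length of the first strand
    flat = structure.replace("&", "")
    stack, intra, inter = [], [], []
    for i, ch in enumerate(flat):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            # A pair is inter-molecular when it spans the strand boundary.
            (inter if j < cut <= i else intra).append((j, i))
    return intra, inter

# Toy structure: a small hairpin on the first strand plus three
# hybridization pairs with the second strand.
intra, inter = split_cofold_pairs("((..)).(((&)))")
print(intra)  # [(1, 4), (0, 5)]
print(inter)  # [(9, 10), (8, 11), (7, 12)]
```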
Modena tarball (modena.tgz)
RF00001.121 RF00002.2 RF00003.94 RF00004.126 RF00005.1 RF00006.1
RF00007.20 RF00008.11 RF00009.115 RF00010.253 RF00011.18 RF00012.15
RF00013.139 RF00014.2 RF00015.101 RF00016.15 RF00017.90 RF00018.2
RF00019.115 RF00020.107 RF00021.10 RF00022.1 RF00024.16 RF00025.12
RF00026.1 RF00027.7 RF00028.1 RF00029.107 RF00030.30
RNA-SSD 1 tarball (rna-ssd1.tgz)
Z83250 L11935 LIU92530 U84629 AF107506 AF106618
AJ011149 S70838 U63350 AF141485 U81771 AJ130779
AF096836 X61771 AJ236455 AJ132572 AB015827 D38777
AF029195 X81949 AJ133622 AF056938 X99676 L77117
RNA-SSD Test Set C
RNA-SSD 2 tarball (rna-ssd2.tgz)
Minimal catalytic domains of the hairpin ribozyme satellite RNA of the tobacco ringspot virus (Figure 1a) (Fedor, 2000)
U3 snoRNA 5'-domain from Chlamydomonas reinhardtii, in vivo probing (Figure 6B) (Antal et al., 2000)
H. marismortui 5S rRNA (Figure 2) (Szymanski et al., 2002)
VS Ribozyme from Neurospora mitochondria (Figure 1A) (Lafontaine et al., 2001)
R180 ribozyme (Figure 2B) (Sun et al., 2002)
XS1 ribozyme, Bacillus subtilis P RNA-based ribozyme (Figure 2A) (Mobley and Pan, 1999)
Homo sapiens RNase P RNA (Figure 4) (Pitulle et al., 1998)
S20 mRNA from E. coli (Figure 2) (Mackie, 1992)
Halobacterium cutirubrum RNase P RNA (Figure 2) (Haas et al., 1990)
Group II intron ribozyme D135 from ai5g (Figure 5) (Swisher et al., 2001)
EteRNA tarball (EteRNA.tgz)
Prion Pseudoknot Human astrovirus Homo Sapiens 1 Series
HIV Primer Binding Site Homo Sapiens 3 Other Ribosomal RNA Bacilus Subtilis sRNA
5s Ribosomal RNA Tribolium Castaneum Oryza sativa 4 Symbiotic plasmid Telomerase RNA
Source code is available for download.
The software is written in COMET, which is required to run all RNAiFold programs.
Modified Vienna Package libraries are compiled for Linux-i686. Libraries for Linux-amd64 are also included (see instructions).
If you use RNAiFold for your research, please cite the following:
Juan Antonio Garcia-Martin, Peter Clote, Ivan Dotu.
RNAiFold: A constraint programming algorithm for RNA inverse folding and molecular design.
J Bioinform Comput Biol 11(2): 1350001, 2013.
For comparison, there are related publications by other groups:
- RNAinverse software: I. Hofacker, W. Fontana, P. Stadler, L. Bonhoeffer, M. Tacker and P. Schuster. Fast folding and comparison of RNA secondary structures. Monatsh Chem. 125:167-188 (1994).
- RNA-SSD software: M. Andronescu, AP. Fejes, F. Hutter, HH. Hoos and A. Condon. A new algorithm for RNA secondary structure design. J Mol Biol. 336: 607-624 (2004).
- InfoRNA software: A. Busch and R. Backofen. INFO-RNA - fast approach to inverse RNA folding. Bioinformatics 22(15):1823-1831 (2006).
- Modena software: A. Taneda. MODENA: a multi-objective RNA inverse folding. Advances and Applications in Bioinformatics and Chemistry 4:1-12 (2011).
- NUPACK software: J.N. Zadeh, B.R. Wolfe and N.A. Pierce. Nucleic acid sequence design via efficient ensemble defect optimization. J Comput Chem. 32:439-452 (2011).