Welcome to the Clote Lab Inverse Folding website!
The RNA inverse folding problem is the following: given a target secondary structure in dot-bracket notation, determine one or more RNA sequences whose minimum free energy (MFE) structure is the target structure. Here, the MFE structure is computed using RNAfold from the Vienna RNA Package. In addition, the user may provide sequence constraints, stipulating that certain positions be occupied by specific nucleotides, or (for instance) that the solution sequence have a GC-content within a user-specified range. This website provides access to two algorithms for the inverse folding problem:
RNA Synthetic design. A simple 3-step pipeline to design synthetic RNAs that fold into the consensus structure of a user-selected Rfam family. Sequence constraints are automatically generated for those positions whose sequence conservation in the Rfam seed alignment exceeds a user-specified threshold. Since functionally important nucleotides (e.g. those of the active site) are known to be conserved, this step improves the likelihood of designing a functionally active synthetic RNA. Additional constraints and structural compatibility/incompatibility requirements can be enforced.
RNA-CPdesign. Given a target structure and optional sequence constraints,
CPdesign uses Constraint Programming (CP) to determine
one or more RNA sequences that fold into the given target structure.
CP performs a complete exploration of the search space and can thus
also prove that no sequence folding into the target structure exists.
Since computation time may be exorbitant, the latter is only feasible
for sufficiently small structures.
RNA-LNSdesign is now embedded as an option in RNA-CPdesign. Given a target structure and optional sequence constraints, LNSdesign
uses Large Neighborhood Search (LNS) to determine
one or more RNA sequences that fold into the target structure.
LNS is a heuristic that calls CP as a subroutine and is
suitable for larger structures. Since LNSdesign is a heuristic algorithm,
it cannot prove the nonexistence of a solution to an inverse folding problem.
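The difference between a complete search and a heuristic can be illustrated with a minimal sketch. The code below is not the CPdesign implementation; it is a plain depth-first enumeration, written under stated assumptions, that either returns a sequence whose predicted structure equals the target or returns None after exhausting the whole search space, which is what allows a complete method to report that no solution exists. The names complete_search, pair_table and toy_fold are hypothetical, the constraint alphabet is a small subset of IUPAC codes, and toy_fold is a stand-in rule rather than a thermodynamic model.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}

# Subset of IUPAC codes accepted in the sequence constraint (assumption).
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U",
         "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "N": "ACGU"}


def pair_table(structure):
    """Return partner[i] = j for paired positions and -1 for unpaired ones."""
    partner = [-1] * len(structure)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError("unexpected character: " + ch)
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def complete_search(target, fold, constraint=None):
    """Return the first sequence with fold(seq) == target, or None if the
    entire constrained search space has been explored without success."""
    n = len(target)
    constraint = constraint or "N" * n
    partner = pair_table(target)
    seq = [""] * n

    def extend(i):
        if i == n:
            candidate = "".join(seq)
            return candidate if fold(candidate) == target else None
        for nt in IUPAC[constraint[i]]:
            j = partner[i]
            # Prune: a closing position must form a canonical pair
            # with its already assigned partner.
            if 0 <= j < i and (seq[j], nt) not in CANONICAL:
                continue
            seq[i] = nt
            found = extend(i + 1)
            if found is not None:
                return found
        return None

    return extend(0)


def toy_fold(seq):
    """Stand-in for an MFE folder (NOT a thermodynamic model): predicts the
    hairpin '((...))' only when both stem pairs are G-C or C-G."""
    gc = {("G", "C"), ("C", "G")}
    if len(seq) == 7 and (seq[0], seq[6]) in gc and (seq[1], seq[5]) in gc:
        return "((...))"
    return "." * len(seq)


print(complete_search("((...))", toy_fold))                        # CCAAAGG
print(complete_search("((...))", toy_fold, constraint="ANNNNNN"))  # None
```

In the second call, position 1 is constrained to A, so no sequence satisfies the stand-in folding rule, and the exhausted search returns None. With the Vienna RNA Package Python bindings installed, one could presumably pass a wrapper around RNA.fold as the fold argument, although the exponential search space keeps this approach practical only for very short targets.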
Here is an example of the RNA inverse folding software applied to a known RNA:
This is the minimum free energy secondary structure of an Oryza sativa RNA, proposed as the inverse folding problem "Oryza Sativa 4" on the EteRNA web site.
The solution sequence must have at most 40 GC pairs and at least 10 GU pairs.
((((((((.(((((.((((((((((((((((((((((((.(((((((((((((.((((((((((................))))))))))..(((((((((((((......))))))))))))))))))))))))).))))))))))))))))))))))))))))).))))))))
Using RNA-CPdesign, within 330 ms, the following sequence was found to fold into the given target structure:
AUUAAUAAAGUUGAUGGUGAGAGGAUAGUUAGAUAGGGGAGGGGGGGGGGGGGAGGGGGGGGGCAAAAAAAAAAAAAAAAGCCCCCCCCCAAGGGGGGGCGGGGCAAAAAAGCCCCGCCCCCCCCCCCCCCCCCCCCACCCCUAUUUAAUUAUUUUUUUAUUUUAAUAUUAUUAAU
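A base pair count constraint of the kind stated for this puzzle (at most 40 GC pairs, at least 10 GU pairs) can be checked for any candidate sequence once the target structure is known. The following is a small sketch, not code from the servers: the function names pair_counts and satisfies_pair_limits are hypothetical, and the short hairpin in the usage lines is an illustrative example rather than the Oryza sativa target.

```python
from collections import Counter


def pair_counts(sequence, structure):
    """Count base pairs by type; keys are sorted letter pairs
    such as 'CG', 'AU' and 'GU'."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure differ in length")
    counts = Counter()
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            # Sort the two nucleotides so that G-C and C-G share one key.
            counts["".join(sorted(sequence[j] + sequence[i]))] += 1
    if stack:
        raise ValueError("unbalanced structure")
    return counts


def satisfies_pair_limits(sequence, structure, max_gc, min_gu):
    """True if the pairing uses at most max_gc GC pairs
    and at least min_gu GU pairs."""
    counts = pair_counts(sequence, structure)
    return counts["CG"] <= max_gc and counts["GU"] >= min_gu


counts = pair_counts("GGGAAAUCC", "(((...)))")
print(counts["CG"], counts["GU"])                                   # 2 1
print(satisfies_pair_limits("GGGAAAUCC", "(((...)))", 40, 10))      # False
```

The toy hairpin fails the puzzle limits only because it has a single GU pair; the same call with the target structure and a designed sequence would test the actual constraint.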
Graphical representation of the minimum free energy (MFE) secondary structure using VARNA.
Both servers include a co-folding option: given a hybridization structure, represented in dot-bracket notation with an ampersand sign '&' between the first and second hybridized portions, the servers can determine two RNA sequences, separated by an ampersand sign '&', whose MFE hybridization is the input structure. Here the MFE hybridization is computed using RNAcofold from the Vienna RNA Package. For instance, the MFE hybridization of the sequences 5'-GGGGGAACCCCGGGGGGGGG-3' and 5'-CCCCCCCCCC-3', represented by concatenated sequences with separating ampersand, 'GGGGGAACCCCGGGGGGGGG&CCCCCCCCCC', is
corresponding to the hybridization
5'-GGGGGAACCCCGGGGGGGGG-3'
   (((....))).|||||||||
           3'-CCCCCCCCCC-5'
which includes intra-molecular structure of the first sequence, along with hybridization between first and second sequences. Using CPdesign, with input
we obtain the following sequences, separated by ampersand, whose MFE hybridization is the target structure:
Modena tarball (modena.tgz)
RF00001.121 RF00002.2 RF00003.94 RF00004.126 RF00005.1 RF00006.1
RF00007.20 RF00008.11 RF00009.115 RF00010.253 RF00011.18 RF00012.15
RF00013.139 RF00014.2 RF00015.101 RF00016.15 RF00017.90 RF00018.2
RF00019.115 RF00020.107 RF00021.10 RF00022.1 RF00024.16 RF00025.12
RF00026.1 RF00027.7 RF00028.1 RF00029.107 RF00030.30
RNA-SSD 1 tarball (rna-ssd1.tgz)
Z83250 L11935 LIU92530 U84629 AF107506 AF106618
AJ011149 S70838 U63350 AF141485 U81771 AJ130779
AF096836 X61771 AJ236455 AJ132572 AB015827 D38777
AF029195 X81949 AJ133622 AF056938 X99676 L77117
RNA-SSD Test Set C
RNA-SSD 2 tarball (rna-ssd2.tgz)
Minimal catalytic domains of the hairpin ribozyme satellite RNA of the tobacco ringspot virus (Figure 1a) (Fedor, 2000)
U3 snoRNA 5'-domain from Chlamydomonas reinhardtii, in vivo probing (Figure 6B) (Antal et al., 2000)
H. marismortui 5S rRNA (Figure 2) (Szymanski et al., 2002)
VS Ribozyme from Neurospora mitochondria (Figure 1A) (Lafontaine et al., 2001)
R180 ribozyme (Figure 2B) (Sun et al., 2002)
XS1 ribozyme, Bacillus subtilis P RNA-based ribozyme (Figure 2A) (Mobley and Pan, 1999)
Homo sapiens RNase P RNA (Figure 4) (Pitulle et al., 1998)
S20 mRNA from E. coli (Figure 2) (Mackie, 1992)
Halobacterium cutirubrum RNase P RNA (Figure 2) (Haas et al., 1990)
Group II intron ribozyme D135 from ai5g (Figure 5) (Swisher et al., 2001)
EteRNA tarball (EteRNA.tgz)
Prion Pseudoknot Human astrovirus Homo Sapiens 1 Series
HIV Primer Binding Site Homo Sapiens 3 Other Ribosomal RNA Bacilus Subtilis sRNA
5s Ribosomal RNA Tribolium Castaneum Oryza sativa 4 Symbiotic plasmid Telomerase RNA
If you use RNAiFold for your research, please cite the following:
Juan Antonio Garcia-Martin, Peter Clote, Ivan Dotu.
RNAiFold: A constraint programming algorithm for RNA inverse folding and molecular design.
J Bioinform Comput Biol 11(2): 1350001, 2013.
You can read an example of RNAiFold applied to synthetic design at:
Complete RNA inverse folding: computational design of functional hammerhead ribozymes.
Dotu I, Garcia-Martin JA, Slinger BL, Mechery V, Meyer MM, Clote P.
Nucleic Acids Res. 2014;42(18):11752-62.
For comparison, there are related publications by other groups:
- RNAinverse software: I. Hofacker, W. Fontana, P. Stadler, L. Bonhoeffer, M. Tacker and P. Schuster. Fast folding and comparison of RNA secondary structures. Monatsh Chem 125:167-188 (1994).
- RNA-SSD software: M. Andronescu, A.P. Fejes, F. Hutter, H.H. Hoos and A. Condon. A new algorithm for RNA secondary structure design. J Mol Biol 336:607-624 (2004).
- InfoRNA software: A. Busch and R. Backofen. INFO-RNA - a fast approach to inverse RNA folding. Bioinformatics 22(15):1823-1831 (2006).
- Modena software: A. Taneda. MODENA: a multi-objective RNA inverse folding. Advances and Applications in Bioinformatics and Chemistry 4:1-12 (2011).
- NUPACK software: J.N. Zadeh, B.R. Wolfe and N.A. Pierce. Nucleic acid sequence design via efficient ensemble defect optimization. J Comput Chem 32:439-452 (2011).