Welcome to the Clote Lab Inverse Folding website!
The RNA inverse folding problem is the following: given a target secondary structure in dot-bracket notation, determine one or more RNA sequences whose minimum free energy (MFE) structure is the target structure. Here, the MFE structure is computed using RNAfold from the Vienna RNA Package. In addition, the user may provide sequence constraints, stipulating that certain positions be occupied by specific nucleotides, or that (for instance) the solution sequence has a GC-content within a user-specified range. This website provides access to the following tools for the inverse folding problem:
-
RNA Synthetic design. A simple 3-step pipeline to design synthetic RNAs that fold into the consensus structure of a user-selected Rfam family. Sequence constraints are automatically generated for those positions whose sequence conservation in the Rfam seed alignment exceeds a user-specified threshold. Since functionally important nucleotides (e.g. the active site) are known to be conserved, this step improves the likelihood of designing a functionally active synthetic RNA. Additional constraints and structural compatibility/incompatibility requirements can be enforced.
-
RNA-CPdesign. Given a target structure and optional sequence constraints,
CPdesign uses Constraint Programming (CP) to determine
one or more RNA sequences that fold into the given target structure.
CP performs a complete exploration of the search space and can thus
also prove that no sequence folding into the target structure exists.
Since computation time may be exorbitant, such a proof is only feasible
for sufficiently small structures.
-
RNA-LNSdesign.
RNA-LNSdesign is now embedded as an option in RNA-CPdesign. Given a target structure and optional sequence constraints, LNSdesign
uses Large Neighborhood Search (LNS) to determine
one or more RNA sequences that fold into the target structure.
LNS is a heuristic that calls CP as a subroutine and is
suitable for larger structures. Since LNSdesign is a heuristic algorithm,
it cannot prove the nonexistence of a solution to an inverse folding problem.
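The ideas behind the tools listed above can be illustrated with a minimal Python sketch. It is not the code that runs on this server: the function names, the toy alignment, the treatment of gap characters, and the use of the IUPAC code N for unconstrained positions are all assumptions made for illustration. The sketch derives sequence constraints from conserved alignment columns, then performs a complete depth-first enumeration of all sequences that satisfy the constraints and place compatible nucleotides (Watson-Crick or GU) at paired positions. The hypothetical `accept` hook marks where an MFE check (for example, a call to RNAfold) would be plugged in. An empty result shows that no sequence satisfies the constraints; a heuristic search such as LNS cannot reach that conclusion.

```python
from collections import Counter

# Subset of IUPAC nucleotide codes, mapped to the bases each code allows.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "AG", "Y": "CU",
         "S": "CG", "W": "AU", "K": "GU", "M": "AC", "N": "ACGU"}
# Base pairs allowed in a secondary structure: Watson-Crick and GU wobble.
PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def pair_table(structure):
    """Map each paired position (0-based) of a dot-bracket string to its partner."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unmatched '('")
    return partner


def conservation_constraints(alignment, threshold):
    """Return a constraint string: the majority base where its frequency
    exceeds the threshold, 'N' elsewhere (assumption: gaps count as
    non-conserved and columns map one-to-one to structure positions)."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment rows must be non-empty and of equal length")
    constraint = []
    for column in zip(*alignment):
        base, count = Counter(column).most_common(1)[0]
        conserved = base in "ACGU" and count / len(column) > threshold
        constraint.append(base if conserved else "N")
    return "".join(constraint)


def enumerate_designs(structure, constraints, accept=lambda seq: True):
    """Yield every sequence that meets the constraints and is compatible
    with the structure; `accept` is a hypothetical hook for an MFE test."""
    if len(constraints) != len(structure):
        raise ValueError("constraints and structure lengths differ")
    partner = pair_table(structure)
    chosen = []

    def extend(i):
        if i == len(structure):
            seq = "".join(chosen)
            if accept(seq):
                yield seq
            return
        j = partner.get(i)
        for base in IUPAC[constraints[i]]:
            # Prune as soon as a closing position cannot pair with its partner.
            if j is not None and j < i and chosen[j] + base not in PAIRS:
                continue
            chosen.append(base)
            yield from extend(i + 1)
            chosen.pop()

    yield from extend(0)


# Toy seed alignment (hypothetical) for the hairpin "(...)".
alignment = ["GAAAC", "GCAAC", "GAGAU", "GAAAC"]
constraints = conservation_constraints(alignment, threshold=0.9)
designs = list(enumerate_designs("(...)", constraints))
# Contradictory constraints: A cannot pair with A, so the search is empty.
no_solution = list(enumerate_designs("(...)", "ANNNA"))
```

The recursion visits every candidate, so the running time grows exponentially with the number of unconstrained positions; this is why a complete search is only practical for small structures.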
Here is an example of the RNA inverse folding software applied to a known RNA:
This is the minimum free energy secondary structure of an Oryza sativa RNA, proposed as the inverse folding problem "Oryza Sativa 4" on the EteRNA web site.
The target sequence must have at most 40 GC pairs and at least 10 GU pairs.
((((((((.(((((.((((((((((((((((((((((((.(((((((((((((.((((((((((................))))))))))..(((((((((((((......))))))))))))))))))))))))).))))))))))))))))))))))))))))).))))))))
Using RNA-CPdesign, within 330 ms, the following sequence was found to fold into the given target structure:
AUUAAUAAAGUUGAUGGUGAGAGGAUAGUUAGAUAGGGGAGGGGGGGGGGGGGAGGGGGGGGGCAAAAAAAAAAAAAAAAGCCCCCCCCCAAGGGGGGGCGGGGCAAAAAAGCCCCGCCCCCCCCCCCCCCCCCCCCACCCCUAUUUAAUUAUUUUUUUAUUUUAAUAUUAUUAAU
Graphical representation of the minimum free energy (MFE) secondary structure using VARNA.
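Base-pair constraints such as those in the example above (an upper bound on GC pairs and a lower bound on GU pairs) are simple to check for any candidate sequence. The following Python sketch is for illustration only and is not taken from the server; the function names and the short toy hairpin are assumptions. It counts the pair types induced by a dot-bracket structure and computes the GC-content. It does not fold the sequence: whether a candidate actually adopts the target as its MFE structure still has to be determined with RNAfold.

```python
# Sorted two-letter key of a base pair -> conventional label.
CANONICAL = {"CG": "GC", "AU": "AU", "GU": "GU"}


def base_pairs(structure):
    """Yield (i, j) pairs of 0-based positions from a dot-bracket string."""
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            yield stack.pop(), i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unmatched '('")


def count_pair_types(sequence, structure):
    """Count GC, AU and GU pairs formed by the sequence in the structure."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    counts = {"GC": 0, "AU": 0, "GU": 0}
    for i, j in base_pairs(structure):
        key = "".join(sorted(sequence[i] + sequence[j]))
        if key not in CANONICAL:
            raise ValueError(f"{sequence[i]}-{sequence[j]} cannot pair ({i}, {j})")
        counts[CANONICAL[key]] += 1
    return counts


def gc_content(sequence):
    """Fraction of G and C nucleotides in the sequence."""
    return sum(base in "GC" for base in sequence) / len(sequence)


def meets_pair_limits(sequence, structure, max_gc_pairs, min_gu_pairs):
    """True if the GC-pair upper bound and GU-pair lower bound both hold."""
    counts = count_pair_types(sequence, structure)
    return counts["GC"] <= max_gc_pairs and counts["GU"] >= min_gu_pairs


# Toy hairpin (hypothetical): two GC pairs and one GU pair.
counts = count_pair_types("GGUAAAGCC", "(((...)))")
```

For the toy hairpin, `counts` is `{"GC": 2, "AU": 0, "GU": 1}`, so `meets_pair_limits("GGUAAAGCC", "(((...)))", 2, 1)` holds, while the limits of the example above (`max_gc_pairs=40`, `min_gu_pairs=10`) are not met.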
Both servers include a co-folding option: given a hybridization structure, represented in dot-bracket notation with an ampersand sign '&' between the first and second hybridized portions, the servers can determine two RNA sequences, separated by an ampersand sign '&', whose MFE hybridization is the input structure. Here the MFE hybridization is computed using RNAcofold from the Vienna RNA Package. For instance, the MFE hybridization of the sequences 5'-GGGGGAACCCCGGGGGGGGG-3' and 5'-CCCCCCCCCC-3', represented by concatenated sequences with separating ampersand, 'GGGGGAACCCCGGGGGGGGG&CCCCCCCCCC', is
(((....))).(((((((((&))))))))).
corresponding to the hybridization
5'-GGGGGAACCCCGGGGGGGGG-3'
   (((....))).|||||||||
          3'-CCCCCCCCCC-5'
which includes intra-molecular structure of the first sequence, along with hybridization between first and second sequences. Using CPdesign, with input
(((....))).(((((((((&))))))))).
we obtain the following sequences, separated by ampersand, whose MFE hybridization is the target structure:
GGCAAAAGCCAGGCGCGGGC&GCCCGCGCCA
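The ampersand notation can be processed with a few lines of Python. The sketch below is an illustration rather than part of the server software, and its function names are assumptions. It splits a hybridization structure at the '&', separates intra-molecular from inter-molecular base pairs, and checks that a pair of sequences places compatible nucleotides (Watson-Crick or GU) at every paired position. Compatibility is only a necessary condition; confirming that the input is the MFE hybridization requires RNAcofold.

```python
PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def split_cofold(text):
    """Split 'first&second' into its two parts; exactly one '&' is required."""
    first, sep, second = text.partition("&")
    if not sep or "&" in second:
        raise ValueError("expected exactly one '&' separator")
    return first, second


def classify_pairs(structure):
    """Return (intra, inter) lists of 0-based pairs, with positions numbered
    over the concatenation of both strands (the '&' is not counted)."""
    first, second = split_cofold(structure)
    cut = len(first)  # index of the first position of the second strand
    stack, intra, inter = [], [], []
    for i, ch in enumerate(first + second):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            # A pair is inter-molecular if it spans the strand boundary.
            (inter if j < cut <= i else intra).append((j, i))
    if stack:
        raise ValueError("unmatched '('")
    return intra, inter


def is_compatible(sequences, structure):
    """True if strand lengths match and every paired position can pair."""
    seq1, seq2 = split_cofold(sequences)
    str1, str2 = split_cofold(structure)
    if (len(seq1), len(seq2)) != (len(str1), len(str2)):
        return False
    seq = seq1 + seq2
    intra, inter = classify_pairs(structure)
    return all(seq[i] + seq[j] in PAIRS for i, j in intra + inter)


# Target structure and designed sequences from the example above.
TARGET = "(((....))).(((((((((&)))))))))."
DESIGN = "GGCAAAAGCCAGGCGCGGGC&GCCCGCGCCA"
intra, inter = classify_pairs(TARGET)
compatible = is_compatible(DESIGN, TARGET)
```

For the example, the three pairs of the hairpin in the first strand are intra-molecular and the remaining nine pairs join the two strands.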
Modena structures
Modena tarball (modena.tgz)
RF00001.121
RF00002.2
RF00003.94
RF00004.126
RF00005.1
RF00006.1
RF00007.20
RF00008.11
RF00009.115
RF00010.253
RF00011.18
RF00012.15
RF00013.139
RF00014.2
RF00015.101
RF00016.15
RF00017.90
RF00018.2
RF00019.115
RF00020.107
RF00021.10
RF00022.1
RF00024.16
RF00025.12
RF00026.1
RF00027.7
RF00028.1
RF00029.107
RF00030.30
RNA-SSD structures
RNA-SSD 1 tarball (rna-ssd1.tgz)
Z83250
L11935
LIU92530
U84629
AF107506
AF106618
AJ011149
S70838
U63350
AF141485
U81771
AJ130779
AF096836
X61771
AJ236455
AJ132572
AB015827
D38777
AF029195
X81949
AJ133622
AF056938
X99676
L77117
RNA-SSD Test Set C
RNA-SSD 2 tarball (rna-ssd2.tgz)
Minimal catalytic domains of the hairpin ribozyme satellite RNA of the tobacco ringspot virus (Figure 1a) (Fedor, 2000)
U3 snoRNA 5'-domain from Chlamydomonas reinhardtii, in vivo probing (Figure 6B) (Antal et al., 2000)
H. marismortui 5S rRNA (Figure 2) (Szymanski et al., 2002)
VS Ribozyme from Neurospora mitochondria (Figure 1A) (Lafontaine et al., 2001)
R180 ribozyme (Figure 2B) (Sun et al., 2002)
XS1 ribozyme, Bacillus subtilis P RNA-based ribozyme (Figure 2A) (Mobley and Pan, 1999)
Homo sapiens RNase P RNA (Figure 4) (Pitulle et al., 1998)
S20 mRNA from E. coli (Figure 2) (Mackie, 1992)
Halobacterium cutirubrum RNase P RNA (Figure 2) (Haas et al., 1990)
Group II intron ribozyme D135 from ai5g (Figure 5) (Swisher et al., 2001)
EteRNA sequences
EteRNA tarball (EteRNA.tgz)
Prion Pseudoknot
Human astrovirus
Homo Sapiens 1 Series
HIV Primer Binding Site
Homo Sapiens 3
Other Ribosomal RNA
Bacilus Subtilis sRNA
5s Ribosomal RNA
Tribolium Castaneum
Oryza sativa 4
Symbiotic plasmid
Telomerase RNA
If you use RNAiFold in your research, please cite the following:
-
Juan Antonio Garcia-Martin, Peter Clote, Ivan Dotu.
RNAiFold: A constraint programming algorithm for RNA inverse folding and molecular design.
J Bioinform Comput Biol 11(2): 1350001, 2013.
You can read an example of RNAiFold applied to synthetic design at:
Complete RNA inverse folding: computational design of functional hammerhead ribozymes.
Dotu I, Garcia-Martin JA, Slinger BL, Mechery V, Meyer MM, Clote P.
Nucleic Acids Res. 2014;42(18):11752-62.
For comparison, there are related publications by other groups:
- RNAinverse software: I. Hofacker, W. Fontana, P. Stadler, L. Bonhoeffer, M. Tacker and P. Schuster. Fast folding and comparison of RNA secondary structures. Monatsh Chem. 125:167-188 (1994).
- RNA-SSD software: M. Andronescu, A.P. Fejes, F. Hutter, H.H. Hoos and A. Condon. A new algorithm for RNA secondary structure design. J Mol Biol. 336:607-624 (2004).
- InfoRNA software: A. Busch and R. Backofen. INFO-RNA: a fast approach to inverse RNA folding. Bioinformatics 22(15):1823-1831 (2006).
- Modena software: A. Taneda. MODENA: a multi-objective RNA inverse folding. Advances and Applications in Bioinformatics and Chemistry 4:1-12 (2011).
- NUPACK software: J.N. Zadeh, B.R. Wolfe and N.A. Pierce. Nucleic acid sequence design via efficient ensemble defect optimization. J Comput Chem. 32:439-452 (2011).