Web Supplement
This web supplement includes additional material for the paper:
Complete RNA inverse folding: computational design of functional hammerhead ribozymes
I. Dotu*, J.A. Garcia Martin*, Betty L. Slinger*, V. Mechery, M.M. Meyer, P. Clote.
Structural Diversity Measures
Below, we present figures with superimposed graphs
for the following (mostly structural) measures, which are defined in the
Supplementary Information: EBPDDistActive, EBPDDistAll, Energy,
EnsDef, EnsDefectActive, EnsDefectAll, Entropy2DistActive, Entropy2DistAll,
EntropyDistActive, EntropyDistAll, ExpBPDist, FullEntropy, GCcont,
ProbOfStr, Struct.Div.MH., Struct.Div, StructDivActive, StructDivAll,
StructDivMHActive, StructDivMHAll.
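Judging by their names, several of the measures listed above (e.g. EBPDDistActive, ExpBPDist) build on the base-pair distance between secondary structures. Their exact definitions are given in the Supplementary Information and are not reproduced here. As an illustration only, the following Python sketch computes the standard base-pair distance between two pseudoknot-free dot-bracket structures of equal length, i.e. the number of base pairs present in exactly one of them. The function names `base_pairs` and `bp_distance` are hypothetical.

```python
def base_pairs(dot_bracket: str) -> set[tuple[int, int]]:
    """Return the set of base pairs (i, j), 0-indexed, of a dot-bracket string."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)  # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {j}")
            pairs.add((stack.pop(), j))  # close the most recent open bracket
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {j}")
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def bp_distance(s: str, t: str) -> int:
    """Base-pair distance: number of pairs that occur in exactly one structure."""
    if len(s) != len(t):
        raise ValueError("structures must have equal length")
    return len(base_pairs(s) ^ base_pairs(t))  # size of the symmetric difference


print(bp_distance("((...))", "(.....)"))  # → 1
```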
The design algorithms RNAiFold [1,2], RNAdesign [3], and IncaRNAtion [4] were
each run with the Rfam consensus structure of the PLMVd hammerhead as
target structure. Since the PLMVd hammerhead folds into the Rfam consensus
structure only under the Turner 99 energy model (not the Turner 2004 energy
model), all benchmarking with RNAiFold used the Turner 99 model; note, however,
that RNAiFold allows the user to select either the Turner 99 or the Turner 2004
energy model. In contrast, RNAdesign appears to allow only the default
Turner 2004 energy model, while the energy model of IncaRNAtion involves
only base stacking free energies (no entropic contributions from loops).
In the computational experiments reported below, all sequences returned
by RNAdesign and IncaRNAtion had to be postprocessed in order to
retain only those sequences whose MFE structure is
the target secondary structure.
This postprocessing step added substantial time: 4.32 hours for RNAdesign and
13.99 hours for IncaRNAtion.
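The postprocessing step amounts to a filter: fold each returned sequence and keep it only if its MFE structure equals the target. The Python sketch below shows that logic with the folding function passed in as a parameter. `keep_target_folders` is a hypothetical helper, not part of any of the tools discussed. With the ViennaRNA Python bindings installed, `RNA.fold`, which returns the MFE structure and its free energy, could be passed as `fold`. The energy parameter set used for folding should match the one intended for the design run.

```python
from typing import Callable, Iterable, Iterator, Tuple

# A folding function maps a sequence to (MFE structure in dot-bracket, MFE).
FoldFn = Callable[[str], Tuple[str, float]]


def keep_target_folders(
    seqs: Iterable[str], target: str, fold: FoldFn
) -> Iterator[str]:
    """Yield only those sequences whose MFE structure equals the target."""
    for seq in seqs:
        if len(seq) != len(target):
            continue  # a sequence of different length cannot fold into the target
        structure, _mfe = fold(seq)
        if structure == target:
            yield seq
```

Because the function is a generator, millions of sequences can be streamed through it without holding all folded results in memory.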
1. Comparison of methods when run for the same time
Each of RNAiFold, RNAdesign and IncaRNAtion was run for the
same fixed time of 4.4 hours (16477 seconds). Although RNAiFold is
guaranteed to return sequences that fold into the target PLMVd
structure, RNAdesign and IncaRNAtion required an additional postprocessing
step: RNAfold from the Vienna RNA Package was run on
over 3.3 million [resp. 12.3 million] sequences returned by
RNAdesign [resp. IncaRNAtion] in order to select those whose
MFE structure agrees with that of wild type PLMVd.
The table below shows, for each program, the total number of sequences
returned within the allotted time, the percentage of those sequences
whose MFE structure is the target PLMVd structure, and the final
number of sequences that fold into the target structure.
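|                                  | RNAiFold | RNAdesign | IncaRNAtion |
|----------------------------------|----------|-----------|-------------|
| Total sequences returned         | 180,243  | 3,382,729 | 12,332,554  |
| % folding into target            | 100.00%  | 5.84%     | 2.57%       |
| Final number of sequences        | 180,243  | 197,414   | 317,218     |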
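The percentage row is the final number of sequences divided by the total number returned. This short Python check recomputes it from the table's own counts. No values beyond those in the table are assumed.

```python
# Counts copied from the table: (total returned, number folding into the target).
counts = {
    "RNAiFold": (180_243, 180_243),
    "RNAdesign": (3_382_729, 197_414),
    "IncaRNAtion": (12_332_554, 317_218),
}


def percent_folding(total: int, folding: int) -> float:
    """Percentage of returned sequences whose MFE structure is the target."""
    return 100.0 * folding / total


for method, (total, folding) in counts.items():
    print(f"{method}: {percent_folding(total, folding):.2f}%")
# RNAiFold: 100.00%
# RNAdesign: 5.84%
# IncaRNAtion: 2.57%
```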
The following link provides access to comparative figures and data
for the output of RNAiFold and the filtered output of RNAdesign
and IncaRNAtion under the constraints from the paper; i.e., the 15 nucleotides
having 96% sequence identity in the Rfam seed alignment with PLMVd, together
with H8, were constrained to their wild type identity, and all remaining
positions were constrained to contain nucleotides different from those of
wild type PLMVd hammerhead.
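The sequence constraint described above has two parts. Designed sequences must agree with wild type at the fixed positions and differ from wild type everywhere else. The Python sketch below checks a candidate against such a constraint. The function name and the toy `wild_type` and `fixed` values are hypothetical. The actual positions (the 15 high-identity positions together with H8) are specified in the paper, not here, and 0-indexed positions are assumed.

```python
def satisfies_constraints(seq: str, wild_type: str, fixed: set[int]) -> bool:
    """True if seq equals wild type at every position in `fixed` (0-indexed)
    and differs from wild type at every other position."""
    if len(seq) != len(wild_type):
        return False
    # At each position, "same nucleotide" must hold exactly when the position is fixed.
    return all(
        (a == b) == (i in fixed)
        for i, (a, b) in enumerate(zip(seq, wild_type))
    )


# Hypothetical toy example: positions 0 and 4 are fixed to wild type.
print(satisfies_constraints("GAGCG", "GCAUG", {0, 4}))  # → True
print(satisfies_constraints("GCGCG", "GCAUG", {0, 4}))  # → False (position 1 matches wild type)
```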
2. Comparison of RNAiFold, IncaRNAtion, and RNAdesign
The following links present a comparison of
structural diversity measures for sequences returned by
RNAiFold, IncaRNAtion and RNAdesign
in three different computational experiments, where
GC-content was required to range from 35% to 55%, and
sequence constraints were given for the 19 [resp. 15 resp. 11] positions having
90% [resp. 96% resp. 98%] sequence identity in the Rfam seed alignment with
PLMVd (as in the paper, for threshold 96%, the H8 constraint was included).
[Note that RNAdesign does not allow one to constrain GC-content.]
RNAiFold was run until memory exhaustion; RNAdesign and IncaRNAtion were each
run with an option selected to return 1 million sequences. For reasons
internal to RNAdesign and IncaRNAtion, neither returned exactly 1 million
sequences. For RNAdesign and IncaRNAtion, the percentages of returned sequences
whose MFE structure matched the input target structure were comparable to
those given in the previous table.
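The GC-content requirement is the fraction of G and C nucleotides in a sequence, here restricted to 35% to 55%. The Python sketch below computes it and checks the range. The text does not state how, or whether, this range was enforced on RNAdesign output, so the sketch only illustrates the check itself. The function names are hypothetical, and a closed interval [0.35, 0.55] is assumed.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides in a sequence (case-insensitive)."""
    if not seq:
        raise ValueError("empty sequence")
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def within_gc_range(seq: str, low: float = 0.35, high: float = 0.55) -> bool:
    """True if the GC-content lies in the closed interval [low, high]."""
    return low <= gc_content(seq) <= high


print(gc_content("GCAU"))                # → 0.5
print(within_gc_range("GCGCAUAUAU"))     # → True (GC-content 0.4)
print(within_gc_range("GCAUAUAUAU"))     # → False (GC-content 0.2)
```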
- Compare-Methods-90-50: Comparison of structural diversity measures for
sequences constrained to be identical to wild type PLMVd at the 19 positions
whose sequence identity in the Rfam seed alignment exceeds 90%.
- Compare-Methods-96-50: Comparison of structural diversity measures for
sequences constrained to be identical to wild type PLMVd at the 15 positions
whose sequence identity in the Rfam seed alignment exceeds 96%.
- Compare-Methods-98-50: Comparison of structural diversity measures for
sequences constrained to be identical to wild type PLMVd at the 11 positions
whose sequence identity in the Rfam seed alignment exceeds 98%.
3. Comparison of same method for thresholds 90%, 96%, 98%
The following links present a comparison of
structural diversity measures for sequences that agree with wild type
PLMVd at 19 [resp. 15 resp. 11] positions, corresponding to those positions
in the Rfam seed alignment having
at least 90% [resp. 96% resp. 98%] sequence identity. The data are
identical to those in Section 2, but grouped differently.
- RNAiFold-Conservation: Sequences generated by RNAiFold.
- Inc-Conservation: Sequences generated by IncaRNAtion.
- RNAdesign-Conservation: Sequences generated by RNAdesign.
Additional notes
README.txt
*: Joint first authors.
References
[1] J.A. Garcia Martin, P. Clote, I. Dotu.
RNAiFold: a web server for RNA inverse folding and molecular design.
Nucleic Acids Res. 41(Web Server issue):W465-W470 (2013).
[2] J.A. Garcia Martin, P. Clote, I. Dotu.
RNAiFold: a constraint programming algorithm for RNA inverse folding and molecular design.
J Bioinform Comput Biol. 11(2):1350001 (2013).
[3] J.C. Höner zu Siederdissen, S. Hammer, I. Abfalter, I.L. Hofacker, C. Flamm, P.F. Stadler.
Computational design of RNAs with complex energy landscapes.
Biopolymers 99(12):1124-1136 (2013).
[4] V. Reinharz, Y. Ponty, J. Waldispühl.
A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotide distribution.
Bioinformatics 29(13):i308-i315 (2013).