Web Supplement

    This web supplement includes additional material for the paper:
    Complete RNA inverse folding: computational design of functional hammerhead ribozymes
    I. Dotu*, J.A. Garcia Martin*, Betty L. Slinger*, V. Mechery, M.M. Meyer, P. Clote.


    Structural Diversity Measures

    Below, we present figures with superimposed graphs for the following mostly structural measures, which are defined in the Supplementary Information: EBPDDistActive, EBPDDistAll, Energy, EnsDef, EnsDefectActive, EnsDefectAll, Entropy2DistActive, Entropy2DistAll, EntropyDistActive, EntropyDistAll, ExpBPDist, FullEntropy, GCcont, ProbOfStr, Struct.Div.MH., Struct.Div, StructDivActive, StructDivAll, StructDivMHActive, StructDivMHAll.
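    Several of these measures compare secondary structures by base pair distance (ExpBPDist, for instance, appears from its name to be an expected base pair distance); the exact definitions are those given in the Supplementary Information. As a minimal sketch, assuming pseudoknot-free structures in dot-bracket notation, the base pair distance between two structures of equal length is the number of base pairs present in exactly one of them. The helper names below are hypothetical.

```python
from __future__ import annotations


def base_pairs(structure: str) -> set[tuple[int, int]]:
    """Return the set of (i, j) base pairs (0-based) of a dot-bracket string."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)  # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {j}")
            pairs.add((stack.pop(), j))  # close the most recent open bracket
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {j}")
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def bp_distance(s: str, t: str) -> int:
    """Base pair distance: size of the symmetric difference of the pair sets."""
    if len(s) != len(t):
        raise ValueError("structures must have equal length")
    return len(base_pairs(s) ^ base_pairs(t))
```

    For example, `bp_distance("((..))", "(....)")` is 1, because only the inner pair (1, 4) is missing from the second structure.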

    Design algorithms RNAiFold [1,2], RNAdesign [3], and IncaRNAtion [4] were each run with the Rfam consensus structure of the PLMVd hammerhead as target structure. Since the PLMVd hammerhead folds into the Rfam consensus structure only under the Turner 99 energy model (not under the Turner 2004 energy model), all benchmarking with RNAiFold used the Turner 99 model; note, however, that RNAiFold allows the user to select either the Turner 99 or the Turner 2004 energy model. In contrast, RNAdesign appears to allow only the default Turner 2004 energy model, while the energy model of IncaRNAtion involves only base stacking free energies (no entropic contributions from loops).
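    The choice of energy model determines which sequences count as folding into the target. A minimal sketch of how RNAfold from the ViennaRNA Package could be switched between the two models: RNAfold 2.x uses Turner 2004 parameters by default, and its `-P` option reads an alternative parameter file. The path to `rna_turner1999.par` is an assumption that depends on the local installation, and `rnafold_command` and `fold_fasta` are hypothetical helpers, not part of any of the tools discussed here.

```python
from __future__ import annotations

import subprocess

# Assumption: path of the Turner 1999 parameter file distributed with the
# ViennaRNA Package; it differs between installations.
TURNER99_PAR = "/usr/local/share/ViennaRNA/rna_turner1999.par"


def rnafold_command(turner99: bool, param_file: str = TURNER99_PAR) -> list[str]:
    """Build an RNAfold command line (hypothetical helper).

    RNAfold 2.x uses Turner 2004 parameters unless -P supplies another file.
    """
    cmd = ["RNAfold", "--noPS"]  # --noPS: do not write structure plots
    if turner99:
        cmd += ["-P", param_file]
    return cmd


def fold_fasta(fasta_text: str, turner99: bool = True) -> str:
    """Run RNAfold on FASTA text and return its raw standard output."""
    result = subprocess.run(
        rnafold_command(turner99),
        input=fasta_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```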

    In the computational experiments reported below, all sequences returned by RNAdesign and IncaRNAtion had to be postprocessed to retain only those sequences whose MFE structure is the target secondary structure. This postprocessing step added substantial time: 4.32 hours for RNAdesign and 13.99 hours for IncaRNAtion.
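    The postprocessing filter amounts to folding each returned sequence and keeping it only if its MFE structure equals the target. A minimal sketch, assuming RNAfold was run on FASTA input with default output (a sequence line followed by a line of the form `structure ( energy)`, optionally preceded by a `>name` header); the function names are hypothetical, and this is not necessarily how the authors' own filtering script was written.

```python
from __future__ import annotations

from collections.abc import Iterator


def parse_rnafold(stdout: str) -> Iterator[tuple[str, str, float]]:
    """Yield (sequence, mfe_structure, mfe) from RNAfold's default text output.

    Assumes exactly two non-header lines per input sequence; output produced
    with extra options (e.g. partition function) would need a different parser.
    """
    lines = [ln.strip() for ln in stdout.splitlines() if ln.strip()]
    lines = [ln for ln in lines if not ln.startswith(">")]  # drop FASTA headers
    for seq, fold_line in zip(lines[0::2], lines[1::2]):
        structure, energy = fold_line.split(" ", 1)
        yield seq, structure, float(energy.strip().strip("()"))


def keep_target(stdout: str, target: str) -> list[str]:
    """Keep only the sequences whose MFE structure equals the target."""
    return [seq for seq, structure, _ in parse_rnafold(stdout) if structure == target]
```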

    1. Comparison of methods when run for the same time

    Each of RNAiFold, RNAdesign and IncaRNAtion was run for the same fixed time of 16,477 seconds (about 4.6 hours). Although RNAiFold is guaranteed to return sequences that fold into the target PLMVd structure, RNAdesign and IncaRNAtion required an additional postprocessing step: RNAfold from the Vienna RNA Package was run on over 3.3 million [resp. 12.3 million] sequences returned by RNAdesign [resp. IncaRNAtion], in order to select those whose MFE structure agrees with that of wild type PLMVd. The table below shows, for each program, the total number of sequences returned within the allotted time, the percentage of those sequences whose MFE structure is the target PLMVd structure, and the final number of sequences that fold into the target structure.
                                 RNAiFold    RNAdesign    IncaRNAtion
    Total sequences           180,243    3,382,729     12,332,554
    % folding to target       100.00%        5.84%          2.57%
    Final number of seqs      180,243      197,414        317,218
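    As a consistency check on the table, the percentage in the second row is the final number of sequences divided by the total number returned. A short sketch recomputing it from the counts in the table (the helper name is hypothetical); it reproduces 100.00%, 5.84%, and 2.57%.

```python
# Counts from the table: (total sequences returned, sequences folding to target).
counts = {
    "RNAiFold": (180_243, 180_243),
    "RNAdesign": (3_382_729, 197_414),
    "IncaRNAtion": (12_332_554, 317_218),
}


def percent_folding(total: int, folding: int) -> float:
    """Percentage of returned sequences whose MFE structure is the target."""
    return 100.0 * folding / total


for method, (total, folding) in counts.items():
    print(f"{method:12s} {percent_folding(total, folding):6.2f}%")
```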

    The following link provides access to comparative figures and data for the output of RNAiFold and the filtered output of RNAdesign and IncaRNAtion, under the sequence constraints from the paper; i.e., the 15 nucleotides having 96% sequence identity in the Rfam seed alignment with PLMVd were constrained to match wild type PLMVd, together with H8, and all remaining positions were constrained to contain nucleotides different from those of wild type PLMVd hammerhead.
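    The constraint scheme just described (conserved positions fixed to the wild type nucleotide, every other position forced to differ) can be checked on a candidate sequence with a few lines of code. A minimal sketch; the wild type sequence and the conserved position set are inputs, since neither the PLMVd sequence nor the exact constrained positions (including H8) are listed on this page, and the example values below are made up for illustration.

```python
from __future__ import annotations


def satisfies_constraints(seq: str, wild_type: str, conserved: set[int]) -> bool:
    """Check the constraint scheme on a candidate (0-based positions).

    Positions in `conserved` must carry the wild type nucleotide; every
    other position must carry a different nucleotide.
    """
    if len(seq) != len(wild_type):
        return False
    return all(
        (seq[i] == wt) if i in conserved else (seq[i] != wt)
        for i, wt in enumerate(wild_type)
    )
```

    With a made-up wild type `"GGGAAACCC"` and conserved positions `{0, 4}`, the candidate `"GCCUAUGGG"` passes, while the wild type itself fails because its unconstrained positions are not changed.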

    • SameTimeMethods-50


    2. Comparison of RNAiFold, IncaRNAtion, and RNAdesign

    The following links present a comparison of structural diversity measures for sequences returned by RNAiFold, IncaRNAtion and RNAdesign in three different computational experiments, where GC-content was required to lie between 35% and 55%, and sequence constraints were given for the 19 [resp. 15 resp. 11] positions having 90% [resp. 96% resp. 98%] sequence identity in the Rfam seed alignment with PLMVd (as in the paper, for threshold 96%, the H8 constraint was included). [Note that RNAdesign does not allow one to constrain GC-content.] RNAiFold was run until memory exhaustion; RNAdesign and IncaRNAtion were each run with an option selected to return 1 million sequences. For reasons internal to RNAdesign and IncaRNAtion, neither returned exactly 1 million sequences. The percentages of sequences returned by RNAdesign and IncaRNAtion whose MFE structure matched the target structure were comparable to those given in the previous table.
    1. Compare-Methods-90-50: Comparison of methods for sequences constrained to be identical with wild type PLMVd at the 19 positions whose sequence identity in the Rfam seed alignment exceeds 90%.
    2. Compare-Methods-96-50: Comparison of methods for sequences constrained to be identical with wild type PLMVd at the 15 positions whose sequence identity in the Rfam seed alignment exceeds 96%.
    3. Compare-Methods-98-50: Comparison of methods for sequences constrained to be identical with wild type PLMVd at the 11 positions whose sequence identity in the Rfam seed alignment exceeds 98%.
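    The 35-55% GC-content requirement is a simple predicate on a sequence. A minimal sketch with hypothetical helper names; a check like this could also be applied after the fact to inspect the GC content of output from a tool that cannot constrain it during design, such as RNAdesign (whether such a filter was applied here is not stated).

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides in a nucleic acid sequence."""
    if not seq:
        raise ValueError("empty sequence")
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def within_gc_range(seq: str, low: float = 0.35, high: float = 0.55) -> bool:
    """True if the GC content lies in the closed interval [low, high]."""
    return low <= gc_content(seq) <= high
```

    For example, `gc_content("GCGCAUAUAU")` is 0.4, inside the range, whereas `"GGGAAACCC"` (GC content 2/3) falls outside it.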


    3. Comparison of the same method for thresholds 90%, 96%, and 98%

    The following links present a comparison of structural diversity measures for sequences that agree with wild type PLMVd in 19 [resp. 15 resp. 11] positions, corresponding to those positions in the Rfam seed alignment having at least 90% [resp. 96% resp. 98%] sequence identity. The data are the same as in Section 2, but grouped by method rather than by threshold.
    1. RNAiFold-Conservation: Sequences generated by RNAiFold.
    2. Inc-Conservation: Sequences generated by IncaRNAtion.
    3. RNAdesign-Conservation: Sequences generated by RNAdesign.
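    The conserved positions for each threshold come from per-column sequence identity in the Rfam seed alignment. A minimal sketch, assuming identity at a column is the fraction of seed sequences carrying the same nucleotide as PLMVd there, and that positions are reported in ungapped PLMVd coordinates; the exact computation used in the paper may differ (for example in how gaps are treated or whether the threshold is strict). Raising the threshold can only shrink the selected set, consistent with the nested counts 19, 15, and 11.

```python
from __future__ import annotations

from collections.abc import Sequence

GAP_CHARS = "-."  # gap symbols used in Stockholm alignments


def column_identity(alignment: Sequence[str], reference: str) -> list[float]:
    """Per-column fraction of aligned rows whose nucleotide equals the reference's."""
    ref = reference.upper()
    rows = [row.upper() for row in alignment]
    if any(len(row) != len(ref) for row in rows):
        raise ValueError("all rows must have the reference's aligned length")
    return [sum(row[col] == nt for row in rows) / len(rows) for col, nt in enumerate(ref)]


def conserved_positions(alignment: Sequence[str], reference: str, threshold: float) -> list[int]:
    """0-based ungapped reference positions whose column identity is >= threshold."""
    identity = column_identity(alignment, reference)
    positions: list[int] = []
    pos = 0  # index into the ungapped reference sequence
    for col, nt in enumerate(reference):
        if nt in GAP_CHARS:
            continue  # gap in the reference: no nucleotide to constrain
        if identity[col] >= threshold:
            positions.append(pos)
        pos += 1
    return positions
```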

    Additional notes

    README.txt


    *: Joint first authors.


    References

    [1] J.A. Garcia Martin, P. Clote, I. Dotu.
    RNAiFold: a web server for RNA inverse folding and molecular design.
    Nucleic Acids Res. 41(Web Server issue):W465-W470 (2013).

    [2] J.A. Garcia Martin, P. Clote, I. Dotu.
    RNAiFold: A constraint programming algorithm for RNA inverse folding and molecular design.
    J. Bioinform. Comput. Biol. 11(2):1350001 (2013).

    [3] J.C. Höner zu Siederdissen, S. Hammer, I. Abfalter, I.L. Hofacker, C. Flamm, P.F. Stadler.
    Computational design of RNAs with complex energy landscapes.
    Biopolymers 99(12):1124-1136 (2013).

    [4] V. Reinharz, Y. Ponty, J. Waldispühl.
    A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotide distribution.
    Bioinformatics 29(13):i308-i315 (2013).