Modeling RNA Folding Landscapes with Probabilistic Roadmap Methods
supported by NSF
Shawna Thomas, Nancy Amato
Project Collaborators: David P. Giedroc
Project Alumni: Bonnie Kirkpatrick, Guang Song, Xinyu Tang, Lydia Tapia
It has recently been found that some RNA functions are determined by the actual folding process itself and not just the RNA's nucleotide sequence or its native structure. In this work, we present new computational tools that can be used to study kinetics-based functions for RNA such as population kinetics, folding rates, and the folding of particular subsequences. Previously, these properties could only be studied for small RNA whose conformation space could be enumerated (e.g., RNA with 30-40 nucleotides) or for RNA whose kinetics were restricted in some way. In this work, we provide computational tools to approximate the folding energy landscape and extract both global properties and detailed features of the folding process. The key advantage of our approach over other computational techniques is that it is fast and efficient while bridging the gap between high-level folding events and low-level folding details. Previously, we successfully applied this strategy to study protein folding landscapes.
Our method first builds an approximate map (or model) of the RNA's folding energy landscape. Then, to study the global folding kinetics, we use the master equation, a matrix of differential equations to describe the population kinetics of the RNA. It allows us to discover features such as folding rates, transition states, and folding pathways. Next, to study some low-level folding details, we use our new analysis technique, called Map-based Monte Carlo (MMC) simulation, to stochastically extract folding pathways from the map. MMC, in conjunction with a novel sampling strategy (Boltzmann Statistical Sampling - BSS) for building the map, enables us to study kinetics-based functions for large RNA, e.g., RNA with 200+ nucleotides.
We provide several different simulation results to validate our method against known experimental data. We also show how our method can study kinetics-based functions for two different case studies. First, we compare simulated folding rates for ColE1 RNAII (200 nucleotides) and its mutants against experimental rates and show that our method identifies the same relative folding order as seen in experiment. Second, we predict the gene expression rates of wild-type MS2 phage RNA (135 nucleotides) and three of its mutants and and show that our approach computes the same relative functional rates as seen in experiment.
Exploration of RNA Folding Pathways: Here we demonstrate one of the folding pathways we extracted from our roadmap. The left figure displays the energy profile on this folding pathway. The right animation (generated by rnamovies) shows the folding process from a misfolded conformation into the native conformation. In order to folding to the native conformation from the misfolded state, the molecule must pass through an energy barrier.
|Energy profile||Folding pathway|
Exploration of Population Kinetics: To validate our description of the energy landscape using roadmaps, we calculate the population kinetics from our roadmaps and compare them with the results calculated by standard Monte Carlo simulation. The figure below compares the population kinetics of the native state of 1k2g (CAGACUUCGGUCGCAGAGAUGG) using (a) standard Monte Carlo simulation (implemented by Kinfold ), (b) Map-based Monte Carlo (MMC) simulation on a fully enumerated Base-pair (BP) roadmap (12,137 conformations), (c) Map-based Monte Carlo (MMC) simulation on a roadmap with our new Boltzmann statistical sampling (BSS) method (42 conformations), and (d) the master equation (ME) on a BSS roadmap (42 conformations). The BP roadmap is the most accurate representation while the BSS roadmaps yields much smaller subsets of the entire conformation space that effectively approximate the energy landscape.
All population kinetics curves have similar features. Also note that the equilibrium (final) distributions are all the same, even though the Boltzmann statistical-sampling roadmap (c) and (d) contains less than 0.4% of all possible conformations. Thus, these roadmaps capture the main features of the energy landscape. This data validates that these analysis methods are interchangeable. We also did some quantitative comparisons on the master equation results, see our papers for more details.
|Monte Carlo (by Kinfold)||MMC on Base-pair map||MMC on BSS map||ME on BSS map|
|Comparison of the Population Kinetics calculated using different methods.|
Prediction of the Plasmid Replication Rate of ColE1 RNAII: ColE1 RNAII (200 nucleotides) regulates the replication of E. coli ColE1 plasmids through its folding kinetics. The slower it folds, the higher the plasmid replication rate. A specific mutant, MM7, differs from the wild-type (WT) by a single nucleotide out of the 200 nucleotide sequence. This mutation causes it to fold slower while maintaining the same thermodynamics of the native state. Thus, the overall plasmid replication rate increases in the presence of MM7 over the WT. In the figure below, we compare simulated folding rates for ColE1 RNAII and its mutants against experimental rates. It shows the eigenvalues calculated using the master equation. Note that the smallest non-zero eigenvalues correspond to the folding rate. All eigenvalues of WT are larger than MM7 indicating that WT folds faster than MM7. It shows that our method identifies the same relative folding order as seen in experiment.
Prediction of the Gene Expression Rates of MS2 phage RNA: MS2 phage RNA (135 nucleotides) regulates the expression rate of phage MS2 maturation protein at the translational level. It works as a regulator only when a specific subsequence (the SD sequence) is open (i.e., does not form base-pair contacts). Three mutants have been studied that have similar thermodynamic properties with the wild-type (WT) but different kinetics and therefore different gene expression rates. Experimental results indicate that mutant CC3435AA has the highest gene expression rate, WT and mutant U32C are similar, and mutant SA has the lowest rate. Intuitively, the functional rate (e.g., gene expression rate in this case) is correlated with the opening of the SD sequence. If the SD sequence is opened longer, or has higher opening probability (i.e., having more nucleotides on the SD sequence open), then the mutant should have higher functional rate. We use our simulation method to study this opening probability during the folding process.Note that mutant CC3435AA has the longest duration at a relatively high level of opening probability while mutant SA has the shortest duration. This correlates with experimental data. The opening probability of U32C decreases earlier but finishes later than WT, so it is not clear which one has a larger total opening probability during folding, again matching experimental findings.
|Comparison of the SD opening probability during the folding process.|
(with regard to WT)
Comparison of integration of SD opening probability between WT and three mutants of MS2.
Applications of Motion Planning to Computational Biology
A Motion Planning Approach to Studying Molecular Motions, Lydia Tapia, Shawna Thomas, Nancy M. Amato, Communications in Information and Systems, 10(1):53-68, 2010. Also, Technical Report, TR08-006, Parasol Laboratory, Department of Computer Science, Texas A&M University, Nov 2008.
Journal(pdf, abstract) Technical Report(abstract)
Intelligent Motion Planning and Analysis with Probabilistic Roadmap Methods for the Study of Complex and High-Dimensional Motions, Lydia Tapia, Ph.D. Thesis, Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station, Texas, Dec 2009.
Ph.D. Thesis(pdf, abstract)
Simulating RNA Folding Kinetics on Approximated Energy Landscapes, Xinyu Tang, Shawna Thomas, Lydia Tapia, David P. Giedroc, Nancy M. Amato, Journal of Molecular Biology, 3811(4):1055-1067, Sep 2008. Also, Technical Report, TR07-008, Parasol Laboratory, Department of Computer Science, Texas A&M University, Oct 2007.
Journal(pdf, abstract) Technical Report(ps, pdf, abstract)
Techniques for Modeling and Analyzing RNA and Protein Folding Energy Landscapes, Xinyu Tang, Ph.D. Thesis, Department of Computer Science and Engineering, Texas A&M University, Dec 2007.
Ph.D. Thesis(ps, pdf, abstract)
Tools for Simulating and Analyzing RNA Folding Kinetics, Xinyu Tang, Shawna Thomas, Lydia Tapia, Nancy M. Amato, Technical Report, TR07-007, Parasol Laboratory, Department of Computer Science, Texas A&M University, Oct 2007. Also, In Proc. Int. Conf. Comput. Molecular Biology (RECOMB), pp. 268-282, San Francisco, CA, Apr 2007. Also, Technical Report, TR06-012, Parasol Laboratory, Department of Computer Science, Texas A&M University, Oct 2006.
Technical Report(ps, pdf, abstract) Proceedings(ps, pdf, abstract) Technical Report(ps, pdf, abstract)
Using Motion Planning to Study RNA Folding Kinetics, Xinyu Tang, Bonnie Kirkpatrick, Shawna Thomas, Guang Song, Nancy M. Amato, Journal of Computational Biology, 12(6):862-881, Jul 2005. Also, In Proc. Int. Conf. Comput. Molecular Biology (RECOMB), pp. 252-261, San Diego, CA, Mar 2004. Also, Technical Report, TR03-005, Parasol Laboratory, Department of Computer Science, Texas A&M University, Sep 2003.
Journal(ps, pdf, abstract) Proceedings(ps, pdf, abstract) Technical Report(ps, pdf, abstract)
Parasol Home | Research | People | General info | Seminars | Resources
Parasol Lab, 301 Harvey R. Bright Bldg, 3112 TAMU, College Station, TX 77843-3112
Contact Webmaster Phone 979.458.0722 Fax 979.458.0718
Department of Computer Science and Engineering | Dwight Look College of Engineering | Texas A&M University
Privacy statement: Computer Science and Engineering Engineering TAMU
Web Accessibility Policy and Law - Web Accessibility and Usability Standards