Project Collaborators:David P. Giedroc (Indiana University Bloomington)
It has recently been found that some RNA functions are determined by the actual folding process itself and not just the RNA's nucleotide sequence or its native structure. In this work, we present new computational tools that can be used to study kinetics-based functions for RNA such as population kinetics, folding rates, and the folding of particular subsequences. Previously, these properties could only be studied for small RNA whose conformation space could be enumerated (e.g., RNA with 30-40 nucleotides) or for RNA whose kinetics were restricted in some way. In this work, we provide computational tools to approximate the folding energy landscape and extract both global properties and detailed features of the folding process. The key advantage of our approach over other computational techniques is that it is fast and effiecient while bridging the gap between high-level folding events and low level folding details. Previously, we successfully applied this strategy to study protein folding landscapes.
Our method first builds an approximate map (or model) of the RNA's folding energy landscape. Then, to study the global folding kinetics, we use the master equation, a matrix of differential equations to describe the population kinetics of teh RNA. It allows us to discover features such as folding rates, transitions rates, and folding pathways. Next, to study some low-level folding details, we use our new analysis technique, called Map-based Monte Carlo (MMC) simulation, to stochastically extract folding pathways from the map. MMC, in conjunction with a novel sampling strategy (Boltzmann Statistical Sampling -BSS) for building the map, enables us to study kinetics-based functions for large RNA, e.g., RNA with 200+ nucleotides.
We provide several different simulation results to validate our method against known experimental data. We also show how our method can study kinetics-based functions for two different case studies. First, we compare simulated folding rates for ColE1 RNAII (200 nucleotides) and its mutants against experimental rates and show that our method identifies the same relative folding order as seen experiment. Second, we predict the gene expression rates fo wild-type MS2 phage RNA (135 nucleotides) and three of its mutants and show that our approach computes the same relative functionl rates as seen in experiment.
Here we demonstrate one of the folding pathways we extracted from our roadmap. The left figure displays the enerygy profile on this folding pathway. The right animation (generated by rnamovies) shows the folding process from a misfolded conformation into the native conformation. In order to folding to the native conformation from the misfolded state, the molecule must pass through an energy barrier.
|Energy profile||Folding pathway|
To validate our description of the enerfy landscape using roadmaps, we calculate the population kinetics from our roadmaps and compare them with the results calculated by standard Monte Carlo simulation. The figure below compares the population kinetics of the native state of 1k2g (CAGACUUCGGUCGCAGAGAUAGG) using (a) standard Monte Carlo simulation (implented by Kinfold), (b) Map-based Monte Carlo (MMC) simulation on a fully enumerated Base-pair (BP) roadmap (12.137 conformations), (c) Map-based Monte Carlo (MMC) simulation on a roadmap with our new Boltsmann statistical sampling (BSS) method (42 conformations), and (d) the master equation (ME) on a BSS roadmap (42 conformations). The BP roadmaps is the most accurate representation while the BSS roadmaps yields much smaller subsets of the entire conformation space that effectively approximate the energy landscape.
All population kinetics curves have similar features. Also note that the equilibium (final) distributions are all the same, even though the Boltzmann statistical-sampling roadmap (c) and (d) contains less than 0.4% of all possible conformations. Thus, these roadmaps capture the main features of the energy landscape. This data validates that these analysis methods are interchangeable. We also did some quantitative comparisons on the master equation results, see our papers for more details.
|Monte Carlo (by Kinfold)||MMC on Base-pair map||MMC on BSS map||ME on BSS map|
|Comparison of the Population Kinetics calculated using different methods.|
ColE1 RNAII (200 nucleotides) regulates the replication of E. coli ColE1 plasmids through this folding kinetics. The slower it folds, the higher the plasmid replication rate. A specific mutant, MM7, differts from the wild-type (WT) by a single nucleotide out of the 200 nucleotide sequence. This mutation causes it to fold slower while maintaining the same thermodynamics of the native state. Thus, the overall plasmid replication rate increases in the presence of MM7 over the WT. In the figure below, we compare simulated folding rates for ColE1 RNAII and its mutants against experimental rates. It shows the eigenvalues calculated using the master equation. Note that the smallest non-zero eigenvalues correspond to the folding rate. All eigenvalues of WT are larger than MM7 indicating that WT folds faster than MM7. It shows that our method identifies the same relative folding order as seen in experiment.
Prediction fo the Gene Expression Rates of MS2 pahge RNA: MS2 phage RNA (135 nucleotides) regulates the expression rate of phage MS2 maturation protein at the translational level. It works as a regulator only when a specific subsequence (the SD sequence) is open (i.e., does not form base-pair contacts). Three mutants have been studied that have similar thermodynamic properties with the wild-type (WT) but different kinetics and therefore different gene expression rate. Intuitively, the functional rate (e.g., gene expression rate in this case) is correlated with the opening of the SD sequence. If the SD sequence is opened longer, or has higher opening probability (i.e., having more nucleotides on the SD sequence open), then the mutant should have higher functional rate. We use our simulation method to study this opening probability during the folding process. Note that mutant CC3435AA has the longest duration at a relatively high level of opening probability while mutant SA has the shortest duration. This correlates with experimental data. The opening probability of U32C decreases earlier but finishes later than WT, so it is not clear which one has a larger total opening probability during folding, again matching experimental findings.
|Comparison of the SD opening probability during the folding process.|
The gene expression rate is determined from two factors: (1) how high the opening probability is at any given time and (2) how long the RNA stays in the high opening probability state. To compare each RNA quantitatively, we compute the integration of the opening probability over the whole folding process. Note that the RNA regulates gene expression only when the SD opening probability is ``high enough''. We used thresholds ranging from 0.1 to 0.6 to estimate the gene expression rate. Thresholds higher than 0.6 will yield zero opening probability on WT and most mutants and thus cannot be correlated to experimental results. The table below shows the results for the WT and for each mutant. For most thresholds, mutant CC3435AA has the highest rate and mutant SA has the lowest rate, the same relative functional rate as seen in experiment. In addition WT and mutant U32C have similar levels (particularly between 0.4-0.6), again correlating with experimental results. Aside from simply validating our method against experiment, we can also use our method to suggest that the SD sequence may only be active for gene regulation when more than 40% of its nucleotides are open.
|Mutant||Relative Expression Rate|
(with regard to WT)
A Motion Planning Approach to Studying Molecular Motions, Lydia Tapia, Shawna Thomas, Nancy M. Amato, Communications in Information and Systems, 10(1):53-68, 2010.
Intelligent Motion Planning and Analysis with Probabilistic Roadmap Methods for the Study of Complex and High-Dimensional Motions, Lydia Tapia, Ph.D. Thesis, Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station, Texas, Dec 2009.
Ph.D. Thesis(pdf, abstract)
Simulating RNA Folding Kinetics on Approximated Energy Landscapes, Xinyu Tang, Shawna Thomas, Lydia Tapia, David P. Giedroc, Nancy M. Amato, Journal of Molecular Biology, 3811(4):1055-1067, Sep 2008.
Techniques for Modeling and Analyzing RNA and Protein Folding Energy Landscapes, Xinyu Tang, Ph.D. Thesis, Department of Computer Science and Engineering, Texas A&M University, Dec 2007.
Ph.D. Thesis(ps, pdf, abstract)
Tools for Simulating and Analyzing RNA Folding Kinetics, Xinyu Tang, Shawna Thomas, Lydia Tapia, Nancy M. Amato, In Proc. Int. Conf. Comput. Molecular Biology (RECOMB), pp. 268-282, San Francisco, CA, Apr 2007.
Proceedings(ps, pdf, abstract)
Using Motion Planning to Study RNA Folding Kinetics, Xinyu Tang, Bonnie Kirkpatrick, Shawna Thomas, Guang Song, Nancy M. Amato, Journal of Computational Biology, 12(6):862-881, Jul 2005. Also, In Proc. Int. Conf. Comput. Molecular Biology (RECOMB), pp. 252-261, San Diego, CA, Mar 2004.
Journal(ps, pdf, abstract) Proceedings(ps, pdf, abstract)
Supported by NSF
Project Alumni:Bonnie Kirkpatrick,Guang Song,Xinyu Tang,Lydia Tapia
Parasol Home | Research | People | General info | Seminars | Resources
Parasol Laboratory, 425 Harvey R. Bright Bldg, 3112 TAMU, College Station, TX 77843-3112
email@example.com Phone 979.458.0722 Fax 979.458.0718
Department of Computer Science and Engineering | Dwight Look College of Engineering | Texas A&M University
Privacy statement: Computer Science and Engineering Engineering TAMU
Web Accessibility Policy and Law - Web Accessibility and Usability Standards - Contact Webmaster