Home research People General Info Seminars Resources Intranet
| Algorithms & Applcations Group | Home | Research | Publications | People | Resources | News
Algorithms & Applications Group
Using Motion Planning to Study Protein Folding and Motion

Despite the explosion in protein structural and functional data, our understanding of protein and movement is still very limited. Experimental methods cannot operate at the time scales necessary to record protein folding and motions, and traditional simulation techniques such as molecular dynamics and Monte Carlo methods are too computationally expensive to simulate long enough time periods for anything other than small peptide fragments. However, it is critical that we better understand protein motion and the folding process for several reasons: understanding the folding process can give insight into how to develop better structure prediction algorithms, treatments for diseases such as Alzheimer's and Mad Cow disease can be found by studying protein misfolding, and many biochemical processes are regulated by protein motion.

Our technique for computing protein folding pathways and protein motions is based on the successful probabilistic roadmap (PRM) method for robotics motion planning. We were inspired to apply PRMs to protein folding based on our sccess in applying it to paper folding. The figure below demonstrates the parallelism between paper folding and protein folding.

some text some text some text some text
some text some text some text some text

Folding snapshots of a periscope (movie) and staphylococcal Protein A, B domain, computed by our technique. Also available is a folding box movie.

We have obtained promising results for several small proteins (about 60-100 amino acids), and we have validated our pathways by comparing their secondary structure formation order with known experimental results, examining folding rate, and studying population kinetics. A major feature of our technique is that it produces many folding pathways in just a few hours on a desktop PC. These pathways provide global information about the protein's energy landscape.

In order to study larger proteins and more challenging protein motion problems, we applied rigidity analysis to bias our sampling in a more physically realisitic way. Given a protein structure, we use rigidity analysis to identify the rigid and flexible components of the protein. We sample the flexible regions with a high probability and sample the rigid regions with low probability. We then performed a detailed study of protein G (streptococcal protein G, B1 immunoglobulin-binding domain), protein L (peptostreptococcal protein L, B1 immunoglobulin-binding domain), and two mutated forms of protein G, NuG1 and NuG2, to further validate our technique. These proteins provide a good test case for out technique because they are known to fold differently even though they have similar structure:

Protein G Protein L NuG1 NuG2
Native 3D structures of proteins G, L, NuG1, and NuG2. Mutated residues in NuG1 and NuG2 are indicated in wireframe.

All four proteins are composed of an alpha-helix (green) and two beta-hairpin 1 (orange) and hairpin 2 (blue). Native state out-exchnage experiments, pulse labeling/competition experiments, and phi-value analysis indicate that hairpin 1 forms first in protein L and hairpin 2 forms first in protein G. Protein G was then mutated by submitting a few residues in hairpin 2 to decrease its relaitve stability. Phi-value analysis indicates that the mutations switch the hairpin formation order from the wildtype to that of the protein L. We were able to successfully replicate these results with out technique. For each protein, we analyzed hundreds of folding pathways. The following table shows out simulation results:

some text
some text

Comparison of secondary structure formation order for proteins G, L, NuG1, and NuG2 with experimental results.

In addtion to detecting the correct folding behavior of each protein, our rigidity-based technique can also help explain the stability shift in NuG1 and NuG2 from the wildtype. Consider their native state rigidity maps:

Protein G Protein L NuG1 NuG2
Rigidity maps of the native structure for proteins G, L, NuG1, and NuG2.

A rigidity map is similar to a contact map except residues pairs are marked according to their rigidity relationship: black if they are in the same rigid component, green if they are in the same dependently flexible set, and white otherwise. In all four proteins, the central alpha-helix remains completely rigid, as expected. We also increased rigidity from our protein G to NuG1 and NuG2 in hairpin 1 as suggested by experiment. For the other regions, the rigidity maps show more similarity between the mutated forms of protein G and protein L than with protein G, reflecting the similarity/difference in folding behaviors.

Our roadmaps provide an approxiamte view of a protein's folding landscape. In our recent work, we have explored various methodologies for studying the kinetics of protein folding in these approximate models. Our two methodologies, map-based Master Equation (MME) and map-based Monte Carlo (MMC), allow us to study two important features of protein folding kinetics: relative folding rate and population kinetics. While master equation calculation and Monte Carlo simulation have been performed on protein models before, their computational cost have prohibitied them from being used for large protein models. An important benefit of our map-based apporximation approach is that it enables us to study the kinetics of much larger proteins than can be handled by traditional master equation solution or Monte Carlo simulation.

We can use the MME approach to study relative folding rate as extracted from our roadmaps. For this study, we looked at Protein G and its variants NuG1 and NuG2. Experimental studies have shown that these two mutants folds 100 times faster than protein G. We applied the MME approach to study the relative folding rates of these proteins.

some text some text

In the figure above, the simulated folding rates are shown for Protein G and its mutants NuG1 and NuG2. The plotted eigenvalues were calculated using MME. Note that the smalled non-zero eigenvalues correspond to the folding rate (index 2). Our simulations show that Protein G folds much slower than its mutants NuG1 and NuG2. This is what has been seen in lab experiments.

Also shown above is the performance of MME roadmaps ranging in size from 2000 to 15000 nodes. The running time of MME scales linearly with roadmap size (i.e. the size of the landscape model).

  Population Kinetics Tryptophan Contact Formation Helix Formation
Protein ACBP

Displayed in the figure above is the 86 residue all alpha protein, Acyl-coenzyme A Binding Protein (ACBP), we have studied through the MMC technique. This protein has five helices and two tryptophans in the core of the protein. The folding of ACBP has been studied in the lab thourgh tryptophan fluorescence, and it has been shown that it has a fast, two-state folder. From our MMC kinetics, we see that ACBP exhibits a property similar to otehr all alpha proteins we have studied: continual formation of helix contacts and reaching the native state after teh formation of many helix contacts. Also, since ACBP has two tryptophans in the core of the protein, we see a quick increase in the formation of these contacts around the same time we see the native state beginning to be populated, around time step 100. This could correspond to the packing of the structure and the formation of long-range interactions in the core of the protein. We also show this protein has similar kinetics to other all alpha-proteins and other lab experiments on alpha proteins. Through the technique we have been able to study population kinetics for proteins of various lengths and mixed structures.

We have also used our roadmaps, or approxiamte landscape models, to simulate relative hydrogen exchange rates and infer the folding core. The folding core is the set of residues that are the first to form structure during folding and the last to lose structure during denaturation. One of the most prevalent experimental methods to infer the folding core is hydrogen exchange. It measures the rate at which residues exchange their hydrogens for Deuterium during either folding or unfolding.

We use rigidity analysis to examine the folding pathways on our roadmaps (typically extracted by MMC) and compute an approxiamte exchange rate for each residue. We developed two approaches for approximating exchange: one looks at the indiviual residue's rigidity/flexibility and one looks at the formation of mutually rigid, non-local subsequences. We found good correlation between the residues identified experimentally as part of the folding core and residues we identify with slow exchange rates:

Continuous Labeling EX [Hilton78] Continuous Labeling EX [Richarz79] Pulse Labeling EX [Roder86] Simulated Relative EX
(by rigidity/flexibility)
Simulated Relative EX
(by mutually rigid subsequences)

Adaptive Local Learning in Sampling Based Motion Planning for Protein Folding, Chinwe Ekenna, Shawna Thomas, Nancy Amato, Bio Med Central Systems Biology, 10(2):165--179, Aug 2016. Also, In The IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 61-68, Washington DC, USA, Nov 2015.
Journal(pdf, abstract) Proceedings(pdf, abstract)

Improved Sampling Based Motion Planning Through Local Learning, Chinwe Ekenna, Ph.D. Thesis, Department of Computer Science and Engineering, Texas A&M University, Aug 2016.
Ph.D. Thesis(pdf, abstract)

Adaptive Neighbor Connection Aids Protein Motion Modeling, Chinwe Ekenna, Shawna Thomas, Nancy Amato, In Proc. RSS Workshop on Robotics Methods for Structural and Dynamic Modeling of Molecular Systems, Jul 2014.
Proceedings(ps, pdf, abstract)

Rigidity Analysis for Protein Motion and Folding Core Identification, Shawna Thomas, Lydia Tapia, Chinwe Ekenna, Hsin-Yi (Cindy) Yeh, Nancy M. Amato, In Proc. of 2013 AAAI Wkshp. on Art. Int. and Robot. Meth. in Comp. Bio., Bellevue, WA, Jul 2013.
Proceedings(pdf, abstract)

A Multi-Directional Rapidly Exploring Random Graph (mRRG) for Protein Folding, Shuvra Nath, Shawna Thomas, Chinwe Ekenna, Nancy M. Amato, In ACM Conference on Bioinformatics, Computational Biology and Biomedicine, pp. 44-51, Orlando, FL, USA, Oct 2012.
Proceedings(ps, pdf, abstract)

A Motion Planning Approach to Studying Molecular Motions, Lydia Tapia, Shawna Thomas, Nancy M. Amato, Communications in Information and Systems, 10(1):53-68, 2010.
Journal(pdf, abstract)

Rigidity Analysis for Modeling Protein Motion, Shawna Thomas, Ph.D. Thesis, Department of Computer Science and Engineering, Texas A&M University, May 2010.
Ph.D. Thesis(ps, pdf, abstract)

Intelligent Motion Planning and Analysis with Probabilistic Roadmap Methods for the Study of Complex and High-Dimensional Motions, Lydia Tapia, Ph.D. Thesis, Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station, Texas, Dec 2009.
Ph.D. Thesis(pdf, abstract)

Using Dimensionality Reduction to Better Capture RNA and Protein Folding Motions, Lydia Tapia, Shawna Thomas, Nancy M. Amato, Technical Report, TR08-005, Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station, Texas, U.S.A., Oct 2008.
Technical Report(ps, pdf, abstract)

Techniques for Modeling and Analyzing RNA and Protein Folding Energy Landscapes, Xinyu Tang, Ph.D. Thesis, Department of Computer Science and Engineering, Texas A&M University, Dec 2007.
Ph.D. Thesis(ps, pdf, abstract)

Kinetics Analysis Methods For Approximate Folding Landscapes, Lydia Tapia, Xinyu Tang, Shawna Thomas, Nancy M. Amato, In Int. Conf. on Int. Sys. for Mol. Bio. (ISMB)/European Conf. on Comp. Bio.(ECCB), Vienna, Austria, Jul 2007. Also, Bioinformatics, 23(13):i539-i548, Jul 2007.
Journal(pdf, abstract)

Simulating Protein Motions with Rigidity Analysis, Shawna Thomas, Xinyu Tang, Lydia Tapia, Nancy M. Amato, Journal of Computational Biology, 14(6):839-855, Jul 2007. Also, In Proc. Int. Conf. Comput. Molecular Biology (RECOMB), pp. 394-409, Apr 2006.
Journal(ps, pdf, abstract) Proceedings(ps, pdf, abstract)

Parallel Protein Folding with STAPL, Shawna Thomas, Gabriel Tanase, Lucia K. Dale, Jose M. Moreira, Lawrence Rauchwerger, Nancy M. Amato, Concurrency and Computation: Practice and Experience, 17(14):1643-1656, Dec 2005.
Journal(ps, pdf, abstract)

Protein Folding by Motion Planning, Shawna Thomas, Guang Song, Nancy M. Amato, Physical Biology, 2:S148-S155, Nov 2005.
Journal(ps, pdf, abstract)

Parallel Protein Folding with STAPL, Shawna Thomas, Nancy M. Amato, In Proc. IEEE Int. Wkshp. on High Performance Computational Biology, Santa Fe, NM, Apr 2004.
Proceedings(ps, pdf, abstract)

A Motion Planning Approach to Folding: From Paper Craft to Protein Folding, Guang Song, Nancy M. Amato, IEEE Transactions on Robotics and Automation, 20(1):60-71, Feb 2004. Also, In Proc. IEEE Int. Conf. Robot. Autom. (ICRA), pp. 948-953, Seoul, Korea, May 2001. Also, Technical Report, TR00-017, Department of Computer Science and Engineering, Texas A&M University, Jul 2000.
Journal(ps, pdf, abstract) Proceedings(ps, pdf, abstract) Technical Report(ps, pdf, abstract)

A Motion Planning Approach to Protein Folding, Guang Song, Ph.D. Thesis, Parasol Laboratory, Department of Computer Science, Texas A&M University, Dec 2003.
Ph.D. Thesis(ps, abstract)

Using Motion Planning to Map Protein Folding Landscapes and Analyze Folding Kinetics of Known Native Structures, Nancy M. Amato, Ken Dill, Guang Song, Journal of Computational Biology, 10(3-4):239-255, Jun 2003. Also, In Proc. Int. Conf. Comput. Molecular Biology (RECOMB), pp. 2-11, Apr 2002.
Journal(ps, pdf, abstract) Proceedings(pdf, abstract)

A Path Planning-based Study of Protein Folding With a Case Study of Hairpin Formation in Protein G and L, Guang Song, Shawna Thomas, Ken A. Dill, J. Martin Scholtz, Nancy M. Amato, In Proc. Pac. Symp. of Biocomputing (PSB), pp. 240-251, Lihue, HI, Jan 2003.
Proceedings(ps, pdf, abstract)

Using Motion Planning to Study Protein Folding Pathways, Guang Song, Nancy M. Amato, Journal of Computational Biology, 9(2):149-168, Nov 2002. Also, In Proc. Int. Conf. Comput. Molecular Biology (RECOMB), pp. 287-296, Apr 2001. Also, Technical Report, TR00-026, Parasol Laboratory, Department of Computer Science, Texas A&M University, Oct 2000.
Journal(ps, pdf, abstract) Proceedings(ps, pdf, abstract) Technical Report(ps, pdf, abstract)

Using Motion Planning to Map Protein Folding Landscapes and Analyze Folding Kinetics of Known Native Structures, Nancy M. Amato, Guang Song, Technical Report, TR01-001, Parasol Laboratory, Department of Computer Science, Texas A&M University, Oct 2001.
Technical Report(ps, pdf, abstract)

A Motion Planning Approach to Folding: From Paper Craft to Protein Structure Prediction, Guang Song, Nancy M. Amato, Technical Report, TR00-001, Department of Computer Science and Engineering, Texas A&M University, Jan 2000.
Technical Report(ps)

Probabilistic Roadmap Methods are Embarrassingly Parallel, Nancy M. Amato, Lucia K. Dale, In Proc. IEEE Int. Conf. Robot. Autom. (ICRA), pp. 688-694, Detroit, Michigan, USA, May 1999.
Proceedings(ps, pdf, abstract)

Supported by NSF

Project Alumni:Luke Hunter,Bonnie Kirkpatrick,Katarzyna Leyk,Kasra Manavi,Shuvra Nath,Guang Song,Annette Stowasser,Xinyu Tang,Lydia Tapia,Manasi Vartak

Visit our Protein Folding Server!
Our Protein Folding Server is available to analyze your proteins with our motion planning technique.