Home Page for Aaron Lindsey | Parasol Laboratory


Aaron Lindsey
Undergraduate
Algorithms & Applications Group

Parasol Laboratory
Department of Computer Science and Engineering
Texas A&M University
College Station, TX 77843-3112, USA

url: http://parasol.tamu.edu/~alindsey/
office: 407 HRBB
fax: (979) 458-0718


CSE@TAMU REU Summer 2013 Research Journal



Week 1 (June 3)

I had a lot of fun during the first week of the CSE@TAMU REU program. On the first day, I met some really smart people from all over the world who are participating in research at Texas A&M this summer. We took a tour of Texas A&M's campus and learned all about the cool traditions that Aggies hold dear. Later in the week, I drafted a research proposal for my work this summer, which can be found here.

In summary, I will be working on improving decoy sets for protein folding simulations. Protein folding is the process by which a protein is transformed from a sequence of amino acids into its three-dimensional structure, known as its native state. Simulating protein folding motions and predicting protein structures are two of the most important problems in computational biology because the native structure of a protein often cannot be determined experimentally. We care about these native structures because they give us insight into how diseases like Alzheimer's and Mad Cow disease develop, and they may lead to new treatments for these diseases.

My research is primarily concerned with decoy protein structures. A decoy is a structure that is very similar to the native state of a protein but is not an actual protein conformation. Scientists and engineers use sets of decoys to validate and improve their protein structure prediction methods by testing whether their methods can distinguish between a decoy and a real native structure. Numerous decoy sets exist, but currently there is no way to analyze the quality of a decoy set or to improve a set by adding structures. I would like to develop several methods for improving the quality of decoy sets so that scientists and engineers can build better protein folding algorithms that lead to new treatments for very harmful diseases.

Week 2 (June 10)

This week my primary goal was to develop a method for analyzing a decoy set, so that we can later show that our methods actually improved the quality of the set. I implemented a new analysis method in our protein folding code that takes a decoy set and a set of new candidate decoys generated using our methods, and outputs plots of potential vs. RMSD for all of the decoy conformations. These plots show how many of the generated candidates are actually viable additions to the decoy set. Any candidate with low potential but high RMSD, or high potential but low RMSD, is a decoy that we would like to add to the set. I also implemented a strategy that generates a set of candidate decoys and another strategy that filters the candidate decoys based on each one's potential and RMSD from the native state.
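
Conceptually, the analysis boils down to a scatter plot like the following minimal sketch. It assumes each conformation has already been scored and is given as an (rmsd, potential) pair; this representation and the plotting details are my own assumptions, not the lab's actual code.

```python
import matplotlib.pyplot as plt

def plot_decoy_analysis(decoys, candidates, out_file="analysis.png"):
    """Scatter potential vs. RMSD for the existing decoy set and the newly
    generated candidates. Inputs are lists of (rmsd, potential) pairs;
    this data layout is a hypothetical stand-in for the real code."""
    d_rmsd, d_pot = zip(*decoys)
    c_rmsd, c_pot = zip(*candidates)
    plt.scatter(d_rmsd, d_pot, color="gray", label="existing decoys")
    plt.scatter(c_rmsd, c_pot, color="red", marker="x", label="candidate decoys")
    plt.xlabel("RMSD from native state (Å)")
    plt.ylabel("potential energy")
    plt.legend()
    plt.savefig(out_file)
    plt.close()
```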

Week 3 (June 17)

Now that I have a method for analyzing the quality of a decoy set, the next step in my research plan is to find a way to improve decoy sets by adding decoy conformations. In order to add conformations, I first needed to develop a method for generating sets of candidate decoys. During week 3, I developed several methods for generating these candidate sets, all based on the idea of perturbing a specific protein conformation to generate valid candidates. We say that a generated conformation is valid if its potential is below a given threshold. The methods are as follows; a sketch of the perturbation idea appears after the list.

  • Uniform sampling: This method generates random conformations by computing random phi/psi angles for all of the residues of the protein and keeps only the valid conformations.
  • Native state biased sampling: In this method, we perturb, or slightly change, the phi/psi values of the native state in order to generate samples close to the native state.
  • Low energy decoy biased sampling: Similar to native state biased sampling, we perturb valid structures to obtain more valid structures, but instead of using the native state, we select the lowest-energy decoy conformations to perturb.
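
A minimal sketch of these strategies, assuming a conformation is represented as a list of per-residue (phi, psi) angles in degrees and that a potential_fn scoring function is available; the representation, perturbation size, and helper names are assumptions made for illustration.

```python
import random

def uniform_sample(n_residues):
    """Uniform sampling: random phi/psi angles (degrees) for every residue."""
    return [(random.uniform(-180.0, 180.0), random.uniform(-180.0, 180.0))
            for _ in range(n_residues)]

def perturb(conformation, max_delta=10.0):
    """Nudge each residue's phi/psi angles by up to max_delta degrees."""
    return [(phi + random.uniform(-max_delta, max_delta),
             psi + random.uniform(-max_delta, max_delta))
            for phi, psi in conformation]

def biased_sample(seed, potential_fn, threshold, n_samples, max_tries=100000):
    """Biased sampling: perturb a seed conformation (the native state, or a
    low-energy decoy) and keep only valid results, i.e. conformations whose
    potential falls below the threshold."""
    valid = []
    for _ in range(max_tries):
        if len(valid) == n_samples:
            break
        candidate = perturb(seed)
        if potential_fn(candidate) < threshold:
            valid.append(candidate)
    return valid
```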

Week 4 (June 24)

During this week, I focused on figuring out how to generate good sample sets of viable decoys and, once I had the sample sets, on experimenting with filtering the viable decoys. I changed my "low energy decoy biased sampling" strategy to perturb low-RMSD structures as well as low-energy structures. From my experiments, I found that a combination of all of my sampling strategies works best.
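
One possible way to combine the strategies, reusing the hypothetical uniform_sample and biased_sample helpers from the week-3 sketch; the seed lists and per-strategy counts here are illustrative assumptions, not the actual experimental setup.

```python
def combined_sample(native, low_energy_decoys, low_rmsd_decoys,
                    potential_fn, threshold, n_per_strategy=100):
    """Pool candidates from every strategy: uniform sampling plus
    perturbation of the native state, low-energy decoys, and low-RMSD
    decoys (hypothetical combination for illustration)."""
    candidates = []
    # Uniform sampling, keeping only valid conformations.
    while len(candidates) < n_per_strategy:
        c = uniform_sample(len(native))
        if potential_fn(c) < threshold:
            candidates.append(c)
    # Perturbation-based sampling biased toward each kind of seed.
    for seed in [native] + low_energy_decoys + low_rmsd_decoys:
        candidates += biased_sample(seed, potential_fn, threshold, n_per_strategy)
    return candidates
```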

Week 5 (July 1)

This week I focused on three main issues regarding the improvement of decoy sets. Until now, I had only improved the sets by adding new decoys, so the first thing I did this week was experiment with removing redundant decoys from an existing decoy set. After this, I added two new evaluation metrics that I then used to evaluate the quality of my improved decoy sets. Finally, I expanded the number of proteins I am using in my experiments to six in order to see how my method performs across different kinds of proteins and decoy sets.
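
The journal does not specify how redundancy was defined; one plausible approach is a greedy RMSD-based filter like the sketch below, in which the pairwise_rmsd callable and the 1.0 Å cutoff are hypothetical.

```python
def remove_redundant(decoys, pairwise_rmsd, cutoff=1.0):
    """Greedy redundancy filter: keep a decoy only if it lies at least
    `cutoff` Å (by RMSD) from every decoy already kept. The greedy scheme
    and the cutoff are assumptions made for illustration."""
    kept = []
    for decoy in decoys:
        if all(pairwise_rmsd(decoy, other) >= cutoff for other in kept):
            kept.append(decoy)
    return kept
```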

Week 6 (July 8)

This week my focus was on further refining my strategy for adding/removing decoys to/from a set and experimenting with the use of the EEF1 energy function instead of the coarse energy function to generate my sample sets and analyze decoy sets. In addition, I completed my 5-week evaluation and progress report for the REU program which I submitted to Ms. Roberts on Thursday. This week marks the end of the period in which I originally planned to be developing ways to improve decoy sets. From now on, I will continue to refine my strategies and work on evaluating the decoy sets in order to quantitatively show improvement.

Week 7 (July 15)

This week I continued trying to get the EEF1 function working, refactored all of my code, and ran experiments to determine the improvement and z-score of my improved decoy sets. Toward the end of the week, I ran experiments to determine the effect of sample set size on the improvement and z-score of the improved decoy sets.
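
The journal does not define the z-score it uses; assuming it is the standard decoy-set z-score of the native state's energy against the decoy energy distribution, it could be computed like this.

```python
import statistics

def native_z_score(native_energy, decoy_energies):
    """z = (E_native - mean(E_decoys)) / stdev(E_decoys).
    A more negative z means the energy function separates the native
    state from the decoys more sharply."""
    mean = statistics.mean(decoy_energies)
    stdev = statistics.stdev(decoy_energies)
    return (native_energy - mean) / stdev
```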

Week 8 (July 22)

This week I continued experimenting with the optimal sample set sizes for the 4pti, 2cro, and 1gdm decoy sets. I also modified the decoy strategy to compute the filter multipliers based on the sample set sizes. The filter multipliers were previously user-defined values that determined how many standard deviations from the average a value could be for the decoy to be accepted into the final set. Adjusting these multipliers makes the decoy strategy accept more or fewer samples based on potential and minimum distance. We wanted to make these values dependent on the sample set size so that there would be less for the user to specify when using the algorithm. If a sample set is very small relative to the decoy set, we would like the algorithm to adjust the multipliers to make it easier to add samples, and if the sample set is large, we would like it to adjust the multipliers in the opposite fashion.
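
A minimal sketch of this size-dependent adjustment; the specific formula and the base multiplier are my own illustrative assumptions, not the rule actually used in the project.

```python
def filter_multiplier(sample_size, decoy_size, base_multiplier=2.0):
    """Derive the acceptance window (in standard deviations) from the
    sample-to-decoy size ratio: a relatively small sample set widens the
    window (easier to add samples), a large one narrows it."""
    ratio = sample_size / decoy_size
    return base_multiplier / max(ratio, 1e-6)

def accept(value, mean, stdev, multiplier):
    """Accept a sample whose value (potential or minimum distance) lies
    within `multiplier` standard deviations of the average."""
    return abs(value - mean) <= multiplier * stdev
```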

Week 9 (July 29)

This week was spent working on the deliverables for the REU program which included a paper and a poster. I also ran experiments to add to my paper and poster.

Week 10 (August 5)

During the final week of the REU program, I finished up my deliverables and participated in two poster presentations. The presentations went great, and I felt like I was well prepared. The ending of the REU program was kind of bittersweet because all of the people I met had to leave to go back to their home institutions. I had a great experience this summer!