Computational Biology


Secondary Structure Prediction of RNA

Overview

This project started as a research project required for my Mathematics major. I participated in a summer undergraduate research experience at Georgia Tech, working under Dr. Christine Heitsch from the Department of Mathematics. The project explored alternative metrics and prediction techniques for RNA secondary structures using Geometric Combinatorics, Abstract Algebra, Dynamic Programming, Machine Learning and Microbiology. I implemented a machine learning algorithm using an unambiguous stochastic context-free grammar in Java and C++.

The results spurred interest in the project, raising questions and offering hope of improving other prediction algorithms by shedding light on their shortcomings. We are currently building on this project by investigating the range in prediction accuracy for RNA with similar structures but different sequences.

I recently received first place for judge's choice in a poster session at Georgia Tech's College of Computing. Thanks to my faculty advisers (listed below) for all their hard work and patience.

Abstract

Esposito, D., Dr. Heitsch, C. E., Dr. Poznanovik, S. and Dr. Swenson, M. S.
Improved RNA Secondary Structure Prediction Using Stochastic Context Free Grammars.  
Department of Mathematics, Georgia Institute of Technology, Atlanta, GA.
Accurate RNA secondary structure prediction is an important problem in computational biology. Different RNA nucleotide sequences often fold to similar structures, causing current prediction algorithms to range widely in accuracy for RNA strands with similar structures. To understand the origins of these inaccuracies, we trained a stochastic context-free grammar on a hard-to-predict training set and an easy-to-predict training set, which correspond to sets of sequences with low and high prediction accuracy, respectively. We found interesting statistical differences between the two training sets in the nucleotide composition of the sequences as well as in the distribution of nucleotide base pairs. Stochastic context-free grammars provide a means to quantify subtle differences in the composition of native secondary structures. The discovery of these differences could potentially lead to the improvement of current prediction algorithms. We are currently performing a parametric analysis of several prediction methods.
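
As a rough illustration of the kind of statistics the abstract mentions, the sketch below tallies nucleotide composition and base-pair frequencies for a set of sequences with known structures. It is not the project's Java or C++ code. The function names (base_pairs, composition_stats), the dot-bracket input format, and the toy sequence are all assumptions made for this example.

```python
from collections import Counter


def base_pairs(structure):
    """Return (i, j) index pairs of matching parentheses in dot-bracket notation."""
    stack, pairs = [], []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def normalize(counts):
    """Turn raw counts into relative frequencies (empty dict if no counts)."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()} if total else {}


def composition_stats(training_set):
    """Compute nucleotide and base-pair frequencies over a training set.

    training_set is a list of (sequence, structure) tuples, where the
    structure is a dot-bracket string of the same length as the sequence
    (an assumed input format for this sketch).
    """
    nucleotides, paired = Counter(), Counter()
    for sequence, structure in training_set:
        if len(sequence) != len(structure):
            raise ValueError("sequence and structure lengths differ")
        nucleotides.update(sequence)
        for i, j in base_pairs(structure):
            # Record the pair as 5' nucleotide followed by 3' nucleotide.
            paired[sequence[i] + sequence[j]] += 1
    return normalize(nucleotides), normalize(paired)


if __name__ == "__main__":
    # Hypothetical toy input: a single hairpin with three G-C pairs.
    toy_set = [("GGGAAACCC", "(((...)))")]
    nucleotide_freqs, pair_freqs = composition_stats(toy_set)
    print(nucleotide_freqs)
    print(pair_freqs)
```

Running the same function on a hard-to-predict set and on an easy-to-predict set would give two pairs of frequency tables that can be compared side by side. When a grammar is trained on known structures, counts of this general kind are what the probabilities for emitting unpaired and paired nucleotides are typically estimated from.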

My work has been presented at the following symposiums:

NIMBioS Undergraduate Research Conference
University of Tennessee - Knoxville, TN
October 21 - 22, 2011
Poster Presentation
Appreciation is extended to NIMBioS and the University of Tennessee for all funding provided.

Georgia Tech Mathematics Annual Alumni Meeting
Georgia Institute of Technology - Atlanta, GA
October 28, 2011
Poster Session

Mathematical Association of America Joint Mathematics Meetings
Hynes Convention Center - Boston, MA
January 4-7, 2012
Poster Session
Appreciation is extended to the Mathematical Association of America and the American Mathematical Society for all funding provided.

Discrete and Topological Modeling in Molecular Biology
University of South Florida - Tampa, FL
March 12-14, 2012
Poster Session
Appreciation is extended to the University of Central Florida and the National Science Foundation for all funding provided.

Undergraduate Research Opportunities Program Spring Symposium (UROP)
Georgia Institute of Technology
April 10, 2012
Talk

Undergraduate Research Opportunities in Computing Spring Symposium (UROC)
Georgia Institute of Technology
April 10, 2012
Poster Session (1st Prize for Judges' Choice)
Appreciation is extended to the sponsors including Yahoo! and Ken Schmidt.


Faculty Advisers and Useful Links

Georgia Institute of Technology
Department of Mathematics
Department of Biology
Dr. Christine Heitsch
Dr. M Shel Swenson
Dr. Svetlana Poznanovik

Project Home Page - A brief description of the project and information about each researcher in the Mathematics department.

Project Paper - My concluding paper, written after implementing a machine learning algorithm that uses an unambiguous stochastic context-free grammar for RNA secondary structure prediction.

Project Notes from 2011 Summer REU - A collection of notes covering each topic studied, group meetings, and progress made over the course of the REU.


