Cluster Analysis of Biological Data

Status: Available/Completed
Type: USRA/4080/408*…
Professor: Datta, Suprakash
Research Area: Bioinformatics
Year: 2015

Clustering is a basic data analysis technique, and many clustering algorithms have been developed over the years. However, no single algorithm works well for all data sets. Biological data sets often require modified clustering algorithms to yield results that make sense to biologists. More importantly, the large size of many biological data sets makes many existing algorithms infeasible to use. In this project, the student will learn about different clustering algorithms, implement some of them on real data sets, and measure their performance.
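As a rough illustration of what "implement and measure" could look like, the sketch below runs plain k-means (Lloyd's algorithm) on a tiny made-up data set and reports the within-cluster sum of squares as one simple quality measure. The choice of k-means, the function names (kmeans, within_cluster_ss), and the toy points are assumptions made for this example only; the project description does not name specific algorithms, measures, or data sets.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm); returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as starting centroids
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids


def within_cluster_ss(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


if __name__ == "__main__":
    # Hypothetical toy data: six 2-D points forming two well-separated groups.
    points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
              (8.0, 8.2), (7.9, 8.1), (8.3, 7.8)]
    labels, centroids = kmeans(points, k=2)
    print("labels:", labels)
    print("within-cluster sum of squares:", within_cluster_ss(points, labels, centroids))
```

Each iteration of this sketch compares every point with every centroid, so its cost grows with the number of points times the number of clusters. That is manageable here, but it hints at why data set size becomes a concern, especially for algorithms that need all pairwise distances.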

The student will work closely with the supervisor and graduate students and learn about the challenges posed by large data sets as well as by the biological context. This is an open-ended project that may lead to publications if the student is creative.

Required background: Good programming skills and a good knowledge of algorithms, particularly the ability to read and understand existing algorithms.