Imputation of missing values in microarray data

Student: Michael Larin

Supervisor: S. Datta


Microarrays are a relatively new technology that has had a tremendous impact on many areas within biology and bioinformatics. Microarray technology enables researchers to study the behaviour of many genes and/or conditions in a single experiment.

Due to technological limitations and experimental design issues, microarray data sets typically have several missing values. It has been shown that imputing these values improves the accuracy of processing tasks, including clustering, that are commonly performed on these data sets. Therefore, good imputation algorithms are required.

In this project, we will explore fast and accurate imputation algorithms for microarray data. The student will first read the assigned papers and write a short summary of them. Then, he will study the performance of a few algorithms from the literature (many are already implemented, but one or two may need to be implemented). Finally, he will work with the supervisor on the design of better algorithms for the problem being studied. He will use publicly available data sets to compare the performance (accuracy and speed) of the new algorithm(s) with that of the GMCImpute algorithm and several other existing ones.
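
The accuracy and speed comparison described above could be set up along the following lines. This is only a minimal sketch under stated assumptions, not the project's actual protocol: known entries of a complete expression matrix are hidden at random, an imputation method fills them in, and the error on the hidden entries and the running time are recorded. The function names (row_mean_impute, evaluate_imputer), the 5% missing rate, the normalisation of the error by the standard deviation of the hidden values, and the synthetic matrix are all hypothetical choices made for illustration. Row-mean imputation is used here only as a simple baseline; it is not GMCImpute.

```python
import time
import numpy as np


def row_mean_impute(data):
    """Baseline imputer: replace each NaN with the mean of the observed
    values in the same row (gene). Assumes every row has at least one
    observed value."""
    filled = data.copy()
    row_means = np.nanmean(filled, axis=1)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = row_means[rows]
    return filled


def evaluate_imputer(complete, imputer, missing_rate=0.05, seed=0):
    """Hide a random fraction of known entries, impute them, and return
    (normalised RMSE on the hidden entries, elapsed seconds)."""
    rng = np.random.default_rng(seed)
    mask = rng.random(complete.shape) < missing_rate  # entries to hide
    corrupted = complete.copy()
    corrupted[mask] = np.nan

    start = time.perf_counter()
    imputed = imputer(corrupted)
    elapsed = time.perf_counter() - start

    # Assumed error measure: RMSE on the hidden entries divided by the
    # standard deviation of their true values.
    rmse = np.sqrt(np.mean((imputed[mask] - complete[mask]) ** 2))
    nrmse = rmse / np.std(complete[mask])
    return nrmse, elapsed


if __name__ == "__main__":
    # Synthetic stand-in for a complete genes-by-conditions matrix;
    # a real study would load a publicly available data set instead.
    rng = np.random.default_rng(1)
    complete = rng.normal(size=(200, 20))
    nrmse, seconds = evaluate_imputer(complete, row_mean_impute)
    print(f"row-mean baseline: NRMSE={nrmse:.3f}, time={seconds:.4f}s")
```

Any other method can be compared in the same way by passing a different function as the imputer argument, provided it takes a matrix containing NaN values and returns a filled matrix of the same shape.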

Throughout the course, the student is required to maintain a course website to report any progress and details about the project.