
Clustering algorithm for mass spectrometry data using general-purpose computing on graphics processing units

Posted on: 2017-03-18
Degree: M.S.
Type: Thesis
University: Illinois Institute of Technology
Candidate: Ali, Ansab
GTID: 2478390014498317
Subject: Computer Engineering
Abstract/Summary:
Modern mass spectrometers can produce mass spectral data at a very high rate. This data usually contains a significant percentage of redundant spectra, which increase database lookup time when searching for peptides. There is therefore a need for data-mining techniques (e.g. clustering) that reduce the complexity of these mass spectral datasets before database search. Multi-core architectures, specifically Graphics Processing Units (GPUs), have evolved tremendously in recent years and are an ideal option for clustering these large mass spectral datasets. In this thesis, we present an efficient and scalable parallel algorithm for clustering mass spectra using the well-known 'F-set' similarity metric. We describe the algorithmic framework and the various optimizations that substantially improve the algorithm's performance and accuracy. We test the algorithm on a variety of real and self-generated mass spectral datasets and show that it achieves highly accurate clustering with a speedup of around 50 to 100 times over serial implementations reported in the literature. Thus, by clustering together the mass spectra that correspond to each unique peptide, the algorithm allows faster identification of peptides in a subsequent database search.
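As a rough illustration of the kind of computation involved, the following serial Python sketch clusters spectra by comparing peak sets against cluster representatives. It is not the thesis's algorithm: the Dice-style shared-peak similarity stands in for the 'F-set' metric, whose definition is not given here, and the names `to_peak_set`, `shared_peak_similarity`, `greedy_cluster`, `bin_width`, and `threshold` are hypothetical. The inner similarity comparisons are the work a GPU implementation would evaluate in parallel.

```python
from typing import List, Sequence, Set


def to_peak_set(mzs: Sequence[float], bin_width: float = 0.5) -> Set[int]:
    """Discretize peak m/z values into integer bins (hypothetical preprocessing)."""
    return {int(mz / bin_width) for mz in mzs}


def shared_peak_similarity(a: Set[int], b: Set[int]) -> float:
    """Dice-style fraction of shared peak bins; a stand-in, NOT the 'F-set' metric."""
    if not a and not b:
        return 0.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def greedy_cluster(spectra: List[Sequence[float]], threshold: float = 0.7,
                   bin_width: float = 0.5) -> List[int]:
    """Assign each spectrum to the first cluster whose representative is similar enough."""
    peak_sets = [to_peak_set(s, bin_width) for s in spectra]
    reps: List[int] = []    # index of the representative spectrum of each cluster
    labels: List[int] = []  # cluster label assigned to each spectrum
    for i, ps in enumerate(peak_sets):
        # A GPU version would evaluate these comparisons in parallel.
        for c, r in enumerate(reps):
            if shared_peak_similarity(ps, peak_sets[r]) >= threshold:
                labels.append(c)
                break
        else:
            # No sufficiently similar cluster: this spectrum starts a new one.
            reps.append(i)
            labels.append(len(reps) - 1)
    return labels


if __name__ == "__main__":
    spectra = [[100.1, 200.2, 300.3], [100.2, 200.1, 300.4], [500.0, 600.0, 700.0]]
    print(greedy_cluster(spectra))  # [0, 0, 1]
```

Here the first two spectra fall into the same peak bins and share a cluster, while the third starts a new one. Representative-based greedy assignment is chosen only for brevity; it is order-dependent and is not claimed to match the thesis's clustering strategy.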
Keywords/Search Tags: Mass, Clustering, Data, Algorithm