Clustering algorithm for mass spectrometry data using general-purpose computing on graphics processing units

Posted on:2017-03-18

Degree:M.S.

Type:Thesis

University:Illinois Institute of Technology

Candidate:Ali, Ansab

GTID:2478390014498317

Subject:Computer Engineering

Abstract/Summary:

Modern mass spectrometers can produce mass spectra at a very high rate. This data usually contains a significant percentage of redundant spectra, which increase the database lookup time when searching for peptides. There is therefore a need for data-mining techniques (e.g., clustering) to reduce the complexity of these mass spectra datasets before database search. Multi-core architectures, specifically Graphics Processing Units (GPUs), have evolved tremendously in recent years and are an ideal option for clustering these large mass spectra datasets. In this thesis, we present an efficient and scalable parallel algorithm for clustering mass spectra using the well-known 'F-set' similarity metric. We describe the algorithmic framework and the various optimizations that vastly improve the algorithm's performance and accuracy. We test the algorithm on a variety of real and self-generated mass spectra datasets and show that it achieves highly accurate clustering with a performance gain of around 50 to 100 times compared to serial implementations in the literature. Thus, by grouping together the mass spectra that correspond to the same peptide, the algorithm allows faster identification of peptides in a subsequent database search.
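The idea of collapsing redundant spectra before a database search can be illustrated with a small serial sketch. It is not the thesis's algorithm: the abstract does not define the 'F-set' metric or the GPU kernels, so this example assumes a plain Jaccard overlap of binned peaks as a stand-in similarity and a simple greedy assignment. The names `bin_peaks`, `jaccard`, and `greedy_cluster`, the bin width, and the threshold are all hypothetical.

```python
from typing import List, Set

# A spectrum reduced to a set of integer m/z bins (illustrative representation).
Spectrum = Set[int]


def bin_peaks(mz_values: List[float], bin_width: float = 1.0) -> Spectrum:
    """Map raw m/z peak positions to integer bins so nearby peaks compare equal."""
    return {int(mz // bin_width) for mz in mz_values}


def jaccard(a: Spectrum, b: Spectrum) -> float:
    """Stand-in similarity (NOT the F-set metric): share of bins common to both."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_cluster(spectra: List[Spectrum], threshold: float = 0.7) -> List[List[int]]:
    """Put each spectrum into the first cluster whose first member is similar enough."""
    clusters: List[List[int]] = []
    for idx, spec in enumerate(spectra):
        for members in clusters:
            representative = spectra[members[0]]
            if jaccard(spec, representative) >= threshold:
                members.append(idx)
                break
        else:
            # No existing cluster matched, so this spectrum starts a new one.
            clusters.append([idx])
    return clusters


if __name__ == "__main__":
    spectra = [
        bin_peaks([100.1, 200.2, 300.3, 400.4]),
        bin_peaks([100.3, 200.4, 300.1, 400.2]),  # near-duplicate of the first
        bin_peaks([150.0, 250.0, 350.0]),
    ]
    print(greedy_cluster(spectra))
```

In a sketch like this, only one representative per cluster would need to be sent to the database search, which is where the reduction in lookup time comes from. The pairwise similarity computations are independent of each other, which is the kind of work that maps naturally onto GPU threads.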

Keywords/Search Tags:

Mass, Clustering, Data, Algorithm

Related items

1	The Research Of Clustering Algorithms Based On Data Mass And Potential Entropy
2	Research On Clustering Methods Of Large Data Sets Based On Data Fields
3	Research And Implementation Of MapReduce-based Graph Clustering Algorithm
4	Rapid Clustering Method Of Large-scale Internet Geographic Markers
5	Clustering Algorithm In Data Mining Research
6	Research On Intelligent Recommendation To Mass Customization In The Big Data Environment
7	Research Of User Behavior Analysis Method For Mass Power Consumption Data
8	Research And Simulation Of Clustering Algorithm In Data Mining
9	Research And Implementation Of Clustering Algorithm For Multidimensional Data Sets
10	Research On Personalized Recommendation Method Based On User Clustering