
The Research On High-speed Spectral Library Search Algorithm Based On GC-MS

Posted on: 2016-07-12
Degree: Master
Type: Thesis
Country: China
Candidate: L L Cao
GTID: 2308330461988752
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Compound identification is mainly performed by similarity search against a mass spectral library. In recent years the number of spectra available in mass spectral libraries has grown rapidly, which calls for library search algorithms that are both faster and achieve a higher compound identification rate.

In this thesis, several similarity measures are combined so that their complementary strengths raise the compound identification rate; the resulting algorithm is called the combined multiple mass spectral similarity measure algorithm (MUL_SM). MUL_SM combines seven similarity measures (ABS_VD, Euclidean distance, cosine correlation, correlation, the SS composite similarity measure, the DFTR composite similarity measure and the DWTD composite similarity measure), and particle swarm optimization (PSO) is used to set the weight of each measure in MUL_SM, that is, the contribution of each measure to the combined score. To reduce the search (computation) time, one of the seven measures is chosen as a "filter" that builds the main search library. After comparing the seven measures in terms of library search computation time, identification performance and ability to interpret molecular structure, the absolute value difference (ABS_VD) measure was selected as the "filter" of the original search library.

Traditional mass spectral library search operates directly on the original gas chromatography-mass spectrometry (GC-MS) data. Because these samples are high-dimensional, library search is relatively slow and therefore inefficient.
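The two-stage idea above (a cheap ABS_VD "filter" followed by re-ranking with a weighted combination of measures) can be sketched as follows. This is a minimal illustration, not the thesis implementation, and it rests on several assumptions: spectra are fixed-length intensity vectors over integer m/z bins; ABS_VD is taken as 1/(1 + mean absolute intensity difference), since the abstract does not give its formula; only four of the seven measures are included (the SS, DFTR and DWTD composite measures are omitted); the combination is assumed to be a weighted sum of scores; and the equal weights stand in for the PSO-tuned weights. All function names are hypothetical.

```python
import numpy as np


def normalize(spec):
    """Scale a spectrum (intensity vector over m/z bins) to unit maximum."""
    spec = np.asarray(spec, dtype=float)
    peak = spec.max()
    return spec / peak if peak > 0 else spec


def abs_vd(x, y):
    # Assumed ABS_VD form: similarity from the mean absolute intensity difference.
    return 1.0 / (1.0 + np.mean(np.abs(x - y)))


def euclidean(x, y):
    # Euclidean distance turned into a similarity in (0, 1].
    return 1.0 / (1.0 + np.linalg.norm(x - y))


def cosine(x, y):
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0


def correlation(x, y):
    # Pearson correlation of the two intensity vectors.
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.linalg.norm(xc) * np.linalg.norm(yc)
    return float(xc @ yc / denom) if denom > 0 else 0.0


MEASURES = (abs_vd, euclidean, cosine, correlation)


def combined_similarity(x, y, weights):
    """MUL_SM-style score, assumed here to be a weighted sum of the measures."""
    return sum(w * m(x, y) for w, m in zip(weights, MEASURES))


def search(query, library, weights, k_filter=50, top=5):
    """Return indices of the best library matches for one query spectrum."""
    q = normalize(query)
    lib = [normalize(s) for s in library]
    # Stage 1 ("filter"): keep the k_filter entries with the highest ABS_VD score.
    coarse = sorted(range(len(lib)), key=lambda i: abs_vd(q, lib[i]), reverse=True)[:k_filter]
    # Stage 2: re-rank only the survivors with the weighted combination.
    fine = sorted(coarse, key=lambda i: combined_similarity(q, lib[i], weights), reverse=True)
    return fine[:top]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    library = rng.random((200, 100))                    # toy library: 200 spectra, 100 bins
    query = library[42] + rng.normal(0, 0.01, 100)      # noisy copy of entry 42
    print(search(query, library, weights=[0.25] * 4))
```

In the toy run, the noisy copy of entry 42 should be retrieved first. The design point is that the expensive combined score is computed only for the `k_filter` candidates that survive the single-measure filter, rather than for every library entry.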
This thesis also proposes a library search algorithm based on random projection locality-sensitive hashing (LSH), which makes mass spectral library search faster and more efficient. The work consists mainly of the following parts:

1. Because traditional single similarity measures have a low identification rate, the combined multiple mass spectral similarity measure algorithm (MUL_SM) is proposed. Its main steps are as follows. First, the diversity of the seven similarity measures is visualized. Second, the weights of the seven measures in MUL_SM are determined by particle swarm optimization (PSO). Experiments show that MUL_SM achieves higher compound identification performance than each of the seven commonly used similarity measures.

2. The random projection LSH-based library search algorithm consists of two parts: mapping the original GC-MS data to binary codes, and searching the library using those binary codes. Experimental results show that the proposed algorithm searches markedly faster than the traditional mass spectral library search algorithm.
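The two parts of the LSH-based search (mapping spectra to binary codes, then searching on the codes) can be illustrated with sign random projection, the hyperplane-based LSH family in which two vectors at angle θ receive the same bit with probability 1 − θ/π. The abstract does not specify which random projection scheme or code length the thesis uses, so this is a minimal sketch under that assumption: the 64-bit code length, the Gaussian hyperplanes and all function names are hypothetical choices for illustration.

```python
import numpy as np


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes (the random projection matrix)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_bits, dim))


def to_binary(spectra, planes):
    """Part 1: map each spectrum to a bit vector.

    Bit j is 1 when the spectrum lies on the positive side of hyperplane j.
    """
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    return (spectra @ planes.T > 0).astype(np.uint8)


def hamming_search(query_code, library_codes, top=5):
    """Part 2: rank library entries by Hamming distance to the query code."""
    dist = np.count_nonzero(library_codes != query_code, axis=1)
    return np.argsort(dist, kind="stable")[:top]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    library = rng.random((200, 100))                 # toy library: 200 spectra, 100 bins
    planes = make_hyperplanes(dim=100, n_bits=64, seed=1)
    codes = to_binary(library, planes)               # computed once, offline
    query = library[42] + rng.normal(0, 0.01, 100)   # noisy copy of entry 42
    print(hamming_search(to_binary(query, planes)[0], codes))
```

Comparing 64 bits per entry is far cheaper than comparing full-resolution spectra. The codes can be packed further with `np.packbits`, so that distances become XOR-and-popcount operations. A short candidate list taken from the Hamming ranking can then be re-scored with a full similarity measure if needed.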
Keywords/Search Tags: Similarity measure, Mass spectral library search, Combined multiple similarity measures, Locality-sensitive hashing