An Incremental Clustering Algorithm For Proteomics Spectrometry Based On Deep Embedding Model

Posted on: 2022-08-16
Degree: Master
Type: Thesis
Country: China
Candidate: B G Zhang
GTID: 2480306575466614
Subject: Computer technology
Abstract/Summary:
Proteomics is a new discipline that systematically studies the composition and function of proteins. It identifies and quantifies proteins mainly through enzymatic digestion, separation, and protein sequence/spectral library search. Shotgun proteomics experiments commonly face two problems: highly redundant data leads to repeated searches, and the candidate library cannot include too many post-translational modifications, which leaves many spectra unidentified. Spectrum clustering can make up for these shortcomings: it removes redundancy by grouping redundant spectra, which reduces the matching computation during database search; it re-checks existing identifications through clustering so that incorrect ones can be detected; and it enables new identifications of the unidentified spectra within a cluster, as well as the construction of a spectral database. However, existing clustering algorithms cannot process new data quickly: re-clustering is inefficient when only a small amount of new data has been added. In view of these shortcomings, this thesis makes the following contributions:

1. This thesis studies IGLEAMS (Increment LEArning based MS/MS Spectra), an incremental clustering model built on the advanced deep embedding model GLEAMS (Learned Embedding for Annotating Mass Spectra). First, it merges the new data with the existing clustering database through a Faiss index. Second, it uses a local search strategy to find the k-nearest neighbors of the new data on the merged index. Then, it uses inverted filtering and single-point insertion to combine the clusters of the new data with the existing clusters, producing a new combined clustering. Finally, incremental clustering is completed by removing duplicate spectral data. Experimental results show that IGLEAMS improves clustering time efficiency by about 40% compared with GLEAMS, while its clustering results remain highly consistent with those of GLEAMS.

2. A spectral data association model is designed, and indexes are created with Faiss to associate the data in detail: first, storage models are designed for the original spectral data, the dimension-reduced data, and the cluster data; second, the associations between the different types of data are designed according to the characteristics of the Faiss index; finally, the data is stored in the database to enable fast search across data types.

3. A visualization system is designed. An IGLEAMS clustering module, a clustering result display module, and a data search module are designed and developed, completing the construction of the IGLEAMS clustering system based on Python and the Flask framework.
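
The incremental steps described in item 1 (merge, local k-nearest-neighbor search, single-point insertion, deduplication) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the thesis's implementation: a brute-force search over plain Python lists stands in for the Faiss index, inverted filtering is not modeled, and the function name `incremental_cluster`, the parameters `eps` and `k`, and the rule "join the most common cluster among neighbors within `eps`" are all hypothetical choices made for the example.

```python
import math


def incremental_cluster(embeddings, labels, new_points, eps=0.5, k=3):
    """Insert new embeddings into an existing clustering without re-clustering.

    embeddings: existing embedding vectors (lists of floats)
    labels:     cluster id of each existing embedding
    new_points: embeddings to add
    eps, k:     hypothetical distance threshold and neighbor count
    Returns the merged (embeddings, labels).
    """
    merged = [list(v) for v in embeddings]
    merged_labels = list(labels)
    next_label = max(merged_labels, default=-1) + 1

    for point in new_points:
        # Local search: k nearest neighbors among everything merged so far.
        # Brute force here; a vector index would replace this step.
        neighbors = sorted(
            (math.dist(point, v), i) for i, v in enumerate(merged)
        )[:k]

        # Deduplication: skip a point that is already stored.
        if neighbors and neighbors[0][0] == 0.0:
            continue

        # Single-point insertion: join the most common cluster among the
        # neighbors within eps (ties go to the closest neighbor),
        # otherwise open a new cluster.
        close = [merged_labels[i] for d, i in neighbors if d <= eps]
        if close:
            label = max(close, key=close.count)
        else:
            label = next_label
            next_label += 1
        merged.append(list(point))
        merged_labels.append(label)

    return merged, merged_labels


# Toy data: two existing clusters, then four new points
# (one near each cluster, one outlier, one exact duplicate).
existing = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
existing_labels = [0, 0, 1]
new = [[0.05, 0.1], [5.1, 5.0], [9.0, 9.0], [0.0, 0.0]]

merged, merged_labels = incremental_cluster(existing, existing_labels, new)
print(merged_labels)  # prints [0, 0, 1, 0, 1, 2]
```

Only the new points are compared against the stored embeddings, so the cost grows with the size of the new batch rather than requiring the whole data set to be clustered again, which is the motivation the abstract gives for the incremental approach.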
Keywords/Search Tags: proteomics, mass spectrometry, deep embedding model, incremental clustering, faiss