An Incremental Clustering Algorithm For Proteomics Spectrometry Based On Deep Embedding Model

Posted on:2022-08-16

Degree:Master

Type:Thesis

Country:China

Candidate:B G Zhang

Full Text:PDF

GTID:2480306575466614

Subject:Computer technology

Abstract/Summary:

Proteomics is a new discipline that systematically studies the composition and function of proteins. It identifies and quantifies proteins through enzymatic digestion, separation, and protein sequence or spectral library search. Shotgun proteomics experiments commonly face two problems: highly redundant data leads to repeated searches, and the candidate library cannot include too many post-translational modifications, so many spectra remain unidentified. Spectral clustering algorithms can compensate for these shortcomings. Clustering redundant spectra removes the redundancy and reduces the matching computation during database search; clustering provides a second check on existing identifications and exposes incorrect ones; and it enables new identifications of unidentified spectra within a cluster as well as the construction of a spectral database. However, existing clustering algorithms cannot process new data quickly: when only a small amount of new data arrives, re-clustering is inefficient. To address these shortcomings, this thesis makes the following contributions:

1. This thesis studies IGLEAMS (Increment LEArning based MS/MS Spectra), an incremental clustering model built on the GLEAMS (Learned Embedding for Annotating Mass Spectra) deep embedding model. First, it merges the new data with the existing clustering database through a Faiss index. Second, it uses a local search strategy to find the k-nearest neighbors of the new data on the merged index. Then, it uses inverted filtering and single-point insertion to combine the new data clusters with the existing clusters, producing a new combined clustering. Finally, the incremental clustering is completed by removing duplicate spectral data. The experimental results show that IGLEAMS improves clustering time efficiency by about 40% compared with GLEAMS, while its clustering results remain highly consistent with those of GLEAMS.

2. A spectral data association model is designed, with indexes created through Faiss to associate the data. First, a storage model for the original spectral data, the dimension-reduced data, and the cluster data is designed. Second, the associations between the different types of data are designed according to the characteristics of the Faiss index. Finally, the data is stored in the database to enable fast search across data types.

3. A visualization system is designed. An IGLEAMS clustering module, a clustering result display module, and a data search module are designed and developed, completing an IGLEAMS clustering system based on Python and the Flask framework.
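The idea of inserting new spectra into an existing clustering through nearest-neighbor search can be illustrated with a minimal Python sketch. This is not the IGLEAMS implementation: the function names, the distance threshold `eps`, and the toy 2-D vectors are assumptions made for illustration, a linear scan stands in for the Faiss index, and the inverted filtering and deduplication steps are omitted.

```python
import math


def nearest(point, vectors):
    """Return (index, distance) of the closest stored vector, or (None, inf) if none."""
    best_i, best_d = None, math.inf
    for i, v in enumerate(vectors):
        d = math.dist(point, v)  # Euclidean distance between two embeddings
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d


def incremental_assign(vectors, labels, new_vectors, eps):
    """Insert new embeddings one at a time (hypothetical single-point insertion).

    A new point joins the cluster of its nearest neighbor when that neighbor
    lies within eps; otherwise it starts a new cluster. Existing points are
    never re-clustered. Returns the labels given to the new points.
    """
    vectors = list(vectors)  # copy so the caller's data is left untouched
    labels = list(labels)
    next_label = max(labels, default=-1) + 1
    assigned = []
    for p in new_vectors:
        i, d = nearest(p, vectors)
        if i is not None and d <= eps:
            label = labels[i]
        else:
            label = next_label
            next_label += 1
        # Make the new point searchable for the points that follow it.
        vectors.append(p)
        labels.append(label)
        assigned.append(label)
    return assigned


# Toy data: two existing clusters and four new embeddings.
existing = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
existing_labels = [0, 0, 1]
new = [(0.05, 0.05), (5.1, 5.0), (10.0, 10.0), (10.1, 10.0)]
result = incremental_assign(existing, existing_labels, new, eps=0.5)
print(result)  # [0, 1, 2, 2]
```

In this sketch only the new points are searched and labeled, which is the source of the time saving an incremental approach aims for. With real spectrum embeddings, the linear scan would presumably be replaced by a k-nearest-neighbor query against a Faiss index.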

Keywords/Search Tags:

proteomics, mass spectrometry, deep embedding model, incremental clustering, faiss

Related items

1	Research On Algorithm For Mass Spectrometry Based Proteomics
2	Modeling Of Peptide Fragment Ion Intensities In Tandem Mass Spectrometry
3	Studies On The New Technologies And Methods Of Biological Mass Spectrometry And Their Applications For Proteomics
4	1.Tandem Mass Spectrometry Application On Proteomics 2.The Structure And Function Studies Of JingZhaotoxin-Ⅶ
5	By Maldi Mass Spectrometry Of New Methods And New Technology And Its Application In Proteomics Research
6	Bioinformatics of high throughput proteomics using tandem mass spectrometry of intact proteins
7	Processing Of Large-scale Mass Spectrometry Data And Construction Of Corresponding Platform
8	Studying And Development Of New Methods For Mass Spectrometry Data Analysis In Proteomics
9	Research On Quantitative Proteomics Method For DIA Mass Spectrometry Data
10	Methylome Analysis Based On Mass Spectrometry