Classification/Clustering Algorithm Based On Ovarian Cancer Protein Mass Spectrometry And Single Cell Sequencing

Posted on:2021-10-14

Degree:Master

Type:Thesis

Country:China

Candidate:J S Ma

Full Text:PDF

GTID:2544306104467044

Subject:Computational Mathematics

Abstract/Summary:

Data analysis is the most critical step in applying high-throughput sequencing to biological research. This thesis focuses on ovarian cancer protein mass spectrometry data and single-cell RNA sequencing data, and uses computational tools to construct effective classification and clustering algorithms that extract the biological information in these data, help explain biological function, and assist in the treatment of related diseases.

Ovarian cancer protein mass spectrometry data are complex and difficult to distinguish. Based on the changes in protein and peptide abundance expressed by cancer tissue in the pathological state, this thesis proposes an integrated model that combines a Gaussian mixture model with a BP (backpropagation) neural network to classify protein mass spectra. The model measures how well an unknown sample matches known samples and predicts whether the sample is healthy, benign, or cancerous, so as to assist the early diagnosis of ovarian cancer. First, the integrated model removes some redundant data by curve analysis. It then screens out the set of feature mass-to-charge ratios using variance ranking with intersection, threshold setting, and a Gaussian mixture model. Finally, a BP neural network performs the three-class prediction, and ten-fold cross-validation is used to verify the accuracy of the model. The results show that the three-class accuracy of the integrated model on the public Ovarian 04-03-02 data set is significantly higher than that obtained by the Gaussian mixture model or the BP neural network alone. In addition, the integrated model is simple to compute, generalizes well, is practical, and can serve as an alternative tool for follow-up research on the classification of mass spectrometry data.

There are many kinds of single-cell sequencing data. This thesis proposes an improved algorithm based on the existing Corr algorithm. First, an "information contribution value" is extracted based on the characteristics of single-cell sequencing data. Then the gene set of the information contribution matrix is calculated and incorporated into the gene set of the gene expression matrix in the Corr algorithm to obtain a new metric matrix. Finally, eight single-cell data sets are clustered with both algorithms. The results show that both algorithms cluster very well on four of the data sets. On the Islet and Human data sets, the clustering accuracy of the improved algorithm is slightly worse. On the N8-cell and Nmorale data sets, however, the clustering evaluation index of the improved algorithm is higher than that of the Corr algorithm, and the optimal number of clusters it obtains is closer to the true number. The improved algorithm therefore shows high accuracy and strong robustness, and can provide technical support for subsequent clustering analysis of single-cell sequencing data.
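The abstract does not spell out how the variance ranking, the intersection step, or the ten-fold split are implemented. The following is a minimal sketch under stated assumptions, not the thesis's actual method: it assumes that features (mass-to-charge ratios) are ranked by variance within each class, that the top `k` of each class are intersected, and that the ten folds are a simple shuffled partition. The function names, the parameter `k`, and the toy data are all hypothetical, and the Gaussian mixture screening and the BP neural network are left out.

```python
import random
import statistics


def top_variance_features(rows, k):
    """Return the indices of the k columns with the largest variance."""
    n_features = len(rows[0])
    variances = [
        statistics.pvariance([row[j] for row in rows]) for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return set(order[:k])


def screen_features(samples, labels, k):
    """Assumed screening rule: intersect the per-class top-k variance sets."""
    selected = None
    for cls in sorted(set(labels)):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        top = top_variance_features(rows, k)
        selected = top if selected is None else selected & top
    return sorted(selected)


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle sample indices and deal them into n_folds disjoint test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


# Toy intensities: 6 samples x 4 hypothetical m/z positions, two classes.
samples = [
    [1, 10, 5, 0], [1, 20, 5, 4], [1, 30, 5, 8],    # class 0
    [2, 5, 7, 0], [2, 25, 9, 1], [2, 45, 11, 2],    # class 1
]
labels = [0, 0, 0, 1, 1, 1]

kept = screen_features(samples, labels, k=2)
folds = k_fold_indices(25, n_folds=10)
```

In a full pipeline each fold would serve once as the test set while a classifier is trained on the other nine, and the reported accuracy would be the average over the ten runs.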

Keywords/Search Tags:

Mass spectrometry data for ovarian cancer, Gaussian mixture model, BP neural network, tri-classification, single-cell clustering, evaluation index, robustness

Related items

1	Application Of Denoising Autoencoder Combined With Gaussian Mixture Clustering In Cancer Subtype Classification Of Multi-omics Data
2	A Study On Cancer Typing Based On Spectral Clustering Algorithm
3	Research On Mixture Model Based Clustering Of Cancer Omics Data
4	Detection Of Carotid Intima And Media Thicknesses Based On Ultrasound B-mode Images Clustered With Gaussian Mixture Model
5	Application Of Protein Mass Spetrum Data And Neural Network In Diagnosis Of Ovarian Cancer
6	Research And Application Of Classification Of Small Sample Clinical Data
7	Research On Association Classification Algorithm And Its Application In Medical Data
8	Research On Automatic Analysis Method Of Flow Cytometry Data Based On Neural Network Model
9	Mass spectrometric studies of single- and double-stranded oligodeoxynucleotides
10	The Research Of Human Brain MRI Segmentation Methods Based On Gaussain Mixture Model,Markov Random Field And Fuzzy Clustering