| Data analysis is the most critical step in applying high-throughput sequencing to biological research. This paper focuses on ovarian cancer protein mass spectrometry data and single-cell RNA sequencing data, and uses computational tools to construct effective classification and clustering algorithms that explore the biological information behind these data, help in understanding biological functions, and assist in the treatment of related diseases. Ovarian cancer protein mass spectrometry data are complex and difficult to distinguish. Based on changes in protein expression and peptide abundance in cancerous tissue under pathological conditions, this paper proposes an integrated model that combines a Gaussian mixture model with a BP neural network to classify protein mass spectra. That is, the model measures how closely an unknown sample matches known samples and predicts whether the sample is healthy, benign, or cancerous, so as to assist the early diagnosis of ovarian cancer. The integrated model first removes some redundant data by means of curve analysis, and then uses variance ranking with intersection, threshold setting, and a Gaussian mixture model to select the set of feature mass-to-charge ratios. Finally, the BP neural network performs three-class prediction, and ten-fold cross-validation is used to verify the accuracy of the model. The results show that the three-class accuracy of the integrated model on the public Ovarian 04-03-02 data set is significantly higher than that obtained by the Gaussian mixture model or the BP neural network alone. In addition, the integrated model is simple to compute, generalizes well, is practical, and can serve as an alternative tool for follow-up research on the classification of mass spectrometry data. There are many kinds of single-cell sequencing data. This paper proposes an improved algorithm based on the existing Corr algorithm. First, an “information contribution value” is extracted based on the characteristics of single-cell sequencing data. Then the gene set of the information contribution matrix is calculated and incorporated into the gene set of the gene expression matrix in the Corr algorithm to obtain a new metric matrix. Finally, cluster analysis of eight single-cell data sets is carried out using both algorithms. The results show that both algorithms cluster well on four of the data sets. On the Islet and Human data sets, the clustering accuracy of the improved algorithm is slightly worse. However, on the N8-cell and Nmorale data sets, the clustering evaluation index of the improved algorithm is higher than that of the Corr algorithm, and the optimal number of clusters it obtains is closer to the true number. These results indicate that the improved algorithm has high accuracy and strong robustness, and can provide technical support for subsequent clustering analysis of single-cell sequencing data. |
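
The evaluation protocol described for the mass spectrometry model (variance-based screening of mass-to-charge features followed by ten-fold cross-validation of a three-class predictor) can be illustrated with a minimal sketch. It is not the paper's implementation: the curve analysis, the intersection and threshold steps, and the Gaussian mixture screening are omitted, and a nearest-centroid classifier stands in for the BP neural network so that the example needs only the Python standard library. All function names, the synthetic data, and the choice of `k_features` are assumptions made for illustration.

```python
import random
from statistics import mean, pvariance


def top_variance_features(X, k):
    """Return the indices of the k columns (m/z positions) with the largest variance."""
    n_features = len(X[0])
    variances = [pvariance([row[j] for row in X]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def stratified_folds(y, n_folds=10, seed=0):
    """Deal the samples of each class round-robin into n_folds folds."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % n_folds].append(i)
    return folds


def fit_centroids(X, y, features):
    """Stand-in for the BP network: one mean vector per class over the selected features."""
    centroids = {}
    for label in set(y):
        rows = [X[i] for i in range(len(X)) if y[i] == label]
        centroids[label] = [mean([r[j] for r in rows]) for j in features]
    return centroids


def predict(centroids, row, features):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    x = [row[j] for j in features]

    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    return min(sorted(centroids), key=lambda label: sq_dist(centroids[label]))


def cross_validate(X, y, k_features, n_folds=10, seed=0):
    """Mean accuracy over n_folds folds; features are selected on training data only."""
    accuracies = []
    for test_idx in stratified_folds(y, n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        features = top_variance_features(X_train, k_features)
        centroids = fit_centroids(X_train, y_train, features)
        correct = sum(predict(centroids, X[i], features) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return mean(accuracies)


def make_synthetic(n_per_class=30, n_features=20, seed=1):
    """Hypothetical spectra: only columns 0 and 1 differ between the three classes."""
    rng = random.Random(seed)
    X, y = [], []
    for c, label in enumerate(["healthy", "benign", "cancer"]):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 0.1) for _ in range(n_features)]
            row[0] += 3.0 * c
            row[1] -= 3.0 * c
            X.append(row)
            y.append(label)
    return X, y


X, y = make_synthetic()
print("selected m/z columns:", top_variance_features(X, 2))
print("ten-fold accuracy:", cross_validate(X, y, k_features=2))
```

Selecting features inside each training fold, rather than once on the full data set, keeps the held-out samples from influencing the screening step; whether the paper does this is not stated in the abstract.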