Biological Data Analysis Based On The Density Clustering And Convolution Neural Network

Posted on: 2018-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: X Chen
Full Text: PDF
GTID: 2348330536461198
Subject: Biomedical engineering
Abstract/Summary:
In recent years, the defining characteristics of biological data (large scale, fast growth, and rich information content) have become increasingly prominent, and the demands on biological data analysis and information processing technology have grown accordingly. An important objective of biological data analysis is to capture the patterns and development trends in the data and use them to predict future data. The data set studied in this thesis consists of high-resolution mass spectrometry images collected with a 15 T Fourier transform ion cyclotron resonance mass spectrometer; the imaged objects are various compounds present in the brain. The aim of the thesis is to use machine learning to explore the information carried by the images of the different compounds and to predict the class of unknown data. Because a mass spectrum image contains a large amount of information but has no obvious visual characteristics, it is difficult for researchers to assign the images to reliable categories. This thesis therefore studies clustering and classification, starting from the prior knowledge that the 892 mass spectrum images can be divided into 5 to 11 classes. The main work is as follows:

(1) Analysis and feature extraction of mass spectrum images. The images undergo a series of preprocessing steps based on in-depth analysis, including filtering and color feature analysis, from which a feature vector that characterizes the image information is extracted. The analysis is also presented visually, and more valuable image features are generated.

(2) Cluster analysis of mass spectrum images. A density-based clustering algorithm produces a preliminary classification of the image types. Using the extracted feature vectors, the unlabeled mass spectrum images are divided into multiple classes that are meaningful both mathematically and biologically. Building on the density-based algorithm, the thesis proposes a scheme for selecting cluster centers automatically and a threshold formula for outlier points, so that the algorithm can choose the optimal number of clusters on its own. Clustering evaluation criteria applied to the results show that the center points selected by the algorithm are the most suitable in the mathematical sense. The experimental results are also robust to the choice of parameters and of distance measure. This completes the preliminary classification of the mass spectrum images.

(3) Classification and prediction of mass spectrum images. Classification and recognition of mass spectrum images are realized for the first time with a Convolutional Neural Network (CNN) model. Starting from the clustering results, the image classes are corrected manually; most of the images, those assigned accurately, are selected and given true class labels. CNN-based feature extraction is carried out on a total of 716 mass spectrum images with true labels. On top of the CNN, a support vector machine (SVM) is used to classify and verify the feature vectors of the fully connected layer. The experimental results show a classification accuracy of 91.4% to 95.2% when 450 images are used as training data, depending on which CNN output layer supplies the features. The accuracy exceeds 90% when the training set contains more than 300 images, and the results for different training set sizes show a stable trend. Finally, a trained classifier model is obtained, which allows mass spectrometry images to be classified and predicted rapidly.
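The automatic choice of cluster centers described in item (2) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual algorithm or threshold formula, so everything below is an assumption: the sketch uses a density-peaks style score (local density times distance to the nearest denser point) and treats points whose score is an outlier, here more than `n_sigma` standard deviations above the mean, as centers. The function name `density_peaks` and the parameters `d_c` and `n_sigma` are hypothetical.

```python
import math

def density_peaks(points, d_c, n_sigma=1.5):
    """Density-peaks style clustering with an ASSUMED automatic center rule.

    points  : list of equal-length numeric tuples (feature vectors)
    d_c     : cutoff distance used to count neighbours (hypothetical parameter)
    n_sigma : outlier threshold on the center score (assumption, not the
              threshold formula proposed in the thesis)
    Returns (labels, centers), where centers are indices into points.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Rank points by decreasing density; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n    # distance to the nearest higher-ranked point
    parent = [-1] * n    # index of that higher-ranked point
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour; use its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Center score: high density AND far from any denser point.
    gamma = [rho[i] * delta[i] for i in range(n)]
    mean = sum(gamma) / n
    std = math.sqrt(sum((g - mean) ** 2 for g in gamma) / n)
    threshold = mean + n_sigma * std

    # The densest point is always kept as a center so every chain terminates.
    centers = [i for i in order if i == order[0] or gamma[i] > threshold]

    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Visit points from dense to sparse; each inherits its parent's label,
    # which is already set because the parent ranks higher.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers

# Usage on two small, well-separated synthetic groups (illustrative data only).
points = [(0, 0), (0.1, 0), (0, 0.1), (-0.1, 0), (0, -0.1),
          (10, 10), (10.1, 10), (10, 10.1), (9.9, 10), (10, 9.9)]
labels, centers = density_peaks(points, d_c=0.15)
```

The number of clusters is not passed in; it falls out of how many points exceed the score threshold, which is the property the abstract attributes to the proposed scheme. The mean-plus-deviation rule is only one plausible stand-in for the thesis's threshold formula.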
Keywords/Search Tags:Biological data, Cluster analysis, CNN, SVM