
Research On Identification Of Intestinal Flora Based On Data Mining Algorithm

Posted on: 2019-02-08 | Degree: Master | Type: Thesis
Country: China | Candidate: X S Yao
GTID: 2430330563457630 | Subject: Electronic and communication engineering
Abstract/Summary:
The current era is the age of biological big data, and many findings in medicine are based on biological data. As medicine has developed, researchers have gradually discovered that the intestinal microbiota living in the human body has effects far greater than previously imagined. The ecosystem formed by the microbial communities of the gastrointestinal tract helps maintain the body's normal physiological functions, but it can also contribute to physiological disease and even affect mental health. For this reason, research on the intestinal microflora is attracting growing attention, and as the close relationship between the intestinal flora and the human body comes to light, the area is becoming a promising topic.

Data mining technology is gradually being applied to biological big data, bringing its own perspective. Biological data typically combine a large volume with many attributes, which has kept the research methods applied to them relatively simple. Data mining can uncover implicit patterns in large datasets from multiple perspectives, so applying it to biological data has become a popular research approach. Gut microbiota data share the characteristics described above, and in recent years a growing number of studies have used data mining methods to study the intestinal microflora. Intestinal flora data are expressed in OTUs (Operational Taxonomic Units; an OTU is a cluster of similar microorganisms), and each sample is described by its OTU abundances. Because the intestine of a single individual harbors thousands of OTUs, applying data mining methods to intestinal flora data can yield findings that are difficult to obtain with traditional methods.

The intestinal flora is associated with a variety of diseases, diabetes among them, and this association shows up as differences between the composition of the intestinal microflora of patients and that of healthy people. Using data mining methods to identify diseased individuals in large gut microbiota datasets is therefore meaningful for disease-assisted diagnosis and disease screening. Taking gut microbial OTU samples from diabetic patients as an example, this paper applies three data mining methods (a BP neural network optimized by a genetic algorithm, a support vector machine, and a weighted LDA topic model) to classify and recognize the intestinal flora data of diabetic patients, diabetic patients with autonomic neuropathy, and normal people. The main work of this paper is as follows:

(1) The intestinal microbiota datasets of patients with diabetes, patients with diabetic autonomic neuropathy, and normal persons were first classified with a traditional BP neural network and then with a BP neural network optimized by a genetic algorithm. Comparing the classification accuracy of the two showed that the improved algorithm greatly reduces the prediction error: at a threshold of 0.8, its classification accuracy reaches 90%, whereas the traditional BP network reaches only 10% at the same threshold.

(2) The same data were classified with a support vector machine (SVM), which was found to classify the dataset well. A single classification experiment achieved 80% accuracy, and the SVM runs more efficiently than the BP network improved by the genetic algorithm.

(3) The data were classified with the traditional LDA topic model, and a new weighting method based on mutual information was then designed to improve the model's classification accuracy. The experimental results show that the traditional LDA topic model achieves only moderate accuracy on the dataset: 60% at a threshold of 0.7. Compared with the traditional model, the improved LDA topic model raises the maximum conditional probability of classification by 10%, and at a threshold of 0.7 its classification accuracy reaches 100%. Weighted LDA also produces a weight matrix that can be used to study the microbial communities with the greatest influence on classification.

This paper verifies the accuracy of several commonly used data mining methods for classifying intestinal flora OTU datasets and improves on them to obtain more precise classification. The study also provides a more effective method for disease-assisted diagnosis and disease screening and offers clues for classifying intestinal flora data. In short, it has practical significance.
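The abstract does not say how the genetic algorithm is combined with the BP network. A common GA-BP scheme, assumed here, uses the GA to search for the network's initial weights and thresholds (biases) and then refines the best individual with ordinary back-propagation. The pure-Python sketch below shows that scheme; the network size, population size, mutation rate, learning rate, and all function names (`ga_init`, `bp_train`, etc.) are hypothetical choices, not values from the thesis.

```python
import math
import random
from typing import List, Sequence, Tuple

Shape = Tuple[int, int, int]  # (n_in, n_hid, n_out); sizes are hypothetical


def sigmoid(z: float) -> float:
    z = max(-60.0, min(60.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def n_params(shape: Shape) -> int:
    n_in, n_hid, n_out = shape
    # one weight per input plus one bias ("threshold") for every neuron
    return n_hid * (n_in + 1) + n_out * (n_hid + 1)


def forward(p: Sequence[float], x: Sequence[float], shape: Shape):
    """One hidden layer, sigmoid everywhere; p is a flat parameter vector."""
    n_in, n_hid, n_out = shape
    hid = []
    for j in range(n_hid):
        base = j * (n_in + 1)
        z = p[base + n_in] + sum(p[base + i] * x[i] for i in range(n_in))
        hid.append(sigmoid(z))
    off = n_hid * (n_in + 1)  # output-layer parameters start here
    out = []
    for o in range(n_out):
        base = off + o * (n_hid + 1)
        z = p[base + n_hid] + sum(p[base + j] * hid[j] for j in range(n_hid))
        out.append(sigmoid(z))
    return hid, out


def mse(p, X, Y, shape: Shape) -> float:
    err = 0.0
    for x, y in zip(X, Y):
        _, out = forward(p, x, shape)
        err += sum((o - t) ** 2 for o, t in zip(out, y))
    return err / len(X)


def ga_init(X, Y, shape: Shape, pop_size=30, generations=40,
            mut_rate=0.1, seed=0) -> List[float]:
    """Evolve initial weights with a GA; fitter = lower training MSE."""
    rng = random.Random(seed)
    dim = n_params(shape)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda p: mse(p, X, Y, shape))  # best first
        nxt = [ranked[0][:], ranked[1][:]]  # elitism: best two survive unchanged
        while len(nxt) < pop_size:
            # tournament of 3: the lowest rank index is the fittest contestant
            a = ranked[min(rng.sample(range(pop_size), 3))]
            b = ranked[min(rng.sample(range(pop_size), 3))]
            alpha = rng.random()
            child = [alpha * u + (1 - alpha) * v for u, v in zip(a, b)]  # arithmetic crossover
            child = [w + rng.gauss(0.0, 0.3) if rng.random() < mut_rate else w
                     for w in child]  # Gaussian mutation, gene by gene
            nxt.append(child)
        pop = nxt
    return min(pop, key=lambda p: mse(p, X, Y, shape))


def bp_train(p, X, Y, shape: Shape, lr=0.5, epochs=200) -> List[float]:
    """Online back-propagation on squared error (factor 2 folded into lr)."""
    n_in, n_hid, n_out = shape
    p = list(p)  # do not mutate the caller's vector
    off = n_hid * (n_in + 1)
    for _ in range(epochs):
        for x, y in zip(X, Y):
            hid, out = forward(p, x, shape)
            d_out = [(o - t) * o * (1 - o) for o, t in zip(out, y)]
            # hidden deltas use the output weights from before this update
            d_hid = [
                hid[j] * (1 - hid[j])
                * sum(d_out[o] * p[off + o * (n_hid + 1) + j] for o in range(n_out))
                for j in range(n_hid)
            ]
            for o in range(n_out):
                base = off + o * (n_hid + 1)
                for j in range(n_hid):
                    p[base + j] -= lr * d_out[o] * hid[j]
                p[base + n_hid] -= lr * d_out[o]
            for j in range(n_hid):
                base = j * (n_in + 1)
                for i in range(n_in):
                    p[base + i] -= lr * d_hid[j] * x[i]
                p[base + n_in] -= lr * d_hid[j]
    return p
```

Typical use would be `p = bp_train(ga_init(X, Y, shape), X, Y, shape)`. With real data, each row of `X` would hold one sample's OTU abundances and each row of `Y` a one-hot vector over the three groups. Keeping the two best individuals in every generation (elitism) guarantees that the GA's best training error never gets worse from one generation to the next.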
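The abstract does not state which SVM formulation, kernel, or solver was used. As a plainly swapped-in stand-in, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method on a binary problem (labels +1 and -1); separating the three groups in the thesis would need a multi-class scheme such as one-vs-rest, which is not shown. The function names and hyperparameters (`lam`, `iters`) are hypothetical.

```python
import random
from typing import List, Sequence


def pegasos_train(X: Sequence[Sequence[float]], y: Sequence[int],
                  lam: float = 0.01, iters: int = 2000, seed: int = 0) -> List[float]:
    """Binary linear SVM via Pegasos (stochastic sub-gradient on the hinge loss).

    Labels must be +1 or -1. A constant 1.0 feature is appended so the last
    weight acts as a bias; that bias is regularized too, a simplification.
    """
    rng = random.Random(seed)
    Xa = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    for t in range(1, iters + 1):
        i = rng.randrange(len(Xa))       # pick one training sample at random
        eta = 1.0 / (lam * t)            # Pegasos step size
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i]))
        shrink = 1.0 - eta * lam         # regularization shrink, equals 1 - 1/t
        if margin < 1.0:                 # hinge loss active: step toward y_i * x_i
            w = [shrink * wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
        else:
            w = [shrink * wj for wj in w]
    return w


def svm_predict(w: Sequence[float], x: Sequence[float]) -> int:
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

A linear model keeps the sketch short and dependency-free; a kernel SVM (for example with an RBF kernel) may separate OTU profiles better but needs a dual or library-based solver.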
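The abstract says the weighted LDA model uses a weighting method based on mutual information but does not give the formula. One plausible reading, assumed here, treats each sample as a document and each OTU as a word, computes the mutual information between an OTU's presence (abundance > 0) and the class label over labelled training samples, and scales that OTU's counts by the result before topic modelling. The sketch covers only this weighting step, not the LDA model itself, and every helper name is hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def mutual_information(feature: Sequence[int], labels: Sequence[int]) -> float:
    """I(F; C) in nats for discrete feature values and class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    n_f = Counter(feature)
    n_c = Counter(labels)
    mi = 0.0
    for (f, c), n_fc in joint.items():
        # p(f,c) * log(p(f,c) / (p(f) p(c))), written with raw counts
        mi += (n_fc / n) * math.log(n_fc * n / (n_f[f] * n_c[c]))
    return mi


def otu_weights(counts: Sequence[Sequence[int]], labels: Sequence[int]) -> List[float]:
    """One weight per OTU column: MI between OTU presence (count > 0) and label."""
    weights = []
    for j in range(len(counts[0])):
        presence = [1 if row[j] > 0 else 0 for row in counts]
        weights.append(mutual_information(presence, labels))
    return weights


def weighted_counts(counts: Sequence[Sequence[int]],
                    weights: Sequence[float]) -> List[List[float]]:
    """Scale each OTU's abundance by its weight (assumed weighting scheme)."""
    return [[c * w for c, w in zip(row, weights)] for row in counts]
```

Sorting OTUs by weight gives one way to list the OTUs most associated with the class labels, in the spirit of the weight matrix mentioned in item (3). The scaled counts are no longer integers, so they would need an LDA implementation that accepts real-valued counts.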
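The reported accuracies are tied to a threshold (0.8 for the BP networks, 0.7 for the LDA models) that the abstract does not define. Under one assumed interpretation, a sample counts as correctly classified only when its highest-scoring class is the true class and that score is at least the threshold. A minimal sketch of that metric, with a hypothetical function name:

```python
from typing import Sequence


def thresholded_accuracy(probs: Sequence[Sequence[float]],
                         labels: Sequence[int],
                         threshold: float) -> float:
    """Fraction of samples whose top-scoring class is correct AND scores >= threshold.

    Assumed reading of "accuracy at threshold t"; the thesis abstract does not
    define the metric.
    """
    if not labels or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")
    hits = 0
    for row, y in zip(probs, labels):
        best = max(range(len(row)), key=row.__getitem__)  # argmax class index
        if best == y and row[best] >= threshold:
            hits += 1
    return hits / len(labels)
```

Under this reading, raising the threshold can only keep or lower the accuracy, because it rejects low-confidence predictions even when they are correct.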
Keywords/Search Tags:Intestinal flora, LDA topic model, Genetic algorithm, BP neural network, Support vector machine