
Research And Implementation Of Classification Model On Big Data In Healthcare Based On Semi-supervised Learning Algorithm

Posted on: 2019-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: X H Tang
Full Text: PDF
GTID: 2348330563453968
Subject: Computer application technology
Abstract/Summary:
With the continuous development of medical informatization, health data are increasing rapidly and the traditional health industry is gradually entering the era of big data. Analyzing and processing health data can not only predict disease and provide auxiliary diagnosis and decision support, but also allow limited medical resources to be replicated without limit and distributed reasonably, improving the quality and efficiency of medical services. Research on big data in healthcare has therefore become a hot topic of wide concern. Classification is one of the commonly used methods for processing health data. In essence, classifying health data means distinguishing and merging the data according to certain attributes. Because health data contain specialized medical terminology, it is difficult to obtain a large amount of labeled data; by contrast, unlabeled health data are relatively easy to obtain. This thesis therefore applies semi-supervised learning algorithms to study classification models for big data in healthcare. The main contributions of this work are as follows:

(1) Study and improve the classification model for medical physical examination data based on the self-training algorithm. Medical physical examination data are usually structured, low-dimensional, and uniform in format. When the self-training algorithm is used to classify data, mislabeled samples are easily introduced into the training set, which weakens the performance of the classifier. This thesis therefore proposes a strategy of repeatedly labeling the unlabeled samples to improve the self-training algorithm. Liver function test data are then taken as an example to construct a liver disease classification model. Experiments show that the optimized self-training algorithm performs better in classifying liver disease (hepatopathy).

(2) Study and improve the classification model for medical record data based on the co-training algorithm. Medical record data are complex and semi-structured, so they must be converted into a structured format before the classification model is established. Tri-training, a representative co-training algorithm, is then studied for classifying the medical record data. However, Tri-training selects unlabeled samples by implicit estimation, which may make the selection inaccurate, so this thesis proposes the discourse strategy to perform secondary filtering on the unlabeled samples during the training process of Tri-training. In the experiments, medical record data on coronary heart disease are used to construct the classification model. The experimental comparison and analysis show that the optimized Tri-training classifies coronary heart disease better.

(3) Study and improve the classification model for medical image data by introducing a graph-based semi-supervised algorithm. Medical images are a kind of unstructured data, and they are classified here with the semi-supervised learning algorithm based on anchor construction. This thesis improves the algorithm by using the distance-mean circle strategy to optimize the selection of neighboring anchors for unlabeled samples. Experiments show that the optimized algorithm classifies capsule endoscopic images more effectively.
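To make the self-training idea in contribution (1) concrete, the following is a minimal sketch of the plain (baseline) self-training loop, not the improved variant proposed in the thesis, whose repeated-labeling strategy is not specified in this abstract. The nearest-centroid base classifier, the margin-based confidence score, and the names `centroids`, `predict_with_confidence`, `self_train`, `threshold`, and `max_rounds` are all assumptions chosen for illustration.

```python
from math import dist


def centroids(points, labels):
    """Return the mean point of each class (hypothetical base classifier)."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_with_confidence(cents, p):
    """Return (label, confidence) for point p.

    Confidence is the relative margin between the two nearest centroids,
    in [0, 1]. Assumes at least two classes and mutually comparable labels.
    """
    ranked = sorted((dist(p, c), y) for y, c in cents.items())
    (d1, y1), (d2, _) = ranked[0], ranked[1]
    conf = 0.0 if d1 + d2 == 0 else (d2 - d1) / (d1 + d2)
    return y1, conf


def self_train(labeled, labels, unlabeled, threshold=0.5, max_rounds=10):
    """Baseline self-training: pseudo-label confident samples, then retrain."""
    X, y = list(labeled), list(labels)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(X, y)  # retrain on the current training set
        confident, rest = [], []
        for p in pool:
            label, conf = predict_with_confidence(cents, p)
            (confident if conf >= threshold else rest).append((p, label))
        if not confident:
            break  # nothing passes the threshold, so stop early
        for p, label in confident:
            # A wrong pseudo-label added here stays in the training set,
            # which is the weakness the abstract describes.
            X.append(p)
            y.append(label)
        pool = [p for p, _ in rest]
    return centroids(X, y), pool
```

Calling `self_train(labeled, labels, unlabeled)` returns the final class centroids together with the samples that never reached the confidence threshold. In this baseline a pseudo-label is never revisited once assigned, so an early mistake keeps influencing later rounds.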
Keywords/Search Tags: health data, classification, self-training, Tri-training, anchor