
Research And Application On Machine Learning Methods For Health Assessment

Posted on: 2017-03-03  Degree: Master  Type: Thesis
Country: China  Candidate: J Y Zhao  Full Text: PDF
GTID: 2348330485488177  Subject: Computer application technology
Abstract/Summary:
Classification is one of the main tasks of machine learning. Many real-world decision-making problems can be framed as classification problems, including disease diagnosis, which bears directly on people's health. A classification algorithm trains a model from training samples and produces predictions that can help doctors reach a diagnosis. However, applying a single classification algorithm directly to a given disease may not achieve the desired result, because the performance of classification algorithms varies across data sets. In other words, no classification algorithm outperforms all others on every data set. Because diagnosis demands very high accuracy, constructing classification diagnosis models with strong generalization ability has become a focus of machine learning research in this field.

This paper studies classification diagnosis models with higher classification accuracy for specific diseases, with the aim of providing more accurate diagnostic results. By comparing the classification accuracy of k-nearest neighbor (KNN), Logistic Regression, and Support Vector Machine (SVM) on breast cancer and diabetes data, we identify the more appropriate classification diagnosis model for each disease. On this basis, to address the effect of redundant features on classification accuracy, we propose a breast cancer diagnosis model that integrates hybrid feature selection with a linear SVM, further improving the accuracy of breast cancer diagnosis. To address the poor performance of grid search for parameter optimization of the Gaussian kernel SVM, we propose a diabetes diagnosis model that integrates an improved accelerated particle swarm optimization with a Gaussian kernel SVM, further improving the accuracy of diabetes diagnosis.
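
As a rough illustration of replacing grid search with accelerated particle swarm optimization, the sketch below implements the standard accelerated PSO update (no velocities, attraction to the global best plus a shrinking random step). It is not the improved variant proposed in the thesis, whose details the abstract does not give. The function names, the default settings, and the objective `toy_cv_error` are hypothetical; in practice the objective would be something like the cross-validated error of a Gaussian kernel SVM as a function of its two parameters.

```python
import random


def apso_minimize(objective, bounds, n_particles=20, n_iter=60,
                  alpha0=0.5, beta=0.5, decay=0.9, seed=0):
    """Standard accelerated PSO: each particle moves toward the global best
    and adds Gaussian noise whose scale shrinks every iteration.
    All default values here are illustrative assumptions."""
    rng = random.Random(seed)
    # Random initial positions inside the search box.
    swarm = [[rng.uniform(lo, hi) for lo, hi in bounds]
             for _ in range(n_particles)]
    best = min(swarm, key=objective)[:]
    best_val = objective(best)
    for t in range(n_iter):
        alpha = alpha0 * decay ** t  # shrinking random step size
        for x in swarm:
            for d, (lo, hi) in enumerate(bounds):
                noise = alpha * (hi - lo) * rng.gauss(0.0, 1.0)
                # x <- (1 - beta) * x + beta * g_best + alpha * eps
                x[d] = (1 - beta) * x[d] + beta * best[d] + noise
                x[d] = min(max(x[d], lo), hi)  # stay inside the box
            val = objective(x)
            if val < best_val:
                best, best_val = x[:], val
    return best, best_val


def toy_cv_error(params):
    """Hypothetical stand-in for the cross-validated error of a Gaussian
    kernel SVM, written as a function of (log10 C, log10 gamma)."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


if __name__ == "__main__":
    best, val = apso_minimize(toy_cv_error, [(-5.0, 5.0), (-5.0, 5.0)])
    print("best (log10 C, log10 gamma):", best, "error:", val)
```

Searching in log scale is a common choice for SVM parameters because useful values span several orders of magnitude. Unlike a fixed grid, the swarm concentrates its evaluations near the best point found so far.
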
The main contributions and research results of this paper are as follows:
(1) By comparing the classification accuracy of k-nearest neighbor (KNN), Logistic Regression, and Support Vector Machine (SVM) on breast cancer and diabetes data, we find that the linear SVM achieves the highest accuracy for breast cancer diagnosis and the Gaussian kernel SVM achieves the highest accuracy for diabetes diagnosis. These findings form the basis for the research that follows.
(2) To address the effect of redundant features on classification accuracy and training time, we propose a hybrid feature selection method that combines correlation-based feature selection with sequential selection. We then build a breast cancer diagnosis model that integrates this hybrid feature selection with a linear SVM, further improving the accuracy of breast cancer diagnosis.
(3) To address the effect of parameters on the performance of the Gaussian kernel SVM, we improve the accelerated particle swarm optimization algorithm and propose a diabetes diagnosis model that integrates the improved accelerated particle swarm optimization with a Gaussian kernel SVM, further improving the accuracy of diabetes diagnosis.
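
One possible reading of the hybrid feature selection in contribution (2) is a filter stage followed by a wrapper stage. The abstract does not state the correlation measure, the threshold, or the search direction, so the sketch below assumes a Pearson correlation filter against the label followed by greedy sequential forward selection. The scorer `centroid_accuracy` is a hypothetical stand-in for the accuracy of a linear SVM, and the data are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_filter(X, y, min_abs_corr):
    """Filter stage: keep features whose |correlation with the label|
    reaches the (assumed) threshold."""
    return [j for j in range(len(X[0]))
            if abs(pearson([row[j] for row in X], y)) >= min_abs_corr]


def sequential_forward_selection(candidates, score):
    """Wrapper stage: greedily add the feature that most improves
    score(subset); stop when no addition strictly improves it."""
    selected, best = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        top_score, top_j = max((score(selected + [j]), j) for j in remaining)
        if top_score <= best:
            break
        selected.append(top_j)
        remaining.remove(top_j)
        best = top_score
    return selected, best


def centroid_accuracy(X, y, subset):
    """Hypothetical stand-in scorer: training accuracy of a nearest-centroid
    rule on binary labels, used here in place of a linear SVM."""
    def centroid(label):
        rows = [r for r, t in zip(X, y) if t == label]
        return [sum(r[j] for r in rows) / len(rows) for j in subset]

    c0, c1 = centroid(0), centroid(1)

    def dist(row, c):
        return sum((row[j] - cj) ** 2 for j, cj in zip(subset, c))

    hits = sum((dist(r, c1) < dist(r, c0)) == bool(t) for r, t in zip(X, y))
    return hits / len(y)


# Made-up data: feature 0 separates the classes, features 1 and 2 do not.
X = [[1.0, 5.0, 0.3], [1.2, 1.0, 0.9], [0.9, 4.0, 0.1], [1.1, 2.0, 0.7],
     [3.0, 5.5, 0.2], [3.2, 1.5, 0.8], [2.9, 4.5, 0.4], [3.1, 2.5, 0.6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

kept = correlation_filter(X, y, min_abs_corr=0.1)
selected, acc = sequential_forward_selection(
    kept, lambda subset: centroid_accuracy(X, y, subset))
print("after filter:", kept, "after forward selection:", selected, acc)
```

The filter is cheap and discards features with almost no relation to the label before the more expensive wrapper search runs, which is the usual motivation for combining the two stages. A real evaluation would score subsets on held-out data rather than on the training set.
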
Keywords/Search Tags: machine learning, classification, disease diagnosis, Support Vector Machine (SVM), feature selection