Research On Machine Learning Methods For Intelligent Decision-Making

Posted on:2013-01-06

Degree:Doctor

Type:Dissertation

Country:China

Candidate:H L Chen

Full Text:PDF

GTID:1118330371483014

Subject:Computer application technology

Abstract/Summary:

Classification has become a central research topic in data mining, pattern recognition, and machine learning, and it has found a very wide range of real-world applications. In practice, many intelligent decision-making problems can be cast as supervised classification problems. Machine learning methods learn automatically from data and can extract complex patterns to make intelligent decisions, which makes them an effective way to solve these decision problems. However, because the decision problems themselves are complex, directly applying traditional classification methods often fails to achieve the desired decision quality. How to construct a classification model with strong generalization ability that provides scientific support for decision-makers has not yet been well solved.

This dissertation studies the construction of supervised classification models with strong generalization ability, with applications to intelligent decision problems such as disease diagnosis and financial risk prediction. We focus on supervised classification methods such as support vector machines (SVM) and fuzzy k-nearest neighbor (FKNN). We systematically examine the inherent defects of these methods and carry out a series of studies on how to build classification models with strong generalization ability.

We propose a rough set (RS) based SVM prediction model for breast cancer diagnostic decision support; a local Fisher discriminant analysis based SVM model for hepatitis diagnostic decision support; a three-stage SVM-based hybrid model and an FKNN-based model for thyroid disease diagnosis decision support; a swarm intelligence based SVM model for disease diagnosis and financial risk assessment; and a parallel meta-heuristic based FKNN model for bankruptcy prediction.

The main contributions and research results of this dissertation are as follows:

(1) We give a comprehensive overview of the state of the art in decision support problems such as medical diagnosis and financial risk prediction, and we analyze and discuss the current development trends and open problems in these areas. We also briefly survey the development of machine learning research, and selectively analyze the current status and existing problems of specific methods such as SVM and FKNN. This discussion and analysis lays the foundation for the subsequent research work.

(2) The performance of SVM is affected by irrelevant or redundant features in the data. To address this problem, we propose applying feature reduction methods before training the SVM model.
① The combination of RS feature selection and SVM for breast cancer diagnosis. RS-based feature selection is used to reduce the data and obtain the core features, which greatly simplifies the subsequent SVM training process and further improves the generalization ability of SVM in cancer diagnosis decisions. Experimental results show that the model not only achieves high prediction accuracy but also detects five important features that are closely related to breast cancer.
② The combination of local Fisher discriminant analysis (LFDA) and SVM for hepatitis diagnosis. LFDA projects the original feature space of the hepatitis data onto a more separable low-dimensional space, which simplifies SVM training while improving the generalization ability of SVM in hepatitis diagnosis decisions. The experimental results show that the prediction model not only has good feature-discriminative ability but also obtains diagnostic accuracy far higher than that of existing diagnostic methods.

(3) The performance of SVM is affected simultaneously by its hyperparameters and by feature selection. We propose a hybrid model that integrates Fisher score (FS) and the particle swarm optimization (PSO) algorithm. In the first stage, FS scores the importance of each feature, yielding a set of feature subsets composed of the most discriminative features. In the second stage, SVM is trained on each feature subset, with PSO used to optimize the hyperparameters. In the third stage, the optimal SVM model is used to classify thyroid disease samples of unknown type. The experimental results show that the hybrid model can effectively identify the most important pathogenic features of thyroid disease, so it can serve as a valuable aid to doctors in disease analysis and diagnosis.

(4) The performance of the FKNN method is strongly affected by the neighborhood size and the fuzzy strength coefficient. To address this problem, we propose an adaptive FKNN model in which the PSO algorithm dynamically adjusts these two parameters. The model is applied to thyroid disease diagnosis. Experimental results show that the proposed model diagnoses thyroid disease effectively, that its diagnostic accuracy is significantly better than that of existing models, and that it is also more stable than the SVM model.

(5) Traditionally, parameter optimization and feature selection for SVM have been performed separately. In addition, parameter optimization methods based on intelligent optimization algorithms have generally neglected the number of support vectors in the fitness function, which may cause all of the training data to be involved in the training process. To solve these problems, we propose an adaptive SVM framework based on swarm intelligence in which parameter optimization and feature selection are conducted simultaneously. We also design a linear-weighted multi-objective fitness function that takes into account the average accuracy rate, the number of support vectors, and the number of features. We apply the proposed model to decision-making problems such as disease diagnosis and credit risk prediction, and the experimental results show that the model achieves good accuracy.

(6) At present, SVM and artificial neural network (ANN) models have the best performance among bankruptcy prediction models, but their common drawback is that they are black boxes by nature and the models are relatively complex. Accordingly, we introduce the FKNN model for corporate bankruptcy prediction and propose a parallel heuristic strategy that performs parameter optimization and feature selection for FKNN simultaneously, yielding an effective bankruptcy prediction model. The experimental results show that the model not only performs significantly better than the SVM and ANN models but also finds the most discriminative features.
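
The linear-weighted fitness function described in contribution (5) can be illustrated with a minimal sketch. The abstract names only the three ingredients (average accuracy, number of support vectors, number of features), so everything else here is an assumption made for illustration: the function name `weighted_fitness`, the normalization of each count to the range [0, 1], and the weights `w_acc`, `w_sv`, and `w_feat` are hypothetical and are not taken from the dissertation.

```python
def weighted_fitness(accuracy, n_support_vectors, n_train,
                     n_selected, n_features,
                     w_acc=0.8, w_sv=0.1, w_feat=0.1):
    """Score one candidate (hyperparameters plus feature mask); higher is better.

    Assumed form (not the dissertation's exact formula):
        w_acc * accuracy
      + w_sv  * (1 - n_support_vectors / n_train)
      + w_feat * (1 - n_selected / n_features)
    """
    if n_train <= 0 or n_features <= 0:
        raise ValueError("n_train and n_features must be positive")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")

    # Fewer support vectors relative to the training set gives a larger term.
    sv_term = 1.0 - n_support_vectors / n_train
    # Fewer selected features relative to the full set gives a larger term.
    feat_term = 1.0 - n_selected / n_features

    return w_acc * accuracy + w_sv * sv_term + w_feat * feat_term


# Two hypothetical candidates with the same cross-validated accuracy:
# the one with fewer support vectors and fewer features scores higher.
sparse_candidate = weighted_fitness(0.90, 20, 100, 5, 10)
dense_candidate = weighted_fitness(0.90, 95, 100, 10, 10)
```

In a swarm-based search, each particle would encode the SVM hyperparameters together with a feature mask, and a score of this kind would rank the particles. The weights set the trade-off between accuracy and model compactness and would need to be chosen for each problem.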

Keywords/Search Tags:

Data classification, Support vector machines, Fuzzy k-nearest neighbor, Feature selection, Medical diagnosis, Firm bankruptcy prediction

Related items

1	Random K-Nearest Neighbor Algorithm With Application To Bankruptcy Prediction
2	Studies And Application Of Fuzzy And Double Regular Support Vector Machines
3	Research On The Generalization And Applications Of Support Vector Machines
4	Research On Incremental Learning Algorithm Of Support Vector Machine
5	Text Sentiment Analysis Based On Text Classification
6	Research On Anomaly Detection Methods For Financial Data
7	Application Of Support Vector Machine And Fuzzy Theory For Remote Sensing Image Classification
8	Research On Technique Of Faults Classification With Support Vector Machines For Analog Electronic Circuits
9	Automatic Classification Research On Chinese Web Document Orientation
10	Research Of Nearest Neighbor Classification Algorithm Based On Sample Selection