
Study On Performance Improvement Of Classifier Based On Data Selection Method

Posted on: 2021-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: Q C Ren
Full Text: PDF
GTID: 2428330611468145
Subject: Control Science and Engineering
Abstract/Summary:
Machine learning, an important tool in data mining, both explores the human cognitive learning process and covers the analysis and processing of data. Faced with the challenge of massive data, some current research focuses on improving and developing machine learning algorithms, while other research focuses on selecting sample data and reducing data sets. The two lines of research proceed in parallel. The selection of training sample data is a research hotspot in machine learning: by effectively selecting sample data, extracting the more informative samples, and eliminating redundant samples and noisy data, the quality of the training samples is improved and better learning performance is obtained. This thesis takes the training samples of classifiers as its research object and studies methods for selecting them.

(1) The author reviews existing sample data selection methods, grouped into sampling-based methods, clustering-based methods, nearest-neighbor classification rules, and other related data selection methods. These methods are summarized, analyzed, and compared, and conclusions and prospects are offered on the open problems and future research directions of training sample data selection.

(2) To improve the performance of neural network classifiers, this thesis proposes a new training sample data selection method based on k-means clustering and segmentation. The method is evaluated on artificial data sets and UCI standard data sets with three common classifiers, BP, LVQ, and ENN (the extension neural network), to verify its effectiveness. Comparison experiments show that, at an average compression ratio of 66.93%, the three neural network classifiers improve in most cases in both training steps and test set classification accuracy. This indicates that the proposed method can retain the high-quality samples in the training set and remove a large number of redundant samples, ensuring the quality of the training samples. Training on the selected set can improve the performance of the neural network classifier.

(3) The k-means cluster centers obtained in the first step of the clustering-based segmented sample data selection method proposed in Chapter 4 are used to determine the initial class centers of the ENN network, which gives the proposed KENN network. Combined with data selection, experiments on artificial data sets, the Iris data set, and actual engineering application data show that KENN can further improve the performance of ENN and offers a reference solution for doing so. Compared with the traditional ENN, the proposed KENN combined with the data selection method has shorter learning time, higher classification accuracy, better learning ability, and stronger generalization ability, effectively improving the overall performance of the traditional ENN.
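The abstract does not spell out the segmentation rule used in item (2), so the following is only a minimal sketch of one plausible reading, not the author's method. It assumes that each class is clustered with k-means, that the samples of each cluster are ranked by distance to the cluster center and split into equal-sized segments, and that a fixed fraction is kept from every segment. The function names, the parameters `k`, `n_segments`, and `keep_per_segment`, and the definition of compression ratio as the fraction of samples removed are all assumptions made for illustration.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each sample to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    # Final assignment so labels match the returned centers.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1)

def select_samples(X, y, k=3, n_segments=4, keep_per_segment=0.33, seed=0):
    """Return sorted indices of retained samples (assumed selection rule)."""
    rng = np.random.default_rng(seed)
    keep = []
    for cls in np.unique(y):
        cls_idx = np.flatnonzero(y == cls)
        k_eff = min(k, len(cls_idx))
        centers, labels = kmeans(X[cls_idx], k_eff, seed=seed)
        for j in range(k_eff):
            idx = cls_idx[labels == j]
            if len(idx) == 0:
                continue
            # Rank cluster members from nearest to farthest from the center.
            d = np.linalg.norm(X[idx] - centers[j], axis=1)
            ordered = idx[np.argsort(d)]
            # Keep a fraction of every distance segment so that both core
            # and boundary samples stay represented.
            for seg in np.array_split(ordered, n_segments):
                if len(seg) == 0:
                    continue
                n_keep = max(1, int(round(keep_per_segment * len(seg))))
                keep.extend(rng.choice(seg, size=n_keep, replace=False))
    return np.sort(np.array(keep, dtype=int))

def compression_ratio(n_original, n_kept):
    """Fraction of samples removed (assumed definition)."""
    return 1.0 - n_kept / n_original
```

A classifier would then be trained on `X[kept]`, `y[kept]` with `kept = select_samples(X, y)`. Item (3) reuses the cluster centers from the first step as initial class centers, which in this sketch would correspond to the `centers` returned by `kmeans` for each class.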
Keywords/Search Tags:Training samples, K-means clustering, Neural networks, Classifiers, Data selection