Research Of Classification Algorithm Based On SVM And Its Application In Data Mining

Posted on: 2008-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: X W Zhang
Full Text: PDF
GTID: 2178360212985186
Subject: Computer application technology
Abstract/Summary:
With the increasingly widespread use of database technology, organizations in every sector have accumulated large volumes of raw data containing a wealth of information. How to extract useful knowledge from this data and use it to guide business operations has therefore become an urgent problem. Data mining technology emerged against this background. Data mining is the non-trivial process of discovering useful, potentially valuable, and understandable patterns in data sets, and classification is one of its most widely used tasks.

As an emerging classification algorithm based on statistical learning theory, the support vector machine (SVM) stands out for its solid theoretical foundation, elegant algorithmic implementation, and excellent performance. Compared with other classification algorithms, SVM offers global optimization, a simple structure, and high generalization ability. To date, it has achieved the best performance in many fields. For these reasons, this thesis studies the application of SVM to data mining, motivated by problems encountered while participating in a data mining project for a water supply company.

The thesis first presents the concepts, mining tasks, and basic process of data mining. It then compares and analyzes several commonly used classification algorithms and their advantages and disadvantages, and briefly introduces methods for evaluating classification models. Next, it studies SVM theory, describing statistical learning theory and the principle of structural risk minimization (SRM) in detail and deriving the SVM algorithm from the concept of the maximal-margin separating hyperplane. The advantages of SVM as a novel classification algorithm are also explained. On this foundation, the following approaches to applying SVM to more general data mining tasks are discussed: training algorithms for large data sets; multi-class classification algorithms built on the basic binary SVM; and an improved model parameter optimization algorithm based on grid search combined with stratified random sampling.

Following this theoretical analysis, the thesis proposes an SVM-based modeling solution for predicting customers with water fee payment arrears. The data preprocessing procedure is described in detail, drawing on data mining knowledge. A grid search for the optimal parameters, using cross-validation with stratified random sampling, is carried out and studied in depth.
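
The abstract does not say exactly how the thesis combines grid search, cross-validation, and stratified random sampling, so the following is only a minimal sketch of one common reading: the folds used for cross-validation are drawn so that each keeps roughly the same class proportions as the full data set, and every parameter combination in the grid is scored by its mean result over those folds. The names `stratified_folds`, `grid_search`, and the callback `train_and_score` are hypothetical and chosen for illustration; in practice the callback would train an SVM on the training indices (for example with a penalty parameter `C` and a kernel parameter `gamma`, which are assumptions here) and return its accuracy on the held-out indices.

```python
import random
from collections import defaultdict
from itertools import product


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # spreads leftover samples across folds instead of always fold 0
    for indices in by_class.values():
        rng.shuffle(indices)  # random sampling within each class (stratum)
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % k].append(idx)
        offset += len(indices)
    return folds


def grid_search(samples, labels, param_grid, train_and_score, k=5, seed=0):
    """Return the parameter combination with the best mean cross-validation score.

    train_and_score(samples, labels, train_idx, test_idx, params) is a
    hypothetical user-supplied callback that fits a model on train_idx
    and returns a score (higher is better) measured on test_idx.
    """
    folds = stratified_folds(labels, k, seed)
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")

    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            scores.append(
                train_and_score(samples, labels, train_idx, test_idx, params)
            )
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

Stratifying the folds matters most when the classes are imbalanced, as is plausible for payment-arrears prediction where customers in arrears are likely a minority: a purely random split could leave some folds with very few positive examples and make the cross-validation scores unreliable.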
Keywords/Search Tags: data mining, support vector machine, kernel, cross validation, stratified random sampling