
SVM-Based Data Mining Technology Research

Posted on: 2010-11-17; Degree: Master; Type: Thesis
Country: China; Candidate: S H Liu
GTID: 2178360275999356; Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology worldwide, the amount of information has grown exponentially. How to use data mining technology to extract useful information from massive and complex data has therefore become a focus of information technology research in recent years. Data mining is a relatively new technology that draws on many disciplines, including databases, artificial intelligence, and mathematical statistics. Among data mining tasks, classification is the most important and the most common.

The Support Vector Machine (SVM) is a general-purpose learning machine developed on the basis of statistical learning theory. Because of its strong theoretical foundation (including VC dimension, structural risk minimization, and kernel space theory), it has become a research focus in classification mining. To handle complicated classification tasks, an SVM maps vectors from the input space into a feature space in which a linear separating hyperplane is constructed. As an implementation of structural risk minimization, the SVM offers global optimality, a simple structure, and high practicability.

This thesis first introduces the basic concepts and techniques of data mining, then discusses the theory, basic concepts, and basic algorithms of the SVM, and reviews the SVM algorithms currently in use. In practice, data mining often deals with large amounts of data or with data that arrives incrementally, so the thesis focuses on fast SVM learning methods and on incremental learning strategies for large-scale data.
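The mapping from input space to a feature space, where a linear hyperplane does the separating, can be illustrated with a minimal Python sketch. It assumes a homogeneous degree-2 polynomial kernel on 2-D inputs (the abstract does not say which kernel the thesis uses); `feature_map`, `poly_kernel`, and the XOR toy data are illustrative choices, not taken from the thesis.

```python
import math

def feature_map(x):
    """Explicit degree-2 feature map phi: R^2 -> R^3 for the kernel (x . z)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, z):
    """Homogeneous polynomial kernel k(x, z) = (x . z)^2, evaluated in input space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# The kernel equals the inner product in feature space without building phi explicitly.
x, z = (1.0, 2.0), (3.0, -1.0)
print(poly_kernel(x, z), dot(feature_map(x), feature_map(z)))  # both approx. 1.0

# XOR-style data: no line separates it in input space, but in feature space
# the hyperplane w . phi(x) = 0 with w = (0, 1, 0) does (it tests the sign of x1*x2).
xor_data = [((1.0, 1.0), 1), ((-1.0, -1.0), 1), ((1.0, -1.0), -1), ((-1.0, 1.0), -1)]
w = (0.0, 1.0, 0.0)
print([label * dot(w, feature_map(p)) > 0 for p, label in xor_data])  # [True, True, True, True]
```

In a real SVM the weight vector is never formed in feature space; the kernel is evaluated directly, which is what makes high-dimensional (or infinite-dimensional) feature spaces tractable.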
Three problems motivate the proposed method. First, the training speed of standard SVM learning algorithms leaves room for improvement. Second, existing incremental methods process historical data too coarsely and cannot adapt when the distribution of newly arriving samples changes, which degrades accuracy. Third, the penalty parameter C in the C-SVM optimization problem is difficult to choose and does not adapt to such changes in the data distribution. To address these problems, we propose an improved Bv-SVM incremental learning method based on a two-step pre-check of boundary (border) vectors and self-tuning parameters. Experiments on UCI data sets show that the method trains significantly faster than both the standard C-SVM and a simple incremental SVM, and that it is also more accurate than the simple incremental SVM. When the training data set is small, its accuracy is similar to that of the standard C-SVM; as the amount of training data grows, its accuracy gradually surpasses that of the standard C-SVM, which indicates that the method is well suited to incremental learning on large-scale data sets.
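The "simple incremental SVM" baseline can be sketched as: train a C-SVM, then, when a new batch arrives, keep only the old samples near the margin and retrain on them together with the new batch. The Python sketch below is not the thesis's Bv-SVM method, whose details are not given in this abstract. It assumes a linear kernel, a primal stochastic subgradient solver in place of a QP or SMO solver, a hypothetical `band` parameter for selecting boundary vectors, and a KKT-style pre-check that skips retraining when every new sample already satisfies y·f(x) ≥ 1. The penalty C is fixed here, whereas the thesis tunes its parameters automatically.

```python
import random

def decision(w, b, x):
    """Linear decision function f(x) = w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_csvm(X, y, C=1.0, epochs=200, lr=0.01, seed=0):
    """Approximately minimise 0.5*||w||^2 + C * sum_i max(0, 1 - y_i f(x_i))
    by stochastic subgradient descent (labels y_i in {-1, +1})."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            # Each sample carries 1/n of the regulariser's gradient.
            grad_w = [wj / n for wj in w]
            grad_b = 0.0
            if y[i] * decision(w, b, X[i]) < 1:  # hinge term is active
                grad_w = [g - C * y[i] * xj for g, xj in zip(grad_w, X[i])]
                grad_b = -C * y[i]
            w = [wj - lr * g for wj, g in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b

def incremental_update(model, X_old, y_old, X_new, y_new, C=1.0, band=0.5):
    """Hypothetical incremental step. Returns (model, retained X, retained y)."""
    w, b = model
    # Pre-check: a new sample with y*f(x) >= 1 satisfies the KKT conditions of an exact
    # solution and would not change it; if every new sample does, skip retraining.
    if all(yi * decision(w, b, xi) >= 1 for xi, yi in zip(X_new, y_new)):
        return model, X_old, y_old
    # Keep old samples with y*f(x) < 1 + band (near or inside the margin) as candidate support vectors.
    keep = [i for i, xi in enumerate(X_old) if y_old[i] * decision(w, b, xi) < 1 + band]
    X = [X_old[i] for i in keep] + list(X_new)
    y = [y_old[i] for i in keep] + list(y_new)
    return train_linear_csvm(X, y, C=C), X, y

# Usage: an initial batch, then a new batch with points inside the current margin.
X0 = [(2, 2), (3, 3), (2, 3), (3, 2), (4, 4),
      (-2, -2), (-3, -3), (-2, -3), (-3, -2), (-4, -4)]
y0 = [1] * 5 + [-1] * 5
model = train_linear_csvm(X0, y0)
X1, y1 = [(1, 1.5), (-1.5, -1)], [1, -1]
model, X_kept, y_kept = incremental_update(model, X0, y0, X1, y1)
print(len(X_kept), "samples retained out of", len(X0) + len(X1))
```

Samples far from the boundary, such as (4, 4), are discarded before retraining, which is where the speed-up of this kind of strategy comes from; the cost is that discarded samples can matter again if the data distribution shifts, which is the weakness the abstract attributes to simple incremental methods.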
Keywords/Search Tags: Machine Learning, Statistical Learning Theory, Data Mining, Support Vector Machine, Incremental Learning