Studies Of Several Mathematical Models And Algorithms In Data Mining

Posted on: 2006-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Teng
GTID: 2168360152985576
Subject: Operational Research and Cybernetics
Abstract/Summary:
In recent years, with the development of computer and information technology, people have had to spend considerable resources collecting, storing and processing massive amounts of data. How to separate the valuable from the worthless and extract useful information has become an urgent problem, and data mining technology emerged against this background. Knowledge discovery in databases (KDD) is defined as the non-trivial process of searching data sets for useful, latent and understandable patterns. It draws on many intersecting disciplines and technologies, such as machine learning, mathematical programming, statistics, neural networks, databases, pattern recognition, rough sets and fuzzy mathematics.

Mathematical programming is an important branch of operational research and one of its most advanced areas. It has wide and important applications in machine learning, network problems, game theory, economics and mechanics. Mathematical programming is now progressing rapidly, combining with other disciplines to form new research areas and finding applications in new fields. Combining mathematical programming with data mining makes it possible to solve large-scale, complex problems. Mathematical programming plays an important role in feature selection, clustering and regression, all of which are problems that urgently need to be solved.

The main work of this thesis covers learning algorithms for support vector machines, a mathematical model for feature selection and its application, and convergence analysis of a clustering algorithm.

The support vector machine (SVM) is an important example of mathematical programming applied to data mining. It is a relatively new machine learning method proposed by Vapnik on the basis of statistical learning theory. At its core, training an SVM is a quadratic programming problem, so solving quadratic programs accurately and quickly is a basic issue, one closely connected to optimization theory in mathematical programming.
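To make the statement that SVM training is a quadratic program concrete, here is the standard textbook soft-margin (C-SVM) formulation for m labeled samples (x_i, y_i) with y_i in {+1, -1}. The symbols C (penalty parameter), xi (slack variables) and alpha (dual variables) are the usual notation, assumed here rather than taken from the thesis, whose own formulation may differ.

```latex
% Primal soft-margin SVM: a convex QP in (w, b, \xi)
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{m}\xi_i
\quad\text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,m.

% Dual: a QP in \alpha with one equality constraint and box constraints
\max_{\alpha}\ \sum_{i=1}^{m}\alpha_i
  - \tfrac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j\,y_i y_j\,x_i^\top x_j
\quad\text{s.t.}\quad \sum_{i=1}^{m}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le C,
\qquad w = \sum_{i=1}^{m}\alpha_i y_i x_i .
```

The dual has one variable per training sample, so its size grows with the data set; this is why fast and accurate QP solvers matter. Replacing x_i^T x_j with a kernel K(x_i, x_j) gives the nonlinear version without changing the QP structure.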
Here, the author studies on-line learning for support vector machines and proximal support vector machines, and then applies the support vector machine to protein secondary structure prediction, with very good results.

Feature selection aims to discriminate between two data sets while identifying and removing superfluous features. Existing models need too much time and memory to process high-dimensional data (for example, data in brain science, which can have thousands of dimensions). This thesis makes progress on feature selection via support vector machines and finally applies it to the Disputed Federalist Papers. Although the feature vectors obtained by different methods are not the same, they lead to identical conclusions.

Cluster analysis is another method frequently used in data mining; it is a form of unsupervised learning. The thesis gives a convergence analysis of the k-means algorithm, which provides a sound theoretical guarantee for using the algorithm.

The significance of the thesis lies in the following: it improves several mathematical models and algorithms in data mining and adapts them to real data; it applies these methods to new areas, broadening their range of application; and it presents a convergence analysis of an algorithm, providing a sound theoretical guarantee for its use.
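The proximal SVM mentioned above suits on-line learning well. In the standard linear proximal SVM formulation (assumed here; the thesis may use a different variant), the QP's inequality constraints become equalities and the classifier x^T w - gamma = 0 is obtained from the linear system (I/nu + H^T H) [w; gamma] = H^T D e, where H = [A, -e], A holds the samples as rows, D is the diagonal matrix of +1/-1 labels and nu > 0 is a trade-off parameter. The sketch below is a minimal, dependency-free illustration; the class name `IncrementalPSVM`, the method names and the parameter `nu` are hypothetical.

```python
# Minimal sketch of a linear proximal SVM trained incrementally.
# Assumption: standard linear PSVM, (I/nu + H'H) z = H'De with H = [A, -e];
# IncrementalPSVM, partial_fit, solve and nu are illustrative names only.


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


class IncrementalPSVM:
    def __init__(self, dim, nu=1.0):
        self.dim = dim
        self.nu = nu
        k = dim + 1
        # Running sums H'H and H'De; the solution depends on the data only through these.
        self.hth = [[0.0] * k for _ in range(k)]
        self.htde = [0.0] * k

    def partial_fit(self, x, label):
        """Absorb one sample x with label +1 or -1 by updating the running sums."""
        h = list(x) + [-1.0]  # one row of H = [A, -e]
        for i in range(len(h)):
            self.htde[i] += h[i] * label
            for j in range(len(h)):
                self.hth[i][j] += h[i] * h[j]

    def solve(self):
        """Return (w, gamma) solving (I/nu + H'H) [w; gamma] = H'De."""
        k = self.dim + 1
        a = [[self.hth[i][j] + (1.0 / self.nu if i == j else 0.0)
              for j in range(k)] for i in range(k)]
        z = solve_linear(a, self.htde)
        return z[:-1], z[-1]

    def predict(self, x):
        """Classify x by the sign of x'w - gamma."""
        w, gamma = self.solve()
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) - gamma >= 0 else -1


if __name__ == "__main__":
    model = IncrementalPSVM(dim=2, nu=10.0)
    for x, y in [((2.0, 2.0), 1), ((3.0, 1.5), 1),
                 ((-2.0, -1.0), -1), ((-1.5, -2.5), -1)]:
        model.partial_fit(x, y)  # samples arrive one at a time
    print(model.predict((2.5, 2.0)), model.predict((-2.0, -2.0)))  # prints: 1 -1
```

Because `partial_fit` only adds to two running sums of fixed size (dim + 1), the model after n calls gives exactly the batch solution on all n samples, and memory does not grow with the number of samples.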
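The k-means convergence result can be illustrated with the classical argument for Lloyd's algorithm: the objective J = sum of squared distances from each point to its assigned center never increases, because the assignment step picks the nearest center for each point and the update step replaces each center by its cluster mean. Since the centers after the first update are determined by one of finitely many partitions, J settles after finitely many iterations. This is the textbook argument, not a reproduction of the thesis's own analysis; the function names below are illustrative.

```python
# Minimal sketch of Lloyd's k-means that records the objective
# J = sum ||x - c_assigned(x)||^2 at every assignment step, so the
# non-increasing behaviour behind the convergence argument can be checked.
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k distinct points as initial centers
    history, assign = [], None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center (cannot raise J).
        new_assign = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        history.append(sum(sq_dist(p, centers[a]) for p, a in zip(points, new_assign)))
        if new_assign == assign:
            break  # partition unchanged: a fixed point has been reached
        assign = new_assign
        # Update step: each center moves to its cluster mean (cannot raise J).
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, assign, history


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, assign, history = kmeans(pts, 2)
    print(assign, history)
```

The recorded `history` is non-increasing (up to floating-point rounding), which is the property the convergence argument relies on; note that it guarantees convergence to a fixed point, not to the global minimum of J.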
Keywords/Search Tags: data mining, mathematical programming, support vector machine, on-line learning, incremental learning, feature selection, cluster analysis