
Research On Classification Algorithm

Posted on: 2010-03-15; Degree: Master; Type: Thesis
Country: China; Candidate: Y Li; Full Text: PDF
GTID: 2178360302465940; Subject: Software engineering
Abstract/Summary:
With the globalization of information technology, automatic collection tools and database techniques have made data far easier to gather, and databases now hold more data than ever before, yet the information people actually obtain from it grows slowly. Current database systems can only access the stored data; the information behind the data cannot be mined because the tools are limited. Research on data mining techniques has therefore become an urgent need, so that the information behind the data can be found and used to support decision making. At the same time, because data mining tools can provide information and knowledge that is valuable for decision making, more and more enterprises have begun to use them to analyze their own data. Data mining is widely used in finance, insurance, retailing, scientific research, industry, the justice system, and biopharmaceuticals, and has achieved great success. In-depth research on data mining techniques is of great significance in making our country a major software power.

As society enters the information age and computer network technology is applied ever more widely, the databases of every industry accumulate large amounts of data. The question of how to use these data, and how to select useful information and knowledge from databases to guide the production and distribution of enterprises, gave rise to a new method for extracting useful data: data mining technology, which is widely used and highly practical. As data volumes expand rapidly and analysis requirements keep growing, many kinds of data analysis tools and data mining systems have been studied and developed. There are many successful data mining systems, such as SAS Enterprise Miner, SPSS Clementine, and IBM Intelligent Miner. They have been under development for more than ten years and have been applied successfully in many business and scientific research areas.
These systems have not only guided the management and development of corporations and brought substantial economic benefit, but have also contributed much to data mining research at scientific research institutions.

Data mining can be classified by task: discovery of classification or prediction models, data summarization, data clustering, association rule discovery, temporal pattern discovery, dependency or dependency model discovery, and anomaly and trend discovery. It can also be classified by the object being mined: relational databases, object-oriented databases, spatial databases, temporal databases, text databases, multimedia databases, heterogeneous databases, data warehouses, deductive databases, and Web databases. Knowledge discovery in databases (KDD) is a human-computer interaction process. It involves a number of steps, and many decisions must be provided by the user. From a macro point of view, the KDD process consists mainly of three components: data cleansing, data mining, and the interpretation and evaluation of results. A data mining system is a bridge between the research and the application of data mining, and it does much to spread data mining.

Artificial neural network classification is a feasible method. An artificial neural network is a network of simple computing units, widely interconnected in parallel, that can simulate the structure and function of a biological nervous system. A single neuron is simple in structure and limited in function, but a network of a large number of neurons can be powerful. Applying artificial neural networks to data mining aims to exploit their non-linear processing power and noise tolerance to obtain better mining results.
The main obstacles to applying artificial neural networks to data mining are that the knowledge a network learns is difficult to interpret, and that training takes too long to suit large data sets.

Classification is one of the most important techniques in data mining. It can analyze large quantities of data and, through learning, build a classification model for a given domain. The technique is widely used in science, engineering, finance, and other fields.

In this thesis, we first introduce some classical algorithms, then discuss classifiers based on Bayesian theory in detail and compare several algorithms. Finally, we introduce a classifier that we implemented ourselves, the Tree Augmented Naïve Bayesian network Classifier (TANC). We compared our classifier with Jie Cheng's Belief Network PowerPredictor (BNPP) on the same data sets; the experiments show that the classification accuracy of TANC is about the same as that of BNPP.
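
As a rough illustration of the Bayesian classifiers discussed above, the sketch below shows a plain naive Bayes classifier for categorical attributes, the baseline that tree-augmented variants extend by giving each attribute at most one additional attribute as a parent. It is not the TANC implementation from the thesis. The function names, the Laplace smoothing, and the toy weather-style data are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class and per-attribute value frequencies from categorical data."""
    class_counts = Counter(labels)
    # value_counts[(attribute index, class)][value] -> frequency
    value_counts = defaultdict(Counter)
    # Distinct values seen per attribute, needed for Laplace smoothing.
    domains = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def predict(model, row):
    """Return the class with the highest smoothed log posterior."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # Log prior of the class.
        score = math.log(count / total)
        for i, value in enumerate(row):
            # Laplace (add-one) smoothed conditional probability.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(domains[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up data (assumption): (outlook, windy) -> play
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"),
        ("rainy", "no"), ("sunny", "no")]
labels = ["yes", "yes", "no", "no", "yes"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("sunny", "yes")))
```

Naive Bayes treats the attributes as independent given the class. A tree-augmented classifier relaxes this by learning a tree of dependencies among the attributes, which this sketch does not attempt.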
Keywords/Search Tags: data mining, classifier, Bayesian theory