Research And Application Of The Discretization Of Real Value Attributes

Posted on: 2009-08-16
Degree: Master
Type: Thesis
Country: China
Candidate: L Liu
Full Text: PDF
GTID: 2178360275961338
Subject: Computer application technology
Abstract/Summary:
The arrival of the information age will undoubtedly have a profound influence on our lives. It brings vast quantities of data in which much important information and knowledge is hidden. How to extract deep rules from this data is an urgent problem. Rules represent the hidden nature of things and can be used to predict or to make decisions. Data mining is a new research field that has emerged against this background; it is an interdisciplinary subject that draws on statistics, computer science, pattern recognition, artificial intelligence, machine learning, and databases.

Discretization is an effective technique for handling continuous attributes in machine learning and data mining. This thesis studies discretization in depth.

First, the discretization of real value attributes is discussed. Whether a discretization process is reasonable determines how accurately information can be expressed and extracted. I make full use of the Bayesian model, which by nature allows for misclassification, to improve the Chi2 algorithm. The improved algorithm is not only better suited to inconsistent and incomplete data, but also makes interval merging more reasonable.

Second, by ordering the attributes for discretization according to their level of attribute significance, I propose a new algorithm, called the attribute significance-Chi2 algorithm, which is based on attribute significance and discretizes real value attributes accurately.

Finally, from the perspective of application, I establish a credit risk assessment model for individual housing loans based on decision trees, a data mining technique. The model has a high accuracy rate and can meet the needs of practical application.
Keywords/Search Tags: data mining, discretization of real value attributes, Chi2 algorithm, Bayesian, selection of training set according to class proportion, credit model
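
The abstract refers to chi-square based interval merging without showing it. As a rough illustration only, the sketch below implements the basic ChiMerge-style merging step that the Chi2 family of algorithms builds on; it is not the thesis's improved Bayesian variant or its attribute significance-Chi2 algorithm. The function names `chi2_adjacent` and `chimerge`, the fixed `threshold` parameter, and the toy data are all assumptions made for this example.

```python
from collections import Counter


def chi2_adjacent(a, b, classes):
    """Pearson chi-square statistic for two adjacent intervals.

    a, b: Counter mapping class label -> number of samples in the interval.
    """
    total = sum(a.values()) + sum(b.values())
    stat = 0.0
    for row in (a, b):
        row_total = sum(row.values())
        for c in classes:
            col_total = a[c] + b[c]
            expected = row_total * col_total / total
            if expected > 0:  # skip classes absent from both intervals
                stat += (row[c] - expected) ** 2 / expected
    return stat


def chimerge(values, labels, threshold):
    """Merge adjacent intervals bottom-up until every adjacent pair
    differs by at least `threshold` (hypothetical fixed threshold).

    Returns the lower bound of each remaining interval.
    """
    classes = sorted(set(labels))
    # Start with one interval per distinct attribute value.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, Counter())[y] += 1
    bounds = sorted(groups)
    counts = [groups[v] for v in bounds]

    while len(counts) > 1:
        stats = [chi2_adjacent(counts[i], counts[i + 1], classes)
                 for i in range(len(counts) - 1)]
        i = min(range(len(stats)), key=stats.__getitem__)
        if stats[i] >= threshold:
            break  # all adjacent pairs are sufficiently different
        # Merge the most similar adjacent pair.
        counts[i] = counts[i] + counts[i + 1]
        del counts[i + 1]
        del bounds[i + 1]
    return bounds


# Toy data (made up): low values belong to class 0, high values to class 1.
cuts = chimerge([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1], threshold=2.706)
print(cuts)
```

Here 2.706 is the chi-square critical value at the 0.10 significance level with one degree of freedom, which applies to a two-class problem. The Chi2 algorithm proper adjusts the significance level automatically and checks data consistency rather than using a fixed threshold; that part is omitted from this sketch.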