
Research On Association Rules Mining In Incomplete Information Systems With Mixed Data

Posted on: 2009-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: L L Wang
Full Text: PDF
GTID: 2178360242992785
Subject: Computer application technology
Abstract/Summary:
With the rapid development of database technology and the widespread use of database management systems, people must manage ever-growing volumes of data. Much important information is hidden in these data, and people want to analyze them to uncover the relationships and rules they conceal. Data mining was proposed for this purpose and is now one of the frontier topics in database and decision-support research. Association rules mining is an important form of data mining that discovers previously unknown, interesting relationships among attributes in large databases.

Association rules mining was first developed for transaction databases, where missing values practically do not occur. In everyday databases, however, missing values are widespread. Rough set theory is a relatively new mathematical approach to imprecision, uncertainty, and incompleteness. It describes and handles uncertainty more objectively than some other methods, and it strongly complements other data mining algorithms. In addition, mining association rules from data with both discrete and continuous attributes is an important problem. The common process is to discretize the continuous attributes first and then extract association rules.

The main research of this thesis is as follows:
(1) It reviews the existing approaches to processing missing values in incomplete information systems and analyzes their advantages and disadvantages. Based on the upper approximation, lower approximation, and boundary of rough sets, it presents a new method that redefines the support and confidence of association rules in incomplete information systems. The new definitions can be used to mine rules with decision attributes directly, without processing missing values. Experimental results demonstrate the correctness and validity of the algorithm.
(2) A new algorithm for computing candidate cut sets is proposed. The algorithm maintains the discernibility relation of the system and produces candidate cut sets whose cardinality is much smaller than the total number of cuts. Theoretical analysis shows that it reduces the time and space complexity of the subsequent algorithms.
(3) A novel two-layer immune genetic algorithm is put forward. The algorithm mines association rules directly, without discretization preprocessing, and so avoids the drawback that discretization preprocessing may distort the original information system. Experiments show that the new algorithm performs well computationally and mines effective association rules.
(4) It establishes a model for association rules mining in incomplete information systems with mixed data. The model uses the algorithms proposed in this thesis to mine rules directly, without processing missing values and without discretization preprocessing. Finally, the model is partially implemented.
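The abstract does not give the redefined support and confidence from item (1), so the following is only an illustrative sketch of the general idea, not the thesis's definitions. It assumes a missing value is stored as `None` and bounds the support of an itemset from below by the records that certainly match (in the spirit of a lower approximation) and from above by the records that possibly match (in the spirit of an upper approximation). The names `match_status` and `support_bounds` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# A record maps attribute names to values; None marks a missing value
# (an assumption of this sketch, not a convention taken from the thesis).
Record = Dict[str, Optional[str]]


def match_status(record: Record, itemset: Dict[str, str]) -> str:
    """Classify a record as 'certain', 'possible', or 'no' for an itemset."""
    has_missing = False
    for attr, value in itemset.items():
        actual = record.get(attr)
        if actual is None:
            # Unknown value: the record might still match.
            has_missing = True
        elif actual != value:
            # A known value contradicts the itemset.
            return "no"
    return "possible" if has_missing else "certain"


def support_bounds(records: List[Record],
                   itemset: Dict[str, str]) -> Tuple[float, float]:
    """Return (lower, upper) support of the itemset over the records."""
    if not records:
        raise ValueError("records must not be empty")
    statuses = [match_status(r, itemset) for r in records]
    certain = sum(1 for s in statuses if s == "certain")
    possible = sum(1 for s in statuses if s != "no")
    n = len(records)
    return certain / n, possible / n
```

In this sketch, records with missing values are never filled in or discarded; they only widen the gap between the two bounds, which mirrors the boundary region of a rough set.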
Keywords/Search Tags:data mining, association rules, incomplete information systems, genetic algorithm, two-layer chromosomes, rough set