Studies on Several Problems of Data Mining

Posted on: 2002-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Su
Full Text: PDF
GTID: 2168360032957210
Subject: Basic mathematics
Abstract/Summary:
With the widespread use of databases and computer networks, the data accumulated in business, government, and science has grown explosively. These organizations depend increasingly on analyzing and processing their databases in order to make competitive decisions, yet existing data analysis tools struggle to process this data well. The widening gap between the rapid growth of data and the slow progress of data analysis tools has created an urgent need for new techniques and tools that can intelligently and automatically extract useful knowledge from data. As a result, a new research area, data mining, has formed and is developing rapidly, and more and more researchers are entering it. In recent years, large companies such as IBM and Microsoft have increased their funding for research on data mining techniques and tools, and many countries are likewise developing data mining systems and tools suited to their own needs. Data mining has become an international frontier research field.

This thesis first discusses what data mining is, including the background of its emergence and its definition. It then introduces some important data mining topics studied at home and abroad, such as association rules, data generalization, data classification, and data clustering. Finally, it discusses some challenges in the research and application of data mining; addressing them contributes to the further development of the field.

Mining association rules has been an active research area of data mining. The task can usually be decomposed into two steps: (1) generate all itemsets whose support is at least a given minimum support, which are referred to as frequent itemsets; (2) extract all rules from the frequent itemsets. The more important of the two steps is frequent itemset generation.
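
The two-step decomposition can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: it assumes transactions are given as Python sets, uses a simple level-wise enumeration for step 1, and the names `frequent_itemsets`, `extract_rules`, `min_support`, and `min_confidence` are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: return {itemset: support} for every itemset whose
    support (fraction of transactions containing it) >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {}
    for item in items:
        single = frozenset([item])
        sup = support(single)
        if sup >= min_support:
            level[single] = sup

    frequent = {}
    while level:
        frequent.update(level)
        # Join pairs of frequent k-itemsets that differ in exactly one item
        # to obtain candidate (k+1)-itemsets, then test each candidate.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        level = {}
        for cand in candidates:
            sup = support(cand)
            if sup >= min_support:
                level[cand] = sup
    return frequent


def extract_rules(frequent, min_confidence):
    """Step 2: derive rules X -> Y from each frequent itemset, keeping
    those with conf = supp(X u Y) / supp(X) >= min_confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), size)):
                # Every subset of a frequent itemset is frequent,
                # so its support is already in the dictionary.
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, sup, conf))
    return rules


# Hypothetical toy data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
frequent = frequent_itemsets(transactions, min_support=0.4)
for antecedent, consequent, sup, conf in extract_rules(frequent, min_confidence=0.7):
    print(sorted(antecedent), "->", sorted(consequent), round(sup, 2), round(conf, 2))
```

Step 2 only reads supports already computed in step 1, which is one way to see why the cost of the whole task is dominated by frequent itemset generation.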
A classical approach to frequent itemset generation is introduced, and a new one is presented. Most previous studies adopt an Apriori-like heuristic: any subset of a frequent itemset is itself frequent. Generating candidate itemsets in this way is costly, however, especially when frequent itemsets are long or the minimum support threshold is quite low. Analysis shows that the bottleneck of the Apriori algorithm is candidate itemset generation and testing; if generating a huge set of candidate itemsets can be avoided, mining performance can be improved greatly. A new data structure, the Frequent Tree, is constructed to store the crucial information about frequent items, and an algorithm for mining frequent itemsets based on the Frequent Tree is presented. The algorithm avoids repeated database scans and the generation of a huge number of candidate itemsets, and dramatically reduces the search space. Experimental results demonstrate that the approach is more efficient.

Most existing work has focused on mining positive association rules. In fact, it is equally important to mine negative association rules. From the viewpoint of mathematics and formal logic, the negative relation plays the same role as the positive relation. To make the description of data relations complete, negative association rules are needed, just as negative numbers are needed in the real number system and negative propositions in a logic system. Furthermore, one of the important problems in association rule mining is how to measure the uncertainty of association rules. One of the most popular models is the support-confidence model, which uses two values, supp(X∪Y) and conf(X→Y), to measure the uncertainty of an association rule. However, this model can extract a rule X→Y even when X and Y are independent, which means that conf(X→Y) alone is insufficient for measuring association rules of interest.
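
A small worked example may make the independence problem concrete. The sketch below uses hypothetical data and hypothetical thresholds (`min_support = 0.3`, `min_confidence = 0.7`); it only illustrates that a rule can pass both thresholds while conf(X→Y) equals supp(Y), that is, while knowing X tells us nothing about Y.

```python
def supp(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


# Hypothetical data: Y appears in 80% of the baskets that contain X
# and in 80% of the baskets that do not, so X and Y are independent.
transactions = (
    [{"X", "Y"}] * 4   # X and Y together
    + [{"X"}] * 1      # X without Y
    + [{"Y"}] * 4      # Y without X
    + [set()] * 1      # neither
)

supp_x = supp(transactions, {"X"})
supp_y = supp(transactions, {"Y"})
supp_xy = supp(transactions, {"X", "Y"})
conf_xy = supp_xy / supp_x

min_support, min_confidence = 0.3, 0.7  # hypothetical thresholds

# The rule X -> Y is accepted by the support-confidence model ...
accepted = supp_xy >= min_support and conf_xy >= min_confidence

# ... yet supp(X u Y) equals supp(X) * supp(Y), the definition of independence.
independent = abs(supp_xy - supp_x * supp_y) < 1e-12

print("accepted by support-confidence:", accepted)
print("X and Y independent:", independent)
print("conf(X -> Y) equals supp(Y):", abs(conf_xy - supp_y) < 1e-12)
```

Here the confidence is high only because Y is common overall, which is why a measure that compares supp(X∪Y) with supp(X)·supp(Y) is needed in addition to confidence.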
An algorithm for mining both positive and negative association rules is presented, based on probability theory and Piatetsky-Shapiro's argument, and a model for mining association rules is constructed to measure the uncertainty of association rules, which is proved to be e...
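
The abstract is cut off here and does not give the algorithm itself. As a hedged illustration only, the sketch below shows one common reading of Piatetsky-Shapiro's argument, namely that a rule is uninteresting when supp(X∪Y) is close to supp(X)·supp(Y), and how the support of a negative rule X→¬Y follows from positive supports. The function names and the `min_interest` threshold are hypothetical and are not taken from the thesis.

```python
def rule_measures(supp_x, supp_y, supp_xy):
    """Measures for X -> Y and X -> not Y computed from three supports."""
    # Difference between the observed joint support and the value
    # expected if X and Y were independent.
    interest = supp_xy - supp_x * supp_y
    # Transactions containing X either contain Y or do not, hence:
    supp_x_not_y = supp_x - supp_xy
    return {
        "interest": interest,
        "supp_x_not_y": supp_x_not_y,
        "conf_pos": supp_xy / supp_x,       # conf(X -> Y)
        "conf_neg": supp_x_not_y / supp_x,  # conf(X -> not Y)
    }


def suggest_rule(supp_x, supp_y, supp_xy, min_interest):
    """Return 'X -> Y', 'X -> not Y', or None if X and Y look independent."""
    interest = supp_xy - supp_x * supp_y
    if abs(interest) < min_interest:
        return None  # too close to independence to be of interest
    return "X -> Y" if interest > 0 else "X -> not Y"


# Hypothetical supports: X and Y co-occur far less often than expected.
measures = rule_measures(supp_x=0.5, supp_y=0.6, supp_xy=0.1)
print(suggest_rule(0.5, 0.6, 0.1, min_interest=0.05))
print(round(measures["conf_neg"], 2))
```

In this example the observed joint support (0.1) is well below the 0.3 expected under independence, so the negative rule is the candidate of interest, while the independent case from a support-confidence model would be rejected by the same check.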
Keywords/Search Tags: Data Mining, Knowledge Discovery in Database, Association Rules, Negative Association Rules, Frequent Itemset, Negative Frequent Itemset