Data Mining And Search Technology Application In The Tax System

Posted on: 2005-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: C H Wang
GTID: 2168360152956752
Subject: Computer technology
Abstract/Summary:
Data mining technology plays an increasingly important role in daily life. With the development of database technology and the wide use of DBMSs, the volume of data stored in databases is growing rapidly. Much important information lies behind this data, and people hope to extract it and put it to use. At present, however, most DBMSs can only query the stored data and cannot discover the knowledge behind it, a situation that has been described as "data rich but information poor". How to discover and extract useful knowledge from abundant data is therefore an important problem. Knowledge Discovery in Databases (KDD) and data mining are the means to understand, discover, and make use of the knowledge behind the data. Mining association rules is an important topic in KDD research. Association rules focus on identifying relations among data and discovering attributes that depend on each other under given support and confidence thresholds. A major application area for mining association rules is the retail industry. Progress in bar-code technology has made it possible for retail organizations to collect and store huge amounts of sales data. By analyzing past transaction data, we can learn customers' buying behavior, which is valuable knowledge for the store. Such information can help with store layout and shelf placement, reveal customer shopping patterns and trends, improve the quality of customer service, achieve better customer retention and satisfaction, enhance goods consumption ratios, support more effective goods transportation and distribution policies, and reduce the cost of business.

For the sales database of a supermarket, Agrawal set up the association rule model and, based on this model, proposed the Apriori algorithm for data mining. Let I={i1,i2,…,im} be a set of items. Let D be a set of transactions, where each transaction T is a set of items such that T ⊆ I.
We say that a transaction T contains X, a set of some items in I, if X ⊆ T. An association rule is an implication of the form X ⇒ Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅. The problem of discovering all association rules can be decomposed into two subproblems: first, find all large itemsets; second, use the large itemsets to generate the desired rules. The first step is the core problem, and it determines the overall performance of association rule mining. The Apriori algorithm is an influential, classical algorithm for mining large itemsets for association rules. Apriori employs an iterative approach to discover all large itemsets. In the first iteration, the algorithm simply counts the support of individual items and determines the large 1-itemsets. In each subsequent iteration, the large itemsets found in the previous iteration are used to generate new potentially large itemsets, called candidate itemsets, and the support of these candidates is counted to determine the large itemsets. The basic intuition of the algorithm is that any subset of a large itemset must itself be large. Many algorithms have been proposed that improve the efficiency of the original algorithm through hash-based techniques, partitioning, and sampling. However, in situations with prolific large itemsets, long patterns, or quite low minimum support thresholds, the algorithm may still suffer from two costs: it must handle a huge number of candidate sets, and it must repeatedly scan the database.

From our study of algorithms for mining association rules, we find that their general problems are that the number of potentially large itemsets is huge and that the scale of the data is not reduced in each pass. In this paper, we adopt a multi-segments support algorithm.
This algorithm introduces the idea of counting support by segment: the support of an itemset is counted per segment, where each segment records the frequency with which the itemset appears in transactions of the corresponding size, and all segments together form a support vector. Using the multi...
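
The level-wise search for large itemsets and the per-segment support count described in the abstract can be sketched in Python. This is a minimal illustration, not the thesis's implementation. The function names `apriori` and `segment_support`, the use of an absolute transaction count for `min_support`, and the reading of the support vector as a mapping from transaction size to frequency are all assumptions made here for illustration.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every large itemset.

    Assumption of this sketch: min_support is an absolute number of
    transactions, not a fraction of the database.
    """
    transactions = [frozenset(t) for t in transactions]
    # First pass: count individual items to find the large 1-itemsets.
    counts = Counter(frozenset([item]) for t in transactions for item in t)
    current = {s: n for s, n in counts.items() if n >= min_support}
    large = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: unite pairs of large (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be large.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # One scan of the database counts the support of each candidate.
        counts = Counter(c for t in transactions for c in candidates if c <= t)
        current = {s: n for s, n in counts.items() if n >= min_support}
        large.update(current)
        k += 1
    return large


def segment_support(itemset, transactions):
    """Hypothetical support vector: maps a transaction size n to the number
    of transactions of size n that contain the itemset."""
    itemset = frozenset(itemset)
    return dict(Counter(len(set(t)) for t in transactions if itemset <= set(t)))


if __name__ == "__main__":
    # Toy basket data, invented for this example only.
    baskets = [
        {"bread", "milk"},
        {"bread", "diaper", "beer", "eggs"},
        {"milk", "diaper", "beer", "cola"},
        {"bread", "milk", "diaper", "beer"},
        {"bread", "milk", "diaper", "cola"},
    ]
    print(apriori(baskets, min_support=3))
    print(segment_support({"bread", "milk"}, baskets))
```

In this reading, the entries of the support vector sum to the ordinary support count of the itemset, so the segmented count carries the same total while also recording how it is distributed over transaction sizes.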
Keywords/Search Tags: data mining, association rules, algorithm, multi-segments support, search, statistics