Research Of The Algorithms For Mining Association Rule Abstract

Posted on: 2008-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhou
Full Text: PDF
GTID: 2178360218952559
Subject: Computer technology
Abstract/Summary:
Data mining is a technique for analyzing and understanding large volumes of source data and revealing the knowledge hidden in them. It has been viewed as an important evolution in information processing. It has attracted growing attention from researchers and businesses because huge amounts of data are widely available and there is a pressing need to turn such data into valuable information. Over the past decade or more, the concepts and techniques of data mining have been established, and some of them have been studied in greater depth in the last few years. Data mining integrates techniques from databases, artificial intelligence, machine learning, statistics, knowledge engineering, object-oriented methods, information retrieval, high-performance computing, and visualization. Essentially, data mining is a high-level analysis technology with a strong orientation toward business profit. Unlike OLTP applications, data mining should provide in-depth data analysis and support for business decisions. Like other new techniques, however, data mining must develop gradually, from concept creation through recognition of its importance, wide discussion, and a few trial uses to large-scale application. Most experts consider it to be in the phase of wide discussion today. It still needs theoretical study and algorithm exploration. Though some results have been achieved, many theoretical problems remain under ongoing research. In addition, data mining arises from real applications and must be combined with specific business logic to solve specific problems, because different business fields have different mining needs and targets.
Successful data mining systems are excellent combinations of data mining techniques and business logic, rather than tools designed merely to make data mining application development convenient.
Association rule mining is an important branch of data mining. It has yielded many valuable results, but many more challenging problems remain to be discussed. For large databases, research on improving mining performance and precision is necessary, so much current work on association rule mining focuses on new mining theories, new algorithms, and improvements to existing methods.
In this paper, the main research covers the application architecture of data mining, mining theories for association rules, and the design of new efficient algorithms. This paper analyzes the basic processing phases of data mining, or KDD, and gives the components of a data mining application system and their functions. In the theoretical research, we first define the Set of Item Sequences and give some operators on this algebraic lattice. Applying these theoretical results, we design an algorithm for mining association rules called ISS-DM, which is efficient because it makes only one pass over the database and does not generate or store large candidate sets. For mining large-scale databases, it is a smart strategy to use constraints to improve data quality and reduce data volume. This paper introduces the problem of data mining based on temporal constraints. We create two new operators on the temporal interval space and design an algorithm called TISS-DM that takes advantage of these operators. TISS-DM may be seen as an improvement on ISS-DM that can process larger-scale databases. In fact, recent research has paid more attention to reducing the number of passes over the database (I/O cost), memory usage, and CPU overhead. This paper also gives an algorithm called PISS-DM, which employs a data partitioning technique and makes only two passes over the database.
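The abstract does not specify how PISS-DM works internally, so the following is only a minimal sketch of the general partition-based idea it refers to: mine each partition locally in a first pass, then verify the merged candidates against the whole database in a second pass. The function names, the brute-force local miner, and the `max_size` cap on itemset length are all assumptions made for illustration; they are not taken from the thesis.

```python
from collections import Counter
from itertools import combinations
from math import ceil


def local_frequent(partition, min_ratio, max_size=3):
    """Brute-force local miner (an assumption, not the thesis method):
    return every itemset of up to max_size items whose count in this
    partition reaches ceil(min_ratio * len(partition))."""
    threshold = ceil(min_ratio * len(partition))
    counts = Counter()
    for txn in partition:
        items = sorted(txn)
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s for s, c in counts.items() if c >= threshold}


def partition_mine(transactions, min_ratio, n_parts=2, max_size=3):
    """Two-pass, partition-based frequent itemset mining (generic sketch).

    An itemset that is frequent in the whole database must be frequent in
    at least one partition, so the union of the local results is a complete
    candidate set for itemsets of up to max_size items."""
    size = max(1, ceil(len(transactions) / n_parts))
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]

    # Pass 1: read each partition once and collect locally frequent itemsets.
    candidates = set()
    for part in parts:
        candidates |= local_frequent(part, min_ratio, max_size)

    # Pass 2: read the whole database once and count every candidate.
    counts = Counter()
    for txn in transactions:
        txn_set = set(txn)
        for cand in candidates:
            if cand <= txn_set:
                counts[cand] += 1

    threshold = ceil(min_ratio * len(transactions))
    return {c: n for c, n in counts.items() if n >= threshold}
```

Calling `partition_mine(list_of_item_sets, 0.5)` returns a dictionary that maps each frequent itemset (as a `frozenset`) to its global support count. The point of the design is that the database is read exactly twice, regardless of how long the frequent itemsets are.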
Experimental results showed that these algorithms achieve higher mining efficiency in execution time, memory usage, and CPU utilization than most current algorithms such as Apriori.
In conclusion, this paper analyzes the application architecture of data mining systems, creates new theoretical mining models, and designs a series of new algorithms based on these theories.
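For readers unfamiliar with the Apriori baseline named above, the following minimal sketch shows the classic level-wise approach: one database pass per itemset length, with candidate generation and pruning between passes, followed by rule generation from support counts. It illustrates the standard textbook algorithm, not any of the thesis's ISS-DM variants; the function names `apriori` and `rules` and the absolute-count `min_support` parameter are assumptions chosen for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset whose support count is at
    least min_support, using level-wise candidate generation."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}  # level-1 candidates
    frequent = {}
    k = 1
    while current:
        # One full pass over the database per level: count each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Derive rules lhs -> rhs with confidence = count(lhs ∪ rhs) / count(lhs)."""
    found = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, confidence))
    return found
```

The sketch makes the cost the abstract alludes to visible: the database is scanned once per itemset length, and the candidate sets held between scans can grow large, which is the overhead that one-pass and two-pass methods aim to avoid.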
Keywords/Search Tags: Data mining, Association rules, Set of item sequences, Temporal constraint, Data partitioning