Data Mining Techniques And Algorithms For Mining Association Rules

Posted on: 2004-10-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G J Mao
Full Text: PDF
GTID: 1118360092492028
Subject: Computer application technology
Abstract/Summary:
Data mining aims to analyze and understand large volumes of source data and to reveal the knowledge hidden in them. It is viewed as an important evolution in information processing. It has attracted growing attention from researchers and businesses because huge amounts of data are now widely available and there is a pressing need to turn such data into valuable information. Over the past decade or so, the concepts and techniques of data mining have been proposed, and some of them have been studied in greater depth in the last few years. Data mining integrates techniques from databases, artificial intelligence, machine learning, statistics, knowledge engineering, object-oriented methods, information retrieval, high-performance computing and visualization. Essentially, data mining is a high-level analysis technology with a strong orientation toward business benefit. Unlike OLTP applications, data mining should provide in-depth data analysis and support for business decisions. Like other new technologies, however, data mining must develop gradually, from concept creation through recognition of its importance, wide discussion and a few attempts at use, to large-scale application. Most experts consider it to be in the phase of wide discussion today. It still needs theoretical study and algorithmic exploration. Although some results have been achieved, many theoretical problems remain under ongoing research. In addition, data mining originates from real applications and must be combined with specific business logic to solve specific problems, because different business fields have different mining needs and targets.
Successful data mining systems are excellent combinations of data mining techniques and business logic, rather than tools designed merely to make data mining application development convenient. Association rule mining is an important branch of data mining; it has produced many valuable results, but a good number of challenging problems remain open. For large databases, research on improving mining performance and precision is necessary, so much current work on association rule mining focuses on new mining theories, new algorithms and improvements to existing methods. In this paper, the main research covers the application architecture of data mining, the mining theories for association rules and the design of new efficient algorithms. This paper analyzes the basic processing phases of data mining, or KDD, and describes the components of a data mining application system and their functions. In the theoretical research, we first define the Set of Item Sequences and give some operators on this algebraic lattice. Applying these theoretical results, we design an algorithm for mining association rules called ISS-DM, which is efficient because it makes one pass over the database and does not generate or store large candidate sets. For mining large-scale databases, it is a smart strategy to use constraints to improve data quality and reduce data volume. This paper introduces the problem of data mining based on temporal constraints. We create two new operators on the temporal interval space and design an algorithm called TISS-DM that takes advantage of these operators. TISS-DM may be seen as an improvement on ISS-DM that can process larger-scale databases. In fact, recent research has paid more attention to reducing the number of passes over the database (I/O cost), memory usage and CPU overhead. This paper also gives an algorithm called PISS-DM, which employs a data partitioning technique and makes only two passes over the database.
Experimental results show that these algorithms achieve better mining efficiency in execution time, memory usage and CPU utilization than most current algorithms such as Apriori. In conclusion, this paper analyzes the application architecture of data mining systems, creates new theoretical models for mining, and designs a series of new algorithms based on these theories.
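
To make the idea of data partitioning with only two passes over the database more concrete, the following Python sketch may help. It is not PISS-DM, whose details are not given in this abstract; it illustrates the general partition-based approach to frequent itemset mining, in which the first pass collects itemsets that are frequent inside at least one partition and the second pass counts those candidates over the whole database. The function names, the `max_size` cap and the brute-force local counting are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations
from math import ceil


def local_frequent_itemsets(partition, min_support, max_size):
    """Return the itemsets (up to max_size items) that are frequent in one partition."""
    threshold = ceil(min_support * len(partition))
    counts = Counter()
    for transaction in partition:
        items = sorted(set(transaction))
        # Brute-force enumeration; a real miner would prune the search space.
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return {itemset for itemset, n in counts.items() if n >= threshold}


def two_pass_frequent_itemsets(transactions, min_support,
                               num_partitions=2, max_size=3):
    """Partition-based frequent itemset mining (illustrative, hypothetical API)."""
    if not transactions:
        return {}
    # Pass 1: an itemset that is frequent globally must be frequent in at
    # least one partition, so the union of local results is a safe candidate set.
    part_len = ceil(len(transactions) / num_partitions)
    candidates = set()
    for start in range(0, len(transactions), part_len):
        partition = transactions[start:start + part_len]
        candidates |= local_frequent_itemsets(partition, min_support, max_size)
    # Pass 2: count every candidate once over the full database.
    counts = Counter()
    for transaction in transactions:
        items = set(transaction)
        for candidate in candidates:
            if items.issuperset(candidate):
                counts[candidate] += 1
    threshold = ceil(min_support * len(transactions))
    return {c: n for c, n in counts.items() if n >= threshold}
```

Calling `two_pass_frequent_itemsets(transactions, 0.5)` on a list of item lists returns a dictionary that maps each frequent itemset (as a sorted tuple) to its support count. Each partition can be processed in memory on its own, which is why the database needs to be read only twice.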
Keywords/Search Tags: Data mining, KDD (Knowledge Discovery in Databases), Association rules, Set of item sequences, Temporal constraint, Data partitioning