Data mining techniques for frequent itemsets: Construction and analysis

Posted on:2004-08-02

Degree:Ph.D

Type:Thesis

University:State University of New York at Albany

Candidate:Ramesh, Ganesh

GTID:2468390011476111

Subject:Computer Science

Abstract/Summary:

Data mining, or Knowledge Discovery in Databases (KDD), has emerged as one of the most promising areas of database research over the past decade. This thesis combines constructive and analytic approaches to address three major issues that directly impact data mining research (access, feasibility, and scalability) and applies the resulting solutions to the classic problem of frequent itemset mining.

Databases are stored in various formats that permit different data access methods, which in turn affect the efficiency of the mining task. One important issue in mining is bridging the gap between mining techniques and database management systems. The first part of this thesis evaluates indexing and data access methods for frequent itemset mining. We systematically compare representative mining approaches using various database formats and analyze the impact of each format on the mining method's performance and storage overhead.

The performance of itemset mining methods is data dependent and sensitive to the length distribution of the mined patterns. Because itemset distributions differ between real and synthetic datasets, many methods that report good performance on synthetic datasets perform poorly on real-world datasets. In addition, current synthetic datasets are limited in their ability to represent real-world itemset distributions. In the second part of this thesis, we characterize feasible distributions of frequent and maximal frequent itemset collections by providing tight bounds. We also present a constructive technique for synthetic database generation.

One common approach to mining massive databases is to trade off accuracy for efficiency through random sampling. The third part of this thesis presents a novel general-purpose sampling framework for empirical evaluation of popular sampling techniques and defines a new general-purpose weighted accuracy measure that can be tuned to application-specific requirements. A systematic experimental study is presented to evaluate the impact of various control parameters on accuracy. In summary, constructive and analytic methods help guide algorithm developers and mining practitioners in decision making.
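
For readers unfamiliar with the terms used in the abstract, the following is a minimal illustrative sketch, not code from the thesis. It shows what frequent and maximal frequent itemsets are on a toy transaction database, and how mining a random sample can be compared against mining the full data. The function names, the toy data, and the use of plain Jaccard overlap as an accuracy score are all assumptions made for illustration; in particular, the Jaccard score is only a stand-in and is not the weighted accuracy measure defined in the thesis.

```python
from itertools import combinations
import random


def frequent_itemsets(transactions, min_support):
    """Brute-force miner (hypothetical helper, for illustration only).

    Returns {itemset: relative support} for every itemset whose
    relative support is at least min_support.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, k):
            itemset = frozenset(candidate)
            count = sum(1 for t in transactions if itemset <= t)
            if count / n >= min_support:
                result[itemset] = count / n
                found_any = True
        if not found_any:
            # Support is anti-monotone: if no k-itemset is frequent,
            # no larger itemset can be frequent either.
            break
    return result


def maximal(frequent):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}


def jaccard(a, b):
    """Overlap of two itemset collections (a stand-in accuracy score)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy database: five transactions over the items a, b, c.
transactions = [
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ac"),
    frozenset("bc"),
    frozenset("abc"),
]

freq = frequent_itemsets(transactions, min_support=0.6)
max_freq = maximal(freq)

# Mine a random sample of three transactions and compare with the full result.
rng = random.Random(0)  # fixed seed so the sample is reproducible
sample = rng.sample(transactions, 3)
sample_freq = frequent_itemsets(sample, min_support=0.6)
accuracy = jaccard(set(freq), set(sample_freq))
```

On this toy database every single item and every pair has support of at least 0.6, while the full set {a, b, c} has support 0.4, so the three pairs are the maximal frequent itemsets. The sample-based result may include or miss itemsets relative to the full result, which is the accuracy-for-efficiency trade-off the abstract refers to.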

Keywords/Search Tags:

Mining, Data, Frequent itemset, Methods, Techniques

Related items

1	The Research And Application Of Association Rules Mining Algorithms Based On Directed Itemset Graph
2	Data-Mining Methods Study And Its Application In Tranditional Chinese Prescription Compatibility Analysis
3	The Research On IDS Based On Mining Max Frequent Itemset Using Big Step Pruning Strategy
4	Research On Mining Frequent Itemsets Algorithm Based On Bittable
5	Multi-Relational Frequent Pattern Mining Algorithm And Its Application Research
6	Research On Frequent Itemset Mining Algorithm And Its Parallelization Based On Spark
7	Research On Novel Methods In Utility Pattern Mining
8	Research On Frequent Itemsets Mining Algorithm In Data Stream
9	Research On Frequent Itemset Mining Based On Differentially Private Model
10	An Algorithm For Mining Frequent Itemsets From Data Streams