Application And Research On Association Rule Mining Algorithm In Large Data Sets

Posted on:2012-12-09

Degree:Master

Type:Thesis

Country:China

Candidate:Y R Wang

Full Text:PDF

GTID:2178330332491545

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of computer and database technology, humanity has entered the information age. The ability to collect and store data has increased greatly, and traditional data analysis tools can no longer meet people's requirements. The questions are how to avoid drowning in this huge amount of data and how to mine useful information from it to support strategic decisions. Data mining technology came into being in this situation. Data mining is the process of discovering potentially useful information from data. Association rule mining is an important field of data mining, mainly used to find connections between items in a database. It has been applied in many areas, such as shelf layout and inventory management in shopping centers, marketing strategy, and data analysis in banking, telecommunications, mobile communications, insurance, and medicine. As data volumes keep expanding, traditional association rule mining cannot meet people's demands, so association rule mining in large data sets is very important. This thesis addresses the problem of mining large data sets in two ways: research on association rule mining based on sampling, and a discussion of a parallel association rule mining model.

Sampling is a method widely used in statistics. When the whole data set is large, it is not practical to examine every individual, so a small sample is drawn and used to estimate the whole. Based on a study of existing sampling algorithms, and combining the concept of sampling with association rule mining, a new two-level sub-sampling algorithm (EHAC) is proposed. The algorithm samples the data before mining, to make sure the frequent k-itemsets can be divided equally when the data are divided equally. Experiments show that EHAC performs better: its accuracy is higher than that of HAC, and mining time is reduced considerably.

After an analysis of typical parallel algorithms, an association rule mining algorithm based on the client/server model is proposed. In this algorithm, a central node is set up as the server and the other nodes act as clients. The original large data set is divided equally among the clients, and each client mines independently. After mining, each client passes its local frequent itemsets to the server, and the server obtains the global frequent itemsets. Communication between clients is avoided, which reduces the overall communication. In addition, a database trigger mechanism is introduced so that the global frequent itemsets are generated automatically. Experiments show that the model and algorithm improve mining performance and make mining large data sets feasible and easier.
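The client/server scheme described in the abstract can be illustrated with a small sketch. It is not the thesis's algorithm: the abstract does not say how the server combines the local results, so this sketch assumes a partition-style merge in which the server takes the union of the locally frequent itemsets as candidates and then sums their counts over all partitions. The function names (`split_equally`, `client_mine`, `server_merge`), the brute-force counting, and the toy transactions are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, max_size):
    """Count every itemset of up to max_size items (brute force, for illustration)."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return counts


def split_equally(transactions, n_clients):
    """Divide the data set among the clients in round-robin fashion."""
    return [transactions[i::n_clients] for i in range(n_clients)]


def client_mine(partition, min_support, max_size=2):
    """Client side: return the itemsets that are frequent in the local partition."""
    threshold = min_support * len(partition)
    counts = count_itemsets(partition, max_size)
    return {s for s, c in counts.items() if c >= threshold}


def server_merge(partitions, min_support, max_size=2):
    """Server side (assumed merge step): union the local results, then verify globally."""
    # Any globally frequent itemset is locally frequent in at least one partition,
    # so the union of local results is a complete candidate set.
    candidates = set()
    for p in partitions:
        candidates |= client_mine(p, min_support, max_size)

    # Second pass: sum each candidate's count over all partitions.
    total = sum(len(p) for p in partitions)
    global_counts = Counter()
    for p in partitions:
        local = count_itemsets(p, max_size)
        for c in candidates:
            global_counts[c] += local[c]
    return {c: n for c, n in global_counts.items() if n >= min_support * total}


# Toy data, invented for this example.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
    ["bread", "milk"],
    ["butter"],
]
partitions = split_equally(transactions, n_clients=2)
result = server_merge(partitions, min_support=0.5)
print(result)
```

In this sketch the clients never exchange data with each other; only local itemsets and counts travel to the server, which matches the communication pattern the abstract describes. A real implementation would replace the brute-force counting with an algorithm such as Apriori on each client.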

Keywords/Search Tags:

data mining, large data sets, association rules, sampling, parallel mining, client/server

Related items

1	Study On Parallel For Association Rules Mining
2	Research Of Paralleled Frequent Subgraph Mining Algorithm PG-Miner Based On Cluster Environment
3	Large-scale Databases Association Rule Mining Algorithm
4	Association Rules In Data Mining Research And Of Teaching Quality Assessment
5	Design Of Frequent Pattern Mining Algorithm LPS-Miner And Research On Parallel Formulations
6	Application Of Association Rules Mining In Data Processing Of Borrowed Books
7	Data Mining Association Rules In The Research And Application
8	Research On Association Rules Mining In Data Streams And Its Application
9	Find Association Rules In Data Mining
10	The Research And Application Of Data Mining In Mining Rules Of Medical Diagnosis