Application And Research On Association Rule Mining Algorithm In Large Data Sets

Posted on: 2012-12-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y R Wang
Full Text: PDF
GTID: 2178330332491545
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of computers and databases, humanity has entered the information age. The ability to collect and store data has increased greatly, and traditional data analysis tools can no longer meet people's requirements. The questions are how to avoid drowning in these huge amounts of data and how to mine useful information from them to support strategic decisions. Data mining technology came into being in this situation. Data mining is the process of discovering potentially useful information in data. Association rule mining is an important field of data mining, used mainly to find connections between items in a database. It has been applied in many areas, such as shelf layout and inventory management in shopping centers, marketing strategy, data analysis in banking, telecommunications and mobile communications, insurance, and medicine. As data volumes keep expanding, traditional association rule mining cannot meet people's demands, so association rule mining in large data sets is very important.

To address these issues, this thesis approaches the problem of mining large data sets in two ways: research on association rule mining based on sampling, and a discussion of a parallel association rule mining model. Sampling is a method widely used in statistics. When the whole data set is large, it is not practical to examine every individual, so a small sample is drawn and used to estimate the whole. Based on a study of existing sampling algorithms, and combining the concepts of sampling and association rules, a new two-level sub-sampling algorithm (EHAC) is proposed. The algorithm samples the data before mining, so that the frequent k-itemsets are divided evenly when the data is divided evenly. The experiments show that EHAC performs better: its accuracy is higher than that of HAC, and mining time is reduced considerably.

After analyzing typical parallel algorithms, an association rule mining algorithm based on the client/server model is proposed. In this algorithm, a central node is set up as the server and the other nodes act as clients. The original large data set is divided equally among the clients, and each client mines its share independently. After mining, each client passes its local frequent itemsets to the server, and the server derives the global frequent itemsets. Communication between clients is avoided, which reduces the overall communication cost. In addition, a database trigger mechanism is introduced so that the global frequent itemsets are generated automatically. The experiments show that the model and algorithm improve mining performance, making the mining of large data sets go from impossible to possible, and from hard to easy.
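The client/server scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not say how the server combines the local results, so this sketch assumes a Partition-style approach in which the server takes the union of the local frequent itemsets as candidates and then verifies them against global counts. The function names (`partition`, `local_frequent_itemsets`, `server_merge`), the brute-force counting, the limit to itemsets of size 2, and the toy transactions are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def partition(transactions, n_clients):
    """Split the data into n_clients roughly equal contiguous parts."""
    size, rem = divmod(len(transactions), n_clients)
    parts, start = [], 0
    for i in range(n_clients):
        end = start + size + (1 if i < rem else 0)
        parts.append(transactions[start:end])
        start = end
    return parts


def local_frequent_itemsets(transactions, min_support, max_size=2):
    """Client side: brute-force count of itemsets up to max_size in one part."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # sorted so each itemset has one canonical tuple
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    threshold = min_support * len(transactions)
    return {itemset for itemset, c in counts.items() if c >= threshold}


def server_merge(parts, local_results, min_support):
    """Server side (assumed): union local results, then verify globally."""
    candidates = set().union(*local_results)
    total = sum(len(p) for p in parts)
    counts = Counter()
    for p in parts:
        for t in p:
            items = set(t)
            for c in candidates:
                if items.issuperset(c):
                    counts[c] += 1
    # Keep only candidates that reach the global support threshold.
    return {c: counts[c] / total
            for c in candidates if counts[c] >= min_support * total}


# Toy data, invented for this example.
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
    ["bread", "milk", "beer"],
    ["milk", "diapers"],
]
parts = partition(transactions, 2)
local_results = [local_frequent_itemsets(p, 0.5) for p in parts]
global_frequent = server_merge(parts, local_results, 0.5)
```

With the same relative threshold on every client, an itemset that is frequent in the whole data set must be frequent in at least one part, so the union of local results misses nothing. It can contain extras, though: here `("beer", "diapers")` is frequent on the first client but not globally, and the verification step removes it. In this sketch the clients talk only to the server, never to each other, which matches the communication pattern the abstract describes.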
Keywords/Search Tags:data mining, large data sets, association rules, sampling, parallel mining, client/server