
Research On Big Data Association Rules Mining Algorithm In Uncertain Environment

Posted on: 2022-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: X Xin
Full Text: PDF
GTID: 2518306608968949
Subject: Computer technology
Abstract/Summary:
With the rapid development of industries such as new media and the Internet, more and more data is being generated. Mining data quickly and efficiently in a big data environment is a current hot issue. Association rule mining is an important part of data mining, but traditional association rule mining focuses mainly on certain (deterministic) data, ignores the uncertainty in the data, and cannot efficiently mine information from big datasets. Based on the distributed Spark framework, this thesis proposes big data association rule mining algorithms for uncertain environments, so that useful knowledge can be discovered efficiently in uncertain datasets at big data scale. First, the thesis analyzes the problem of uncertain association rules, including the characteristics of uncertain datasets, the association rule model, the expected support model and the expected correlation model in uncertain environments, and the uncertain prefix division strategy. Second, based on this analysis, it proposes two parallel algorithms for mining association rules. The Sp-IEB algorithm scans the database once and uses bitmap-based calculation together with a prefix partition strategy to distribute frequent itemsets to different compute nodes for parallel calculation. Experiments verify the algorithm's efficiency, CPU usage, and I/O load at runtime. The Sp-UPPS algorithm mines valuable information from uncertain datasets: it converts the uncertain horizontal dataset into vertical form so that the dataset is scanned only once, and it uses an uncertain prefix division strategy to reduce the search space and distribute itemsets to different compute nodes for calculation. Experiments verify the algorithm's efficiency, memory usage, and I/O load at runtime. Then, in order to mine all association rules in uncertain datasets, the thesis designs ECMA, a parallel algorithm for mining uncertain positive and negative association rules. ECMA uses the expected correlation to judge the correlation between itemsets, which addresses the problem that positive and negative itemsets can both satisfy the expected support model when positive and negative association rules are calculated. It uses an uncertain prefix division strategy to distribute frequent itemsets to different compute nodes for calculation. Experiments compare the number of positive frequent itemsets with the number of negative frequent itemsets in the dataset. Finally, to verify the performance and effectiveness of the algorithms, the research results are applied to an economizer data mining system to discover economizer faults. The system comprises data preprocessing, data mining, and result analysis modules and is developed in Java. The application results show that the algorithm proposed in this thesis can efficiently and accurately mine the attributes that are strongly correlated with economizer faults.
Keywords/Search Tags:big data, uncertain dataset, data mining, Spark framework
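
The abstract names three building blocks without defining them: a horizontal-to-vertical conversion of the uncertain dataset, an expected support model, and a prefix-based division of itemsets across compute nodes. The single-machine Python sketch below illustrates one common reading of each. It is not the thesis's Sp-UPPS or ECMA implementation and does not use Spark. It assumes the widely used definition of expected support (the sum over transactions of the product of the item existential probabilities, with items treated as independent), and it assumes that prefix division means grouping itemsets by their first item. The toy data and the names `to_vertical`, `expected_support`, and `group_by_prefix` are hypothetical.

```python
from collections import defaultdict
from math import prod

# Hypothetical toy uncertain dataset in horizontal form:
# one dict per transaction, mapping item -> existential probability.
TRANSACTIONS = [
    {"a": 0.9, "b": 0.5, "c": 0.4},
    {"a": 0.8, "c": 0.5},
    {"b": 1.0, "c": 0.5},
]


def to_vertical(transactions):
    """Convert the horizontal layout to vertical: item -> {tid: probability}.

    The input is read exactly once, mirroring the single-scan idea.
    """
    vertical = defaultdict(dict)
    for tid, row in enumerate(transactions):
        for item, probability in row.items():
            vertical[item][tid] = probability
    return dict(vertical)


def expected_support(itemset, vertical):
    """Expected support of an itemset under the item-independence assumption.

    For every transaction that contains all items, multiply the items'
    probabilities, then sum these products over the transactions.
    """
    items = list(itemset)
    if not items or any(item not in vertical for item in items):
        return 0.0
    # Transactions that contain every item of the itemset.
    common_tids = set(vertical[items[0]])
    for item in items[1:]:
        common_tids &= set(vertical[item])
    return sum(prod(vertical[item][tid] for item in items) for tid in common_tids)


def group_by_prefix(itemsets):
    """Group itemsets by their first item (assumed meaning of prefix division).

    In a distributed setting each group could be sent to a different
    compute node; here the groups are simply returned as a dict.
    """
    groups = defaultdict(list)
    for itemset in itemsets:
        ordered = tuple(sorted(itemset))
        groups[ordered[0]].append(ordered)
    return dict(groups)


if __name__ == "__main__":
    vertical = to_vertical(TRANSACTIONS)
    candidates = [("a", "b"), ("a", "c"), ("b", "c")]
    for prefix, group in group_by_prefix(candidates).items():
        for itemset in group:
            print(prefix, itemset, round(expected_support(itemset, vertical), 4))
```

An itemset would then be treated as frequent when its expected support reaches a user-chosen minimum threshold. Grouping by prefix keeps all itemsets that share a first item on the same worker, which is one way such a strategy can limit the search space each node has to explore.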