
The Research And Implementation Of Algorithm For Mining Association Rules Based On BigData

Posted on: 2017-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: D J He
GTID: 2348330485484005
Subject: Electronic and communication engineering
Abstract/Summary:
With the advent of the information era, people benefit from huge amounts of data but also face the challenges that data brings. The Internet industry is affected most: companies must collect and process large amounts of data every day in the hope of extracting high-value information from it. Traditional association rule mining algorithms, however, are not powerful enough to meet this need when dealing with big data. Mining association rules on big data efficiently, at low cost, and in real time has therefore become a primary issue in data mining research. As a representative distributed computing platform, Hadoop has solved the problem of operating cost, and most classic algorithm libraries on distributed computing platforms work well enough. However, because of inherent limitations in these algorithms, their mining efficiency on big data is still unsatisfactory, and many related data preprocessing techniques lack practical value. Improving or modifying the existing classical algorithms is therefore both necessary and urgent.

In the data preprocessing part of this research, we present two novel techniques, Decision-Tree-With-Weights and Dimension-Extend-Theory. The first is used to select attributes from attribute sets; the second is used to discretize network alarm logs and build transaction tables. In the association rule mining part, we use an improved algorithm that combines the advantages of two classic algorithms, Apriori and FP-Growth. Using a support-vector technique, we improve the efficiency of generating and selecting candidate itemsets. To improve mining efficiency further, we propose a simple computing framework. Finally, by implementing the algorithm and the computing framework on Hadoop, we achieve accurate and efficient association rule mining. Our experiments show that the algorithm and the framework work well and that the efficiency gain is clear. The data preprocessing techniques we propose also work well and are practical.

The thesis is organized as follows. First, we introduce the origin and importance of association rule mining and its research status at home and abroad, together with several important association rule mining algorithms and the platform they run on. Next, data preprocessing techniques and methods, a very important part of data mining, are introduced in detail. We then analyze the principles, advantages, and disadvantages of the classical algorithms and present the improved algorithm and our simple computing framework. After that, we describe how the simple computing framework is implemented on Hadoop. Finally, three different algorithms are compared and the test results are analyzed as a whole.

The main contents of this paper are as follows:
a. Collecting and sorting data. We collect and collate the data needed for this research. Extensive collection of various types of data secures the basic data for the research and ensures that all algorithms are tested on big data sets.
b. Designing data preprocessing techniques and methods. By modifying existing data preprocessing techniques and methods or proposing new ones, we bring the data into a uniform and usable format. The study focuses mainly on preprocessing traffic data and network alarm data. The Decision-Tree algorithm is modified for feature extraction, and the dimension-extension theory we propose is used to discretize network alarm data. The data is then structured and put into a uniform format.
c. Designing the improved algorithm. Through in-depth analysis of the two existing association rule mining algorithms, we design an efficient and widely applicable algorithm that outperforms the classic algorithms.
d. Designing the simple computing framework. By analyzing the implementation mechanism of Hadoop, we design a simple computing framework capable of resource management, task supervision, and error correction. This framework helps the improved algorithm raise its accuracy and efficiency.
e. Implementing and optimizing the improved algorithm and the simple computing framework. By analyzing the working principles of our simple framework and of the MapReduce framework, we implement the simple framework on Hadoop. We then tune the cluster parameters according to the test results and obtain an optimized framework.

Results: On small and medium data sets, the improved algorithm showed little improvement and was even slightly worse than the other two algorithms. On massive data sets, however, its efficiency improved markedly, exceeding expectations.
Keywords/Search Tags: Big Data, data mining, association rules, distributed computation, Hadoop, MapReduce
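
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal single-machine sketch of the classic level-wise Apriori procedure (candidate generation, support counting, and rule extraction by confidence). It is not the improved algorithm or the computing framework of the thesis, whose details are not given in the abstract. The function names `frequent_itemsets` and `rules`, the thresholds, and the toy transactions are assumptions made purely for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Classic level-wise Apriori; returns {itemset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    # Level-1 candidates: every distinct item as a singleton itemset.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Support counting: one scan of the data per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Derive rules lhs -> rhs with confidence = support(itemset) / support(lhs)."""
    found = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, confidence))
    return found


if __name__ == "__main__":
    # Hypothetical toy transactions, not data from the thesis.
    toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    freq = frequent_itemsets(toy, min_support=3)
    for lhs, rhs, conf in rules(freq, min_confidence=0.75):
        print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

The repeated full scan in the support-counting step is the cost that makes plain Apriori slow on large data sets, which is the kind of inefficiency the abstract says motivated combining it with FP-Growth and distributing the work on Hadoop.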