Research On The Apriori Algorithms For Meteorological Data Association Rules Analysis Based On Cloud Computing

Posted on: 2016-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: J S Huang
GTID: 2308330470969713
Subject: Software engineering
Abstract/Summary:
With the development of modern Internet technology, we face the problem of accessing, storing, and analyzing massive data. Locating valuable information in rapidly growing data has become critical. The meteorological industry holds massive amounts of meteorological data of complex types, and this data embodies meteorological laws. Association rule algorithms are well suited to finding those laws: they uncover connections and regular patterns among data, and these patterns can be used for weather prediction so that preventive actions can be taken accordingly. Given the volume of meteorological data, traditional association rule algorithms can no longer meet demand, and it becomes particularly important to break the bottleneck in their efficiency, applicability, and availability. Processing massive data on traditional computers takes a long time, which parallel algorithms can reduce sharply. Meteorological services therefore rely heavily on cloud computing, whose computing power provides technical support for meteorological data mining, and applying massive data mining algorithms on cloud platforms has important practical significance.
The concept of data mining emerged in the late 1980s. It is an emerging interdisciplinary field that draws on research results from artificial intelligence, machine learning, pattern recognition, statistics, databases, visualization, and many other fields. Association rule mining is one of its important branches. Because it is targeted, flexible, convenient, and widely applicable, it has become an important data mining method for studying the internal associations of data.
Nowadays, the rapid expansion of information leads to geometric growth of data, and distributed association algorithms provide a platform for data mining.
This research improves the traditional association rule algorithm Apriori with an improved algorithm based on a compressed matrix. Building on the advantages of the open-source Hadoop platform in matrix processing, it designs an improved Apriori algorithm based on MapReduce. The algorithm runs MapReduce in two parts, whose results are combined to obtain the frequent itemsets.
To assess the performance of the algorithm, this research designed multiple experiments that vary the data set size, the support threshold, and the number of Hadoop nodes. The experimental results show that, compared with the traditional algorithm, the improved Apriori achieves a clear improvement in both efficiency and complexity when processing massive data in a cloud computing environment. In addition, changes in the support threshold and in the number of Hadoop nodes affect the algorithm's efficiency, which demonstrates the scalability of the improved algorithm in the cloud computing environment.
This research applied association rule mining to meteorological data. Under experimental conditions, some meaningful results were found that reveal information hidden in the meteorological records. The results of this research are intended to serve as a foundation for further study.
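As a rough illustration of the compressed-matrix idea, the following single-machine Python sketch mines frequent itemsets while shrinking the transaction matrix at each level. It is not the thesis's MapReduce implementation: the function name apriori_compressed, the two compression rules (drop items that appear in no frequent itemset of the previous level, drop rows with fewer than k remaining items), and the weather-style sample data are all assumptions chosen for illustration.

```python
from itertools import combinations


def apriori_compressed(transactions, min_count):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    Hypothetical helper: a single-machine sketch, not the thesis's algorithm.
    """
    # Each row is the item set of one transaction (a sparse boolean matrix row).
    rows = [frozenset(t) for t in transactions]
    frequent = {}

    # Level 1: count every single item.
    counts = {}
    for row in rows:
        for item in row:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}

    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Column compression: keep only items found in some frequent (k-1)-itemset.
        alive = frozenset().union(*level)
        rows = [r & alive for r in rows]
        # Row compression: a row with fewer than k items cannot hold a k-itemset.
        rows = [r for r in rows if len(r) >= k]
        # Join (k-1)-itemsets, then prune candidates with an infrequent subset.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count support on the compressed rows only.
        counts = {c: sum(1 for r in rows if c <= r) for c in candidates}
        level = {s: c for s, c in counts.items() if c >= min_count}
    return frequent


# Made-up sample records; each set lists conditions observed together.
sample = [
    {"high_humidity", "low_pressure", "rain"},
    {"high_humidity", "rain"},
    {"low_pressure", "rain"},
    {"high_humidity", "low_pressure", "rain"},
    {"high_humidity", "clear"},
]
result = apriori_compressed(sample, min_count=3)
```

In a MapReduce setting, the support-counting step is the natural part to distribute, since each node can count candidates on its own slice of rows before the counts are summed. How the thesis splits the work between its two MapReduce parts is not specified in this abstract.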
Keywords/Search Tags: data mining, association rules, Apriori, compressed matrix, Hadoop, meteorology data