The Research Of Meteorological Data Mining Based On The Hadoop Platform

Posted on: 2017-11-18  Degree: Master  Type: Thesis
Country: China  Candidate: J Sun  Full Text: PDF
GTID: 2348330518995681  Subject: Communication and Information System
Abstract/Summary:
With the rapid development of computer science and technology, the volume of meteorological data in China keeps growing. The annual increase can reach the petabyte (PB) level, and the data types are relatively complex, so traditional data storage and processing technologies can no longer meet demand. How to efficiently extract valuable information from this mass of data has become a hot issue in the meteorological field. In recent years, the emergence and development of cloud computing has provided a new opportunity for storing and processing massive data. It has significant advantages for mining huge amounts of data and has been widely used. The core idea of cloud computing is to manage and schedule computing resources in a unified way, forming a resource pool that provides on-demand services to users. Hadoop is a distributed framework with high fault tolerance, high throughput, low cost, and many other advantages. Porting traditional data mining techniques to the Hadoop cloud platform can both improve the efficiency of storage and computing and reduce the cost of data mining, and it has become a research hotspot in meteorological data mining.

This paper studies in depth the data mining methods based on the Hadoop platform and the characteristics of meteorological data. Considering some deficiencies of the traditional Bayesian network classification method, and drawing on the Hadoop cloud platform's strength in handling huge amounts of data, it puts forward an improved Bayesian network classification algorithm based on MapReduce. The main research work is as follows:

(1) Given the large scale of meteorological data, this paper uses the Hadoop platform to process the original meteorological data set and calculates the correlation coefficient between every pair of attributes. Correlation analysis is then used to select the prediction attributes, which reduces the complexity of model training to a certain extent.

(2) This paper analyzes the strengths and weaknesses of several typical classification algorithms used in meteorological data mining. Because meteorological attributes are correlated with one another, it chooses the Bayesian network classification algorithm, which was proposed to handle uncertainty and the dependencies among variables.

(3) In the training process of the Bayesian classification model, this paper adopts an accuracy-based evaluation scheme. If the classification model does not meet the accuracy requirement, its parameters are modified repeatedly until the optimal classification model is obtained.

The experimental results show that the improved algorithm is better than the existing algorithms in both computational efficiency and performance.
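The correlation analysis in item (1) can be illustrated with a small sketch. The abstract does not say which correlation coefficient, threshold, or job layout the thesis uses, so everything below is an assumption: it uses the Pearson coefficient, plain Python functions that stand in for a MapReduce map and reduce step, and hypothetical names (`map_record`, `reduce_sums`, `select_attributes`). The point it shows is that the coefficient can be built from sums that are simply added across records, which is what makes the computation easy to distribute.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def map_record(record):
    """Map step (assumed layout): emit partial sums for each attribute pair."""
    for a, b in combinations(sorted(record), 2):
        x, y = record[a], record[b]
        # Key is the attribute pair; value holds n, sum x, sum y, sum x^2, sum y^2, sum xy.
        yield (a, b), (1, x, y, x * x, y * y, x * y)


def reduce_sums(pairs):
    """Reduce step: add up the partial sums that share the same key."""
    totals = defaultdict(lambda: (0, 0.0, 0.0, 0.0, 0.0, 0.0))
    for key, value in pairs:
        totals[key] = tuple(t + v for t, v in zip(totals[key], value))
    return dict(totals)


def pearson(stats):
    """Pearson correlation coefficient computed from the accumulated sums."""
    n, sx, sy, sxx, syy, sxy = stats
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    if var_x <= 0 or var_y <= 0:
        return 0.0  # a constant attribute carries no linear relationship
    return cov / sqrt(var_x * var_y)


def select_attributes(records, target, threshold):
    """Keep attributes whose |r| with the target reaches the threshold."""
    emitted = (kv for rec in records for kv in map_record(rec))
    totals = reduce_sums(emitted)
    selected = []
    for (a, b), stats in totals.items():
        if target in (a, b) and abs(pearson(stats)) >= threshold:
            selected.append(b if a == target else a)
    return sorted(selected)
```

Calling `select_attributes(records, "rain", 0.5)` on a list of dictionaries that map attribute names to numbers would return the names of the attributes to keep as predictors. In an actual Hadoop job the map and reduce functions would run on separate nodes, while the final division in `pearson` stays the same.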
Keywords/Search Tags:Hadoop, Cloud Computing, Data Mining, Bayesian Net Classifier, Meteorological Data