Forecast The Precipitation Levels On Hadoop Based On Improved DAG-SVM Algorithm

Posted on: 2017-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Chen
GTID: 2180330485998910
Subject: Software engineering
Abstract/Summary:
With the rapid development of science and technology, the meteorological industry in China is moving toward informatization. In recent years in particular, the vigorous promotion of cloud computing has given the meteorological industry a more efficient way to process massive meteorological data and better schemes for forecasting disasters. However, the precipitation forecasting methods currently in use have shortcomings: they require the attributes to be independent of one another, but many meteorological factors are not independent, which reduces prediction accuracy. The emergence and rapid development of cloud computing provides efficient and reliable technical support for the storage and analysis of massive meteorological data. In this paper, according to the specific requirements that weather forecasting and disaster early warning place on rainfall forecasts, we mainly do the following work: First, to improve the accuracy of the rainfall prediction model, we propose an improvement to the traditional directed acyclic graph support vector machine (DAG-SVM) algorithm, which we name preprocessing DAG-SVM (PDAG-SVM). In the traditional DAG, the structure is fixed and the individual two-class classifiers are arranged randomly, which tends to cause errors to accumulate between layers. In this paper, the accuracy of each two-class classifier is tested in advance and the classifiers are sorted by accuracy. The two-class classifier with the highest accuracy is taken as the root node of the DAG, and the classifier with the second-highest priority is placed in the next layer.
Following this rule, every two-class classifier is placed in the DAG according to the accuracy ranking. The result is a more accurate and efficient model that prevents error accumulation between layers. Second, to improve the efficiency of the rainfall prediction model, we adopt Hadoop as the prediction platform and improve Hadoop's default scheduling algorithm. Because the rainfall data are massive, a single machine's capacity to process and store them cannot meet the efficiency requirements of forecasting. The Hadoop platform suits this kind of task because of its parallel processing mechanism, and its storage space can be extended conveniently by adding machines. However, Hadoop's default scheduling algorithm, the Fair Scheduler, does not consider the load balance of each node in the cluster when scheduling tasks, which leads to low efficiency for large jobs. To address this defect of the fair share scheduling algorithm, we combine it with the LATE scheduling algorithm and propose a load balancing scheduling algorithm based on fair share. Experimental results show that the improved fair share scheduling algorithm takes the load of each node into account when scheduling tasks and improves the efficiency of large jobs. By improving both the DAG-SVM algorithm and the Hadoop cloud platform, we build a forecast model for rainfall prediction. We take the meteorological data of the Nanjing station from 1951 to August 2006 as research data and divide them into a training set and a prediction set. The data from 1951 to 2005 are used as the training set, and the data from January to August 2006 are used as the prediction set. We then propose a classification method for meteorological data based on the newly built model. In the experiment, we classify the meteorological data according to the amount of precipitation and then preprocess the data.
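The accuracy-based ordering of two-class classifiers described above can be illustrated with a minimal sketch. It is not the thesis implementation: the elimination rule (always consult the most accurate classifier whose two classes are both still candidates), the names `rank_pairs` and `predict_by_accuracy`, the rainfall labels, the threshold classifiers and the accuracy values are all assumptions made for illustration.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Pair = Tuple[Hashable, Hashable]
# A pairwise classifier takes a sample and returns one of its two class labels.
PairClassifier = Callable[[Sequence[float]], Hashable]


def rank_pairs(accuracy: Dict[Pair, float]) -> List[Pair]:
    """Sort class pairs by pre-tested accuracy, most accurate first."""
    return sorted(accuracy, key=lambda pair: accuracy[pair], reverse=True)


def predict_by_accuracy(
    x: Sequence[float],
    classes: Sequence[Hashable],
    classifiers: Dict[Pair, PairClassifier],
    accuracy: Dict[Pair, float],
) -> Hashable:
    """Eliminate one class per step, always using the most accurate
    classifier whose two classes are both still candidates.
    Assumes every pair of classes has exactly one entry in both dicts."""
    remaining = set(classes)
    ranking = rank_pairs(accuracy)
    while len(remaining) > 1:
        # First ranked pair with both classes still in play.
        pair = next(p for p in ranking if p[0] in remaining and p[1] in remaining)
        winner = classifiers[pair](x)
        loser = pair[1] if winner == pair[0] else pair[0]
        remaining.discard(loser)
    return remaining.pop()


# Hypothetical toy setup: three rainfall levels and one feature,
# with simple thresholds standing in for trained two-class SVMs.
classes = ["light", "moderate", "heavy"]
classifiers = {
    ("light", "moderate"): lambda x: "light" if x[0] < 10 else "moderate",
    ("moderate", "heavy"): lambda x: "moderate" if x[0] < 25 else "heavy",
    ("light", "heavy"): lambda x: "light" if x[0] < 17 else "heavy",
}
# Made-up validation accuracies, used only to fix the evaluation order.
accuracy = {
    ("light", "moderate"): 0.91,
    ("moderate", "heavy"): 0.84,
    ("light", "heavy"): 0.97,
}

label = predict_by_accuracy([30.0], classes, classifiers, accuracy)
```

In this sketch the most reliable decision is made first, so a mistake by a weaker classifier can only affect the later, narrower choices. In practice the lambdas would be replaced by trained two-class SVMs and the accuracies by measurements on held-out data.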
The experiment shows that the model achieves satisfactory results in both accuracy and efficiency.
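The idea of adding load awareness to fair share scheduling can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of the thesis and not Hadoop's scheduler API: the `Node` and `Job` classes, the `load` score, the `max_load` cap and the selection rules are hypothetical, and the speculative-execution part of LATE is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    free_slots: int
    load: float  # hypothetical load score in [0, 1], e.g. from CPU and memory use


@dataclass
class Job:
    name: str
    running: int       # tasks currently running
    pending: int       # tasks waiting for a slot
    fair_share: float  # slots the job is entitled to (assumed > 0)


def pick_job(jobs: List[Job]) -> Optional[Job]:
    """Fair share step: the waiting job furthest below its share goes first."""
    waiting = [j for j in jobs if j.pending > 0]
    if not waiting:
        return None
    return min(waiting, key=lambda j: j.running / j.fair_share)


def pick_node(nodes: List[Node], max_load: float = 0.8) -> Optional[Node]:
    """Load balancing step: least-loaded node with a free slot under the cap."""
    usable = [n for n in nodes if n.free_slots > 0 and n.load < max_load]
    if not usable:
        return None
    return min(usable, key=lambda n: n.load)


def assign_one(
    jobs: List[Job], nodes: List[Node], max_load: float = 0.8
) -> Optional[Tuple[str, str]]:
    """Assign one pending task; return (job name, node name) or None."""
    job, node = pick_job(jobs), pick_node(nodes, max_load)
    if job is None or node is None:
        return None
    job.pending -= 1
    job.running += 1
    node.free_slots -= 1
    return job.name, node.name


# Hypothetical cluster state: n1 has free slots but is already heavily loaded.
nodes = [Node("n1", 2, 0.9), Node("n2", 1, 0.3), Node("n3", 1, 0.5)]
jobs = [Job("big", 4, 10, 8.0), Job("small", 1, 2, 4.0)]
first = assign_one(jobs, nodes)
```

A scheduler that looked only at free slots would send work to `n1`. Here the overloaded node is skipped even though it has capacity, which is the behavior the load balancing improvement aims for. The sketch does not refresh a node's load after an assignment; a real scheduler would update it from heartbeat data.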
Keywords/Search Tags:Hadoop Cloud platform, Precipitation forecast, PDAG-SVM, Load balancing, Fair Scheduler