Application Of Classification Algorithm In Prediction Of Sandstorm In Inner Mongolia

Posted on:2019-11-01

Degree:Master

Type:Thesis

Country:China

Candidate:X Z Zhao

Full Text:PDF

GTID:2370330563997751

Subject:Engineering

Abstract/Summary:

With the development of information technology and the arrival of the big data era, the rapid growth of global data has become the foundation of the big data industry. IDC, a market research firm, predicts that the total volume of global data will remain at a high level in the future, and China, as a country with a large information industry, produces data in all walks of life. The meteorological departments in China receive a large amount of data every day. How to extract useful information from this massive data and create value from it has become a key issue. How to use meteorological data to build efficient sandstorm prediction models through data mining has therefore become a focus for scholars in many countries. The research topic of this thesis is the application of classification algorithms to meteorological data mining in Inner Mongolia. The research object is nearly 50 years of meteorological records for the Inner Mongolia region, selected from the China surface climate daily data set and the China severe sandstorm sequence and its supporting data set. First, to solve the problem of storing and batch processing massive data, a Hadoop distributed platform and the Hive data warehouse platform were built. With HDFS as the underlying storage, the data can be preprocessed on the Hadoop platform by writing HQL statements. Then, according to the missing rate of attribute values and the correlation between attributes, the dimensionality of the preprocessed data set is reduced to obtain the experimental data set. After analyzing the frequency of sandstorm occurrence, the experimental data set is rebalanced by combining over-sampling and down-sampling, in preparation for building the classification models. The thesis uses three widely used algorithms to build the classification prediction models: the BP neural network, the support vector machine (SVM), and naive Bayes. The prediction accuracy and scalability of each algorithm are analyzed and compared. Finally, because the naive Bayes algorithm is better suited to large-scale data sets, the traditional naive Bayes algorithm is optimized in two respects, attribute independence and classification decision, by combining attribute importance with the AdaBoost framework. A weighted AdaBoost-NBC classification method is proposed. Experiments show that the improved algorithm achieves higher accuracy than the traditional single classifier.
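The rebalancing step mentioned in the abstract (combining over-sampling and down-sampling because sandstorm days are rare) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual sampling ratios or procedure, so the function name `rebalance`, the `target_per_class` parameter, and the toy data below are all hypothetical and serve only to show the general idea.

```python
import random
from collections import Counter


def rebalance(rows, labels, target_per_class, seed=0):
    """Resample every class to exactly target_per_class examples.

    Classes larger than the target are down-sampled without replacement;
    classes smaller than the target keep all their rows and are topped up
    by over-sampling with replacement. (Hypothetical helper, not taken
    from the thesis.)
    """
    rng = random.Random(seed)

    # Group the rows by class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) >= target_per_class:
            # Majority class: keep a random subset.
            chosen = rng.sample(members, target_per_class)
        else:
            # Minority class: keep everything, then duplicate random rows.
            extra = rng.choices(members, k=target_per_class - len(members))
            chosen = members + extra
        out_rows.extend(chosen)
        out_labels.extend([label] * target_per_class)
    return out_rows, out_labels


# Made-up toy data: 5 sandstorm days (label 1) and 95 other days (label 0).
rows = [(i, i % 7) for i in range(100)]
labels = [1 if i < 5 else 0 for i in range(100)]

bal_rows, bal_labels = rebalance(rows, labels, target_per_class=30)
counts = Counter(bal_labels)
print(counts[0], counts[1])
```

Resampling both classes toward a common size avoids discarding most of the majority class (as pure down-sampling would) while limiting the number of duplicated minority rows (as pure over-sampling would require). In practice the resampling is applied only to the training split, so that evaluation still reflects the real class frequencies.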

Keywords/Search Tags:

data mining, meteorological data, sandstorm prediction, classification algorithm, Hadoop, Adaboost

Related items

1	Research On Meteorological Data Mining System Based On Hadoop/Hive And WebGIS
2	The Research Of Meteorological Data Mining Based On Hadoop
3	The Research Of Meteorological Data Mining Using Bayesian Classifier Based On Hadoop
4	Research On Multi-Classifier Model Based On Adaboost Algorithm And Application On Rainfall Prediction
5	Research On Classification Algorithm Of Meteorological Imbalanced Data
6	The Application Of Data Mining Technology In Meteorological Forecasting
7	Research On Meteorological Data Prediction Algorithm Based On Improved Bayesian Network
8	Research On Data Mining Algorithm Based On The Meteorological Data
9	Research On The Thunderstorm Data Clustering And Thunderstorm Prediction Model Based On The Hadoop Platform
10	Analysis And Application Of Meteorological Data In Zhangye