Application Of Classification Algorithm In Prediction Of Sandstorm In Inner Mongolia

Posted on: 2019-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: X Z Zhao
GTID: 2370330563997751
Subject: Engineering
Abstract/Summary:
With the development of information technology and the arrival of the big data era, the rapid growth of global data has become the foundation of the big data industry. IDC, a market research agency, predicts that the total volume of global data will remain at a high level, and China, with its large information industry, produces data in every sector. China's meteorological departments receive a large amount of data every day, so extracting useful information from it has become a key issue. Building efficient sandstorm prediction models from meteorological data with data mining techniques has therefore attracted the attention of researchers in many countries.

This thesis studies the application of classification algorithms to meteorological data mining for Inner Mongolia. The research object is the Inner Mongolia portion of the past 50 years of records from two sources: the daily data set of China's surface climate data, and the Chinese severe sandstorm sequence and its supporting data set. First, to handle storage and batch processing of the massive data, a Hadoop distributed platform and the Hive data warehouse were built. With HDFS as the underlying storage, the data are preprocessed on the Hadoop platform by writing HQL statements. Then the preprocessed data set is reduced in dimensionality according to the missing rate of attribute values and the correlation between attributes, which yields the experimental data set. Based on an analysis of how often sandstorms occur, the experimental data set is rebalanced with a combination of over-sampling and down-sampling before the classification models are built.

Three widely used algorithms are used to build the classification prediction models: the BP neural network, the support vector machine (SVM), and naive Bayes. The prediction accuracy and scalability of each algorithm are analyzed and compared. Finally, because naive Bayes is the better suited of the three to large data sets, the traditional naive Bayes algorithm is optimized in two respects, attribute independence and classification decision, by combining attribute importance with the Adaboost framework. The result is a weighted Adaboost-NBC classification method. Experiments show that the improved algorithm achieves higher accuracy than the traditional single classifier.
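
For illustration only, the rebalancing step and the weighted Adaboost-NBC idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the thesis's implementation: the abstract does not say how attribute importance enters the classifier, so the sketch assumes it scales each attribute's log-likelihood in a Gaussian naive Bayes model, and it uses standard discrete AdaBoost with labels +1 (sandstorm) and -1 (no sandstorm). The function names, the attribute weights, and the synthetic data are all hypothetical.

```python
import math
import random

def rebalance(X, y, per_class, rng):
    """Draw per_class rows per class: down-sample large classes, over-sample small ones."""
    Xb, yb = [], []
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        if len(rows) >= per_class:
            picked = rng.sample(rows, per_class)  # without replacement
        else:
            picked = rows + rng.choices(rows, k=per_class - len(rows))  # with replacement
        Xb += picked
        yb += [c] * per_class
    return Xb, yb

def fit_weighted_gnb(X, y, sample_w):
    """Gaussian naive Bayes fitted with per-sample weights (as supplied by AdaBoost)."""
    model, total = {}, sum(sample_w)
    for c in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == c]
        wc = sum(sample_w[i] for i in idx)
        means, variances = [], []
        for j in range(len(X[0])):
            mu = sum(sample_w[i] * X[i][j] for i in idx) / wc
            var = sum(sample_w[i] * (X[i][j] - mu) ** 2 for i in idx) / wc
            means.append(mu)
            variances.append(var + 1e-6)  # small floor avoids division by zero
        model[c] = (math.log(wc / total), means, variances)
    return model

def predict_gnb(model, x, attr_w):
    """Pick the class with the highest score; attr_w scales each attribute's log-likelihood."""
    best, best_score = None, -math.inf
    for c, (log_prior, means, variances) in model.items():
        score = log_prior
        for xj, mu, var, w in zip(x, means, variances, attr_w):
            score += w * (-0.5 * math.log(2 * math.pi * var) - (xj - mu) ** 2 / (2 * var))
        if score > best_score:
            best, best_score = c, score
    return best

def fit_adaboost_nb(X, y, attr_w, rounds=10):
    """Discrete AdaBoost over weighted naive Bayes base classifiers; y must be +1/-1."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        model = fit_weighted_gnb(X, y, w)
        preds = [predict_gnb(model, x, attr_w) for x in X]
        err = sum(wi for wi, p, t in zip(w, preds, y) if p != t)
        if err >= 0.5:   # no better than chance: stop boosting
            break
        if err < 1e-10:  # perfect base classifier: keep it and stop
            ensemble.append((1.0, model))
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, model))
        w = [wi * math.exp(-alpha * t * p) for wi, p, t in zip(w, preds, y)]
        s = sum(w)
        w = [wi / s for wi in w]  # renormalize so the weights sum to 1
    return ensemble

def predict_adaboost(ensemble, x, attr_w):
    """Sign of the alpha-weighted vote of the base classifiers."""
    score = sum(alpha * predict_gnb(model, x, attr_w) for alpha, model in ensemble)
    return 1 if score >= 0 else -1

# Synthetic, made-up rows: [wind speed, relative humidity, noise]; +1 = sandstorm day.
rng = random.Random(0)
X = [[rng.gauss(5, 2), rng.gauss(50, 10), rng.gauss(0, 1)] for _ in range(300)]
X += [[rng.gauss(12, 2), rng.gauss(20, 5), rng.gauss(0, 1)] for _ in range(30)]
y = [-1] * 300 + [1] * 30

attr_w = [1.0, 1.0, 0.2]  # hypothetical attribute importances
Xb, yb = rebalance(X, y, per_class=100, rng=rng)
ensemble = fit_adaboost_nb(Xb, yb, attr_w)
accuracy = sum(predict_adaboost(ensemble, x, attr_w) == t for x, t in zip(X, y)) / len(X)
print(f"accuracy on the original (imbalanced) sample: {accuracy:.3f}")
```

In this sketch the attribute weights partly relax the independence assumption by down-weighting less informative attributes, while the boosting vote replaces the single-classifier decision. The thesis may realize both steps differently.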
Keywords/Search Tags: data mining, meteorological data, sandstorm prediction, classification algorithm, Hadoop, Adaboost