Research On The Raining Prediction Based On Three Supervised Algorithms

Posted on: 2021-04-03
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Yang
GTID: 2480306248455754
Subject: Applied Statistics
Abstract/Summary:
Whether it will rain tomorrow is a matter of daily concern. In machine learning, this question can be treated as a binary classification problem and answered with standard algorithms. In this paper, support vector machine (SVM), AdaBoost (Adaptive Boosting), and the C5.0 decision tree algorithm are used to build predictive models from eight years of Australian weather data, and the resulting predictions are analyzed. Missing values are inevitable in a data set of this size. To reduce the loss of information, k-nearest neighbor imputation, which takes the entire data set as input, is used to fill in the missing values before the models are built. The second step is to deal with class imbalance. Because the numbers of rainy and non-rainy days differ, an undersampled, balanced sample is used for modeling, while the descriptive analysis is carried out on the original data set. In the model-fitting stage, SVM, AdaBoost, and C5.0 are each used to predict whether it will rain tomorrow. For SVM, several kernel functions are tried first and the kernel parameters are then tuned; the optimal separating hyperplane is obtained with a Gaussian kernel. For AdaBoost, a preliminary fit is made first, and the underlying decision trees are then optimized by tuning the number of trees and the tree depth and refitting the model, which yields a better-performing classifier. Finally, C5.0, which builds on C4.5, improves model performance by increasing the number of iterations and by introducing a cost matrix that penalizes the more serious misclassifications. To avoid random error and overfitting caused by a particular split of the sample, the accuracy of each model is estimated by ten-fold cross-validation. Performance indicators, including recall, precision, false positive rate, F1-score, and the Kappa value, are compared and analyzed. In addition, a non-parametric test is used to further compare SVM, AdaBoost, and C5.0. The model that predicts most accurately is the support vector machine with a Gaussian kernel.
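The undersampling step and the performance indicators named in the abstract can be illustrated with a short, standard-library Python sketch. This is not the thesis code: the function names `undersample` and `binary_metrics`, the toy labels, and the choice of `1` as the "rain" class are assumptions made for illustration only.

```python
import random


def undersample(labels, seed=0):
    """Random undersampling: keep as many items of each class as the
    smallest class has. Returns the sorted indices that are kept."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    return sorted(keep)


def binary_metrics(y_true, y_pred, positive=1):
    """Recall, precision, false positive rate, F1 and Cohen's kappa,
    computed from the 2x2 confusion matrix."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t == positive and p != positive:
            fn += 1
        else:
            tn += 1
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("no samples")
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision,
            "fpr": fpr, "f1": f1, "kappa": kappa}


# Toy labels (hypothetical): 1 = rain tomorrow, 0 = no rain.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(undersample(y_true, seed=1))
print(binary_metrics(y_true, y_pred))
```

On these toy labels the confusion matrix has 3 true positives, 1 false negative, 1 false positive, and 5 true negatives, so recall, precision, and F1 are all 0.75 and accuracy is 0.8. In a ten-fold cross-validation such as the one described above, these indicators would be computed on each held-out fold and then averaged.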
Keywords/Search Tags:KNN, SVM, Adaboost, C5.0, RainTomorrow