A Machine Learning Model For Runoff Prediction Based On Feature Selection And Joint Time-Frequency Analysis

Posted on: 2023-09-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q J Wang
Full Text: PDF
GTID: 1520307022487174
Subject: Hydraulic engineering
Abstract/Summary:
Accurate runoff prediction is an important basis for watershed management decisions on flood control, drought relief, and water resources scheduling. Runoff formation is associated with many complex factors, such as climate and human activities. Under a changing environment, runoff data are highly nonlinear, stochastic, and time-varying, and a large number of redundant features and noise make runoff forecasting more difficult. In this context, constructing high-accuracy runoff prediction models is both difficult and critical.

Machine learning is an important technique for modeling and analysis in hydrology, and many attempts have been made to improve the stability and accuracy of hydrological forecasting, such as data pre-processing and optimization of model parameters. Among them, feature selection determines the baseline data input to the machine learning model; its goal is to select, from the data feature space, the feature vector (feature subset) that yields the best modeling performance for later modeling. Once the feature vector is fixed, the key to improving the accuracy of runoff prediction is to fully exploit the implicit information in the limited samples. Swarm intelligence algorithms have clear advantages in optimizing machine learning model parameters and can find more representative implicit patterns in a short time. In addition, time-frequency analysis techniques can extract the implicit components (high-frequency disturbance, medium-frequency fluctuation, and long-term trend) from a complex runoff series, and "decomposition-reconstruction" modeling of each component can, to a certain extent, remove the superposition of these components, thus reducing the difficulty of machine learning modeling.

These three improvements have raised the performance of hydrological data modeling, but problems remain, such as poor compatibility between feature vectors and prediction models, inconsistent model parameter calibration methods, and insufficient analysis of the applicability of "decomposition-reconstruction" modeling, all of which hinder further improvement of prediction accuracy. Therefore, this study uses the monthly runoff data of Kizil Reservoir to investigate feature selection, model parameter calibration, and "decomposition-reconstruction" modeling based on time-frequency analysis. The main research content and conclusions are as follows.

(1) Existing studies that use swarm intelligence algorithms to optimize model parameters do so in inconsistent ways and lack an effective research paradigm. In response, a methodology for optimizing the parameters of machine learning prediction models is proposed that integrates feature selection methods, swarm intelligence algorithms, and machine learning models. First, a sliding window approach is used to restructure the time series into a supervised learning problem, and a feature selection method is then applied to determine the feature vector. Next, sample partitioning, swarm intelligence algorithm encoding, and fitness function design are carried out according to the parameter optimization requirements of each type of machine learning model. Finally, based on the search mechanism of the swarm intelligence algorithm, the optimal parameters are determined and the corresponding prediction models are obtained. The empirical analysis reveals the practical role and effectiveness of the feature selection methods, the types of machine learning models, and the swarm intelligence algorithms. The results show that the BP neural network and the support vector machine (SVM), each calibrated with the grey wolf optimizer (GWO) under the proposed calibration framework, outperform the same models calibrated with the traditional approach.

(2) Runoff data are high-dimensional with small samples, and the features are complex and highly redundant, while existing filter and wrapper feature selection methods are disconnected from the predictor. To address this, an embedded feature selection prediction model based on swarm intelligence (EFS-SVMSI) is proposed. Within the calibration framework for machine learning prediction models, the EFS-SVMSI model replaces the original parameter encoding with a feature-hyperparameter hybrid encoding, so that features and parameters are optimized adaptively and simultaneously. Then, to address the poor optimization capability of standard swarm intelligence algorithms, the applicability of three common types of algorithm improvement (population initialization improvement, nonlinear convergence factor improvement, and multi-algorithm fusion improvement) is explored separately. In addition, considering the characteristics of embedded models, a swarm intelligence algorithm improved by a directional tuning strategy is proposed. The simulation results show that, among the 21 algorithms used, the embedded model built with the directional-tuning-strategy-improved grey wolf optimizer (DTGWO) has the best prediction performance. Finally, the embedded prediction models are compared with traditional feature selection methods in terms of both prediction performance and feature selection results. For prediction performance, the 57 single prediction models established in this study were evaluated comprehensively with the projection pursuit comprehensive evaluation method, and the results show that the embedded models generally outperform the traditional models. For feature selection results, the candidate features were rated as excellent, good, medium, or poor according to their selection frequency (embedded model) or relevance (traditional feature selection method). The embedded model agrees with the traditional feature selection methods on the features rated "good", while the remaining ratings differ significantly. The embedded model takes into account the compatibility between feature subsets and predictor parameters, balancing feature relevance, feature redundancy, and machine learning model parameters, and thus achieves better prediction results.

(3) To address the controversies over "decomposition-reconstruction" modeling with time-frequency analysis techniques, this study explores the reasons for the differences in prediction performance among modeling frameworks from the perspective of data distribution characteristics. The results show that the hindcast experiment framework uses an overall decomposition approach, which provides low-complexity, representative training data and so yields models with both strong fitting and strong predictive performance. This approach achieves better results, but it cannot be implemented in practice, since decomposing the whole series uses observations that are not yet available at forecast time. Under the predictive experiment framework, the parallel-step decomposition approach leads to significant covariate shift between the training and test sets and to a training set with low complexity and poor representativeness, so the resulting model fits well but predicts poorly. The simulation results of the two improved "decomposition-reconstruction" models show that the model built with the adaptive prediction framework (AFEF-VMDSVM) is much better than the single model in the wet period (measured flow higher than 2×10⁸ m³) and worse than the single model in the dry period. Therefore, based on a statistical analysis of the forecasting ability of the two types of models at each time of the year, the two models are integrated according to the principle of complementary strengths. The integrated model has a Nash-Sutcliffe efficiency coefficient (NSE) of 0.97 and a qualification rate QR (20%) of 81.03%, which are 17.98% and 21.4% higher, respectively, than those of the single prediction model. This integrated modeling approach overcomes the long-standing bottleneck of single prediction models in flood prediction and can provide an important scientific basis for the refined control of water resources in the Weigan River basin and the management of Kizil Reservoir.
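The two scores reported in conclusion (3) can be illustrated with a minimal sketch. NSE follows its standard definition. The abstract does not define QR (20%), so the sketch assumes it is the percentage of forecasts whose relative error is within 20% of the measured value. The function names and the sample values are hypothetical and are not taken from the dissertation.

```python
from typing import Sequence


def nse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - squared error / spread of observations."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("observed series is constant, so NSE is undefined")
    return 1.0 - squared_error / spread


def qualification_rate(observed: Sequence[float],
                       simulated: Sequence[float],
                       tolerance: float = 0.20) -> float:
    """Percentage of forecasts within `tolerance` relative error.

    Assumed definition: a forecast qualifies when
    |simulated - observed| <= tolerance * |observed|.
    """
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    qualified = sum(
        1 for o, s in zip(observed, simulated)
        if abs(s - o) <= tolerance * abs(o)
    )
    return 100.0 * qualified / len(observed)


# Hypothetical values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.9, 4.0]
print(nse(observed, simulated))
print(qualification_rate(observed, simulated))
```

An NSE of 1 indicates a perfect match, while values at or below 0 mean the model is no better than predicting the observed mean, which is why an NSE of 0.97 is read as a close fit.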
Keywords/Search Tags: support vector machine, feature selection, runoff prediction, swarm intelligence algorithm, machine learning, "decomposition-reconstruction" prediction model
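The sliding window step named in conclusion (1), which restructures a time series into a supervised learning problem, might look like the following minimal sketch. It assumes one-step-ahead prediction from a fixed number of preceding values. The helper name `make_supervised`, the `window` parameter, and the sample runoff values are hypothetical, and the window length used in the dissertation is not stated in the abstract.

```python
from typing import List, Sequence, Tuple


def make_supervised(series: Sequence[float],
                    window: int) -> Tuple[List[List[float]], List[float]]:
    """Turn a series into (lagged inputs, next-step target) pairs."""
    if window < 1 or window >= len(series):
        raise ValueError("window must be at least 1 and shorter than the series")
    inputs: List[List[float]] = []
    targets: List[float] = []
    for end in range(window, len(series)):
        inputs.append(list(series[end - window:end]))  # the previous `window` values
        targets.append(series[end])                    # the value to be predicted
    return inputs, targets


# Hypothetical monthly runoff values, for illustration only.
runoff = [10.0, 12.0, 15.0, 11.0, 9.0, 14.0]
X, y = make_supervised(runoff, window=3)
for row, target in zip(X, y):
    print(row, "->", target)
```

Each row of `X` is a candidate feature vector of lagged values, so a feature selection method can then decide which lags to keep before the model parameters are optimized.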