
Research On Symbolic Regression Generalization Performance Based On Multi-objective Optimization

Posted on: 2022-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: W T Sheng
Full Text: PDF
GTID: 2518306512471844
Subject: Systems Engineering
Abstract/Summary:
Symbolic regression, an important method for data-driven modeling, simultaneously discovers the structure of an explicit mathematical expression and estimates its parameters with high fitting accuracy. In data-driven modeling, however, pursuing high fitting accuracy often leads to overfitting, which typically shows up as increased model complexity and reduced predictive ability on unseen samples, i.e., poor generalization performance. Balancing model accuracy against generalization ability is therefore of great interest. According to Occam's razor, the simpler a model is, the more likely it is to capture the law underlying the data and the stronger its generalization ability. Based on this principle, this thesis proposes an improved gene expression programming algorithm to improve the generalization ability of the model.

This thesis addresses the problem that existing symbolic regression methods focus only on fitting ability and ignore generalization ability. It analyzes various existing measures of model complexity and selects Rademacher complexity, a measure used in machine learning to characterize the complexity of a hypothesis space, introducing it into symbolic regression as the complexity measure of the algorithm. The training error and the model complexity are optimized as two objectives, and the models in the resulting Pareto solution set are fused by an ensemble learning method to obtain the final output model.

To preserve population diversity in the intelligent algorithm used to solve the symbolic regression problem, an adaptive mutation (variation) operator based on information entropy is designed to replace the traditional mutation operator. The operator acts on each position of the individual code strings in the population: it computes the information entropy of each position from the probabilities with which different symbols occur there, uses that entropy to determine the mutation rate of the position, and then mutates the symbols of all individuals position by position. This maintains the diversity of the population and enhances the local search ability of the algorithm. Finally, training functions from the classical literature are selected, and the proposed algorithm is compared experimentally with several other methods to verify its effectiveness.
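The entropy-based operator described above can be illustrated with a minimal Python sketch. The abstract does not give the formula that maps entropy to a mutation rate, so the linear mapping below (low entropy at a position gives a higher rate, to restore diversity) is an assumption, as are the function names, the bounds `p_min` and `p_max`, and the representation of individuals as equal-length strings. The head/tail symbol constraints of gene expression programming are ignored here for brevity.

```python
import math
import random
from collections import Counter


def position_entropies(population):
    """Shannon entropy (in nats) of the symbol distribution at each position."""
    n = len(population)
    entropies = []
    # zip(*population) iterates over positions (columns) of the code strings.
    for column in zip(*population):
        counts = Counter(column)
        h = -sum((c / n) * math.log(c / n) for c in counts.values())
        entropies.append(h)
    return entropies


def mutation_rates(population, alphabet, p_min=0.01, p_max=0.2):
    """Per-position mutation rate (assumed linear mapping, not from the thesis).

    A position where all individuals share one symbol (entropy 0) gets p_max;
    a position with a uniform symbol distribution gets p_min.
    """
    h_max = math.log(len(alphabet))
    rates = []
    for h in position_entropies(population):
        diversity = h / h_max if h_max > 0 else 0.0
        rates.append(p_min + (p_max - p_min) * (1.0 - diversity))
    return rates


def mutate(population, alphabet, rng, p_min=0.01, p_max=0.2):
    """Mutate every individual position by position using the adaptive rates."""
    rates = mutation_rates(population, alphabet, p_min, p_max)
    mutated = []
    for individual in population:
        symbols = [
            rng.choice(alphabet) if rng.random() < rate else symbol
            for symbol, rate in zip(individual, rates)
        ]
        mutated.append("".join(symbols))
    return mutated
```

For example, with the population `["+ab", "+ba", "+ab", "+ba"]` and the alphabet `"+ab"`, the first position has zero entropy and receives the highest rate, while the other two positions, which already hold two different symbols, receive a lower one.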
Keywords/Search Tags: Symbolic regression, Structural risk minimization, Model complexity, Multi-objective optimization, Adaptive mutation operator