
Research On Restoration Method Of Geotechnical Engineering Missing Data Based On Machine Learning

Posted on: 2024-07-30; Degree: Master; Type: Thesis
Country: China; Candidate: Y Zhang; Full Text: PDF
GTID: 2542306929974089; Subject: Applied statistics
Abstract/Summary:
Data loss is widespread in practice. For example, values go missing while data from tunnel health monitoring systems are collected, transmitted, and stored, which greatly affects the evaluation of tunnel health status. The imputation of missing data is therefore an important part of data preprocessing and has been an active research topic in recent years. The KNN algorithm has been the most widely used imputation method in recent years because its principle is simple and its results are good. During imputation, however, the algorithm must search again for the nearest neighbors of each missing value, so the computational cost becomes large when the data dimension is high. Moreover, how to choose K, the number of nearest neighbors, has long been disputed. To address these problems, this thesis proposes a new hybrid imputation method that introduces a genetic algorithm and an SVM model and achieves better results than traditional methods. In addition, for missing values in UCI datasets and in tunnel health monitoring data, this thesis studies the repair performance of various machine learning imputation models. The main work is as follows:

(1) To verify the parameter optimization performance of genetic algorithms, this thesis selects three UCI datasets and four machine learning classification models: k-nearest neighbor, decision tree, random forest, and support vector machine. Genetic algorithms, traditional cross-validation, and grid search are each used to optimize the model parameters on these datasets, and the resulting classification accuracies are compared. Experimental results show that the genetic algorithm performs better than the traditional parameter optimization algorithms.

(2) Because genetic algorithms can optimize parameters and select features at the same time, this thesis proposes a new hybrid imputation method based on genetic algorithms and GKNN (GA-GKNN) for the classification of high-dimensional data. The method has two stages: pre-imputation and secondary imputation. In the pre-imputation stage, a genetic algorithm selects features from the original dataset, iterating with the classification accuracy and feature loss of the SVM classifier as the fitness function to keep the most important features, which greatly reduces the computation time of secondary imputation. In the secondary imputation stage, a genetic algorithm adaptively optimizes the parameter K, which resolves the problem of choosing K. Experiments on different UCI datasets verify the effectiveness of the hybrid method: its imputation performance is the best when compared with traditional KNN imputation and GKNN imputation.

(3) For missing values in UCI datasets and in actual tunnel health monitoring data, this thesis proposes a machine learning imputation method based on the ideas of regression and classification, compares the imputation performance of different machine learning imputation models, and explores the impact of feature importance on the imputation results. The experimental results show that the RF ensemble model achieves the best imputation performance. The machine learning method imputes better when the imputation order is determined by the number of missing values, or by a weighted sum of the number of missing values and the permutation feature importance.
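The idea of letting a genetic algorithm choose K for KNN imputation can be illustrated with a minimal sketch. This is not the thesis's GA-GKNN method: it omits the SVM-based feature selection stage, and the function names (`knn_impute`, `masked_rmse`, `ga_select_k`), the fitness function (RMSE on deliberately hidden known cells), and the GA settings are all assumptions made for illustration.

```python
import math
import random

def knn_impute(rows, k):
    """Fill None cells with the column mean over the k nearest donor rows.
    Distance is computed only over columns observed in both rows."""
    def dist(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: every other row that has an observed value in column j.
            donors = [(dist(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

def masked_rmse(rows, k, mask_cells):
    """Assumed fitness: hide known cells, impute them, return the RMSE."""
    masked = [list(r) for r in rows]
    for i, j in mask_cells:
        masked[i][j] = None
    filled = knn_impute(masked, k)
    errs = [(filled[i][j] - rows[i][j]) ** 2
            for i, j in mask_cells if filled[i][j] is not None]
    return math.sqrt(sum(errs) / len(errs)) if errs else math.inf

def ga_select_k(rows, mask_cells, k_max, pop_size=6, generations=10, seed=0):
    """Toy genetic search over the integer K (illustrative settings only)."""
    rng = random.Random(seed)
    fitness = lambda k: masked_rmse(rows, k, mask_cells)
    population = [rng.randint(1, k_max) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness)
        parents = ranked[:max(1, pop_size // 2)]   # selection: keep fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            child = (a + b) // 2                   # crossover: midpoint of parents
            if rng.random() < 0.3:                 # mutation: nudge K by one
                child += rng.choice([-1, 1])
            children.append(min(max(child, 1), k_max))
        population = parents + children
    return min(population, key=fitness)

rows = [[1.0, 2.0], [1.0, None], [10.0, 20.0]]
print(knn_impute(rows, k=1))  # [[1.0, 2.0], [1.0, 2.0], [10.0, 20.0]]
```

Because the fitter half of each generation is carried over unchanged, the best K found so far is never lost. A real implementation would likely cache fitness values and use a richer chromosome (for example, one that also encodes a feature subset).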
Keywords/Search Tags: Missing Data, Machine Learning, Geotechnical Engineering, Genetic Algorithm, Feature Importance