
Research On Interpolation Method Of Soil Missing Value

Posted on: 2022-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Zhang
GTID: 2493306737476504
Subject: Computer Science and Technology
Abstract/Summary:
Traditional methods for handling missing values in soil data are largely confined to the field of soil science. Although these methods are specialized and accurate, they give little consideration to cross-industry research and use. To address this problem, this paper introduces data mining methods, taking the soil attributes pH and classification as examples for interpolating missing soil values.

(1) For missing data in the soil attribute pH, this paper compares the interpolation performance of Multiple Regression, KNN, Random Forest, SVM, Neural Network and Multiple Imputation. The main work of this part is as follows: (a) Through extensive training and testing on the soil dataset, the parameters of each method are tuned and a missing value interpolation model is established. (b) Evaluation metrics are used to assess the performance of each method on the soil dataset at different missing rates. The results show that KNN and Random Forest with optimal parameters are the least affected by the missing rate of the dataset and give the best interpolation results.

(2) For missing data in the soil classification attribute, the main work of this part is as follows: (a) This paper constructs a mathematical model that describes the missing-data problem for the soil classification attribute, namely the problem of discrete single attribute data missing (DSADM). (b) A general interpolation algorithm for missing values of discrete attribute data, DSADM_HC, based on hierarchical clustering, is proposed to solve the DSADM problem. (c) This paper applies DSADM_HC to interpolate the missing values of the soil classification attribute.

DSADM_HC consists of three parts: hierarchical clustering for missing attributes, cluster classification of discrete attributes, and cluster mapping of missing discrete attributes. In the stage of hierarchical clustering for missing attributes, this paper proposes a dimension number setting index based on dense sampling (DNSDS), which helps dimensionality reduction algorithms determine the dimension of the dimension-reduced dataset. In the stage of cluster classification of discrete attributes, a cluster selection strategy for the best discrete attribute distribution based on K is proposed to select the best cluster division result from the hierarchical tree.

Experiments applying DSADM_HC to the soil classification attribute show that: (a) DSADM_HC is effective for missing values of the soil classification attribute. (b) Using DNSDS to set the dimension number of the dimension-reduced dataset gives DSADM_HC its best interpolation result for the soil classification attribute. (c) With the best-performing cluster distance measurement, DSADM_HC achieves its best interpolation of missing soil classification values, and the highest correct rate is 79.2%.
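
The abstract does not give the internals of DSADM_HC, so the following is not that algorithm. It is only a minimal sketch of the general idea of cluster-based interpolation for a discrete attribute: cluster the records on their other attributes, then fill each missing label with the majority known label in its cluster. Average linkage, the fixed `n_clusters` parameter, the function names, and the toy data are all assumptions made for illustration. The thesis instead selects a cluster division from the hierarchical tree and applies DNSDS-guided dimensionality reduction, neither of which is reproduced here.

```python
from collections import Counter
from math import dist


def agglomerative_clusters(points, n_clusters):
    """Average-linkage agglomerative clustering (illustrative choice).

    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def avg_distance(a, b):
        # Mean pairwise Euclidean distance between two clusters.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        # Merge the closest pair of clusters.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def impute_labels(points, labels, n_clusters):
    """Fill None entries in `labels` with the majority known label
    of the cluster the record falls into (hypothetical helper)."""
    known = [label for label in labels if label is not None]
    if not known:
        raise ValueError("at least one known label is required")
    # Used only when a cluster contains no known label at all.
    fallback = Counter(known).most_common(1)[0][0]

    filled = list(labels)
    for cluster in agglomerative_clusters(points, n_clusters):
        cluster_known = [labels[i] for i in cluster if labels[i] is not None]
        if cluster_known:
            majority = Counter(cluster_known).most_common(1)[0][0]
        else:
            majority = fallback
        for i in cluster:
            if filled[i] is None:
                filled[i] = majority
    return filled


# Toy data (made up): two numeric attributes per record and a soil class
# label, where None marks a missing classification value.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
labels = ["loam", None, "loam", "clay", "clay", None]

filled = impute_labels(points, labels, n_clusters=2)
print(filled)
```

In this toy case the two groups of records are well separated, so each missing label takes the label shared by its neighbours. On real soil data the number of clusters and the distance measure would need to be chosen with care, which is the part the thesis addresses.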
Keywords/Search Tags: Soil Attribute Data, Missing Data, Data Dimensionality Reduction, Distance Between Clusters, Hierarchical Clustering