
Regionalized Feature Selection Methods For Spatial Data

Posted on: 2022-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: S C Liu
GTID: 2518306509465244
Subject: Software engineering

Abstract/Summary:
Feature selection is the process of choosing, from the initial feature set of a dataset, a subset of features that satisfies a given criterion. As a common data preprocessing step, it is an important approach to dimensionality reduction, which plays an important role in machine learning: it can filter noise from the data and improve the performance of subsequent learning. Traditional feature selection methods, however, do not consider the spatial relationships between data points. Spatial data generally has specific distribution characteristics: spatial autocorrelation and spatial heterogeneity create dependence among spatial objects, which tend to form homogeneous clusters to some extent. Research on general data, by contrast, either makes no assumption about the distribution of data points or assumes that they are uniformly and randomly dispersed in space. Traditional methods therefore neither account for this dependence nor exploit the distribution pattern of spatial data to select features better suited to it. As a result, when traditional methods are applied directly to spatial data, the explanatory power of the selected features with respect to the target may not reflect the actual geographic phenomenon.

To make full use of the positional relationships between data points during feature selection, this thesis proposes a new feature selection method (RFSM) for spatial data. The method explores how the positional relationship between spatial data points influences feature selection. Compared with applying traditional feature selection methods directly, RFSM takes the distribution pattern of spatial data into account and focuses on the influence of nearby data points. The experiments aim to exploit the spatial dependence of the data points within a region during feature selection and to ignore the influence of distant data points on that region's data. RFSM proceeds in three steps. First, a spatial adjacency matrix is established for each data point. Second, a traditional method is used to perform feature selection on the data points in each region. Finally, the regional results are synthesized, and machine learning classification algorithms are used to compare the results obtained by the different algorithms.

Experiments show that the RFSM framework is relatively stable. When the selected feature subset has relatively few dimensions, the feature subsets chosen by algorithms under the RFSM framework yield somewhat better classification and prediction performance than those chosen by the original algorithms. In addition, this thesis designs and implements a Python-based regionalized feature selection system for spatial data, which uses the positional relationships in spatial data for feature selection. The system uses Tkinter as its GUI framework and the GDAL library to manage vector data, which gives it good compatibility and portability.
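The three RFSM steps described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis's implementation: the abstract does not say how adjacency is defined, which traditional selector is used, or how the regional results are synthesized. Here adjacency is assumed to be the k nearest neighbours, the per-region selector is a simple absolute Pearson correlation filter, and synthesis is a vote count across regions. All function names and parameters are hypothetical.

```python
import math
from collections import Counter


def knn_adjacency(coords, k):
    """Step 1 (assumed form): for each point, list its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(coords):
        # Sort all other points by Euclidean distance, ties broken by index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(coords) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def abs_pearson(xs, ys):
    """Absolute Pearson correlation; 0.0 when either sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def local_top_features(X, y, rows, n_select):
    """Step 2 (stand-in selector): rank features using only the rows of one region."""
    ys = [y[r] for r in rows]
    scores = []
    for f in range(len(X[0])):
        xs = [X[r][f] for r in rows]
        scores.append((abs_pearson(xs, ys), f))
    # Highest score first; ties resolved by the lower feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [f for _, f in scores[:n_select]]


def regional_feature_selection(coords, X, y, k=5, n_select=2):
    """Step 3 (assumed synthesis): count how often each feature wins locally."""
    votes = Counter()
    for i, nbrs in enumerate(knn_adjacency(coords, k)):
        region = [i] + nbrs  # the point itself plus its neighbours
        votes.update(local_top_features(X, y, region, n_select))
    ranked = sorted(votes, key=lambda f: (-votes[f], f))
    return ranked[:n_select]
```

Calling `regional_feature_selection(coords, X, y, k, n_select)` with point coordinates, a feature matrix given as a list of rows, and class labels returns the indices of the features selected most often across regions. The resulting subset could then be passed to any classifier and compared against a subset chosen without the regional step, which is the kind of comparison the abstract describes.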
Keywords/Search Tags: Spatial data, Feature selection, Spatial dependence, Spatial heterogeneity, Spatial autocorrelation