Research On Approximate Granularity Feature Selection And Classification Methods For High-Dimensional Data

Posted on: 2021-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Li
Full Text: PDF
GTID: 2428330602489061
Subject: Software engineering
Abstract/Summary:
With the rapid development of information technology, large amounts of unstructured data are being generated across application fields. These data pose new challenges for traditional machine learning methods, one of which is the "curse of dimensionality". The redundant information in high-dimensional data reduces the computational efficiency of machine learning methods and usually degrades the accuracy of their conclusions. To remove this redundant information effectively, this thesis uses a locality-sensitive hashing (LSH) algorithm under a statically bound framework to describe the granular structure of the high-dimensional data space. Because the similarity relation produced by LSH depends on a given probability parameter, the resulting granular structure is an approximate partition of the data, in contrast to the exact partitions produced by traditional granular computing methods such as rough sets. This approximate structure, however, avoids the long computing time that traditional granular computing models require on high-dimensional data. Building on this approximate partition and on the traditional rough set notion of dependency, the thesis designs an approximate feature selection algorithm based on rough sets and locality-sensitive hashing.

In addition, many LSH-based algorithms generate a large number of hash functions in advance and reuse them in later computation, which keeps the results sufficiently accurate while saving computing resources. This thesis adopts the same technique and, after approximate feature selection, further considers the relationship between the data and the generated hash functions. A rough set feature selection algorithm is used to choose the hash functions that are better suited to classification. By combining a new bucketing method with the basic idea of the dynamic collision framework, the thesis then proposes a classification algorithm based on rough sets and locality-sensitive hashing. Experiments show that both the approximate feature selection algorithm and the classification algorithm perform well on high-dimensional data.
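The idea of using hash buckets as approximate granules and scoring a feature subset by a rough-set-style dependency can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract does not name a hash family, so random-hyperplane (sign random projection) hashing is assumed here, and every function name (`make_hyperplanes`, `approximate_granules`, `approximate_dependency`, `greedy_select`) is hypothetical. Dependency is taken to be the fraction of samples that fall in buckets containing a single class label, which mirrors the rough set positive region.

```python
import random
from collections import defaultdict


def make_hyperplanes(n_hashes, dim, seed=0):
    """Pre-generate random hyperplanes (assumed hash family: sign random projection)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_hashes)]


def lsh_signature(x, hyperplanes, features):
    """Hash x using only the selected feature indices; one bit per hyperplane."""
    return tuple(
        1 if sum(h[j] * x[j] for j in features) >= 0.0 else 0
        for h in hyperplanes
    )


def approximate_granules(X, hyperplanes, features):
    """Group sample indices by signature; each bucket is one approximate granule."""
    buckets = defaultdict(list)
    for i, x in enumerate(X):
        buckets[lsh_signature(x, hyperplanes, features)].append(i)
    return list(buckets.values())


def approximate_dependency(X, y, hyperplanes, features):
    """Fraction of samples lying in label-pure buckets (approximate positive region)."""
    granules = approximate_granules(X, hyperplanes, features)
    positive = sum(len(g) for g in granules if len({y[i] for i in g}) == 1)
    return positive / len(X)


def greedy_select(X, y, hyperplanes, max_features):
    """Forward selection: add the feature that raises the dependency the most."""
    dim = len(X[0])
    selected, best = [], 0.0
    while len(selected) < max_features:
        candidates = [j for j in range(dim) if j not in selected]
        if not candidates:
            break
        scores = {
            j: approximate_dependency(X, y, hyperplanes, selected + [j])
            for j in candidates
        }
        j_best = max(scores, key=scores.get)
        # Stop once an extra feature no longer improves the dependency.
        if selected and scores[j_best] <= best:
            break
        selected.append(j_best)
        best = scores[j_best]
    return selected, best
```

Because the buckets come from random projections, the granules and the dependency score change with the seed and the number of hash functions, which is the sense in which the partition is approximate rather than exact. Each evaluation is one hashing pass over the data, with no pairwise comparison of samples.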
Keywords/Search Tags:Rough Set, Local Sensitive Hashing, Feature Selection, Classification, High-Dimensional Data