Research On Approximate Granularity Feature Selection And Classification Methods For High-Dimensional Data

Posted on:2021-03-10

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Li

Full Text:PDF

GTID:2428330602489061

Subject:Software engineering

Abstract/Summary:

With the rapid development of information technology, large amounts of unstructured data are being generated across many application fields. These data pose many new challenges to traditional machine learning methods, one of which is the "curse of dimensionality": the large amount of redundant information in high-dimensional data lowers the computational efficiency of machine learning methods and often reduces the accuracy of their results. To remove this redundant information effectively, this thesis uses a locality-sensitive hashing (LSH) algorithm based on a static binding framework to describe the granular structure of the high-dimensional data space. Because the similarity relation produced by LSH varies with a given probability parameter, the granular structure obtained here is an approximate partition of the data, unlike the exact partitions of traditional granular computing methods such as rough sets. In exchange, this approximate granular structure avoids the long computation time that traditional granular computing models need when processing high-dimensional data. Building on this approximate partition and borrowing the notion of dependency from classical rough set theory, the thesis designs an approximate feature selection algorithm based on rough sets and LSH.

In addition, many LSH-based algorithms generate a large number of hash functions in advance, both to ensure sufficient effectiveness and to save computing resources, so that the functions can be reused in later computation. This thesis adopts the same technique and further examines the relationship between the data remaining after approximate feature selection and the pre-generated hash functions: the rough set feature selection algorithm is used to choose the LSH functions that are better suited to classification. By combining a new bucketing method with the basic idea of the dynamic collision framework, the thesis then proposes a classification algorithm based on rough sets and LSH. Experiments show that both algorithms, approximate feature selection and classification, perform well on high-dimensional data.
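The approximate granulation and the dependency-based feature selection described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm: the hash family is assumed to be the Gaussian (2-stable) family used by E2LSH, h(x) = floor((a·x + b)/w), with k functions concatenated into a single bucket key; a granule is the set of rows sharing that key on the chosen feature subset B; and the dependency is taken as |POS_B(D)| / |U|, with greedy forward selection stopping once the full-feature dependency is reached. The names `lsh_granules`, `dependency`, `select_features` and the parameters `k`, `w`, `seed` are hypothetical.

```python
import numpy as np
from collections import defaultdict


def lsh_granules(X, features, A, b, w):
    """Partition the rows of X into approximate granules (assumed scheme).

    Each row is hashed by k Gaussian (2-stable) functions
    h_i(x) = floor((a_i . x_B + b_i) / w), restricted to the columns in
    `features` (the subset B). The k values are concatenated into one
    bucket key, and rows with the same key form one granule.
    """
    proj = X[:, features] @ A[:, features].T       # shape (n_samples, k)
    keys = np.floor((proj + b) / w).astype(np.int64)
    granules = defaultdict(list)
    for i, key in enumerate(map(tuple, keys)):
        granules[key].append(i)
    return list(granules.values())


def dependency(X, y, features, A, b, w):
    """Approximate rough-set dependency gamma_B(D) = |POS_B(D)| / |U|.

    A granule belongs to the positive region when all its rows share one label.
    """
    pos = sum(len(g) for g in lsh_granules(X, features, A, b, w)
              if len(set(y[g])) == 1)
    return pos / len(y)


def select_features(X, y, k=8, w=4.0, seed=0):
    """Greedy forward selection that maximizes the approximate dependency."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    A = rng.standard_normal((k, n_features))   # one random projection per hash function
    b = rng.uniform(0.0, w, size=k)            # random offsets in [0, w)
    target = dependency(X, y, list(range(n_features)), A, b, w)
    selected, current = [], 0.0
    remaining = list(range(n_features))
    while remaining and current < target:
        # Ties go to the larger feature index because tuples compare element-wise.
        best_score, best_f = max(
            (dependency(X, y, selected + [f], A, b, w), f) for f in remaining
        )
        if best_score <= current:   # no candidate improves the dependency
            break
        selected.append(best_f)
        remaining.remove(best_f)
        current = best_score
    return selected, current
```

Because the random projections `A` and offsets `b` change with `seed`, different runs can yield slightly different granules. This is the sense in which the partition is approximate. In exchange, each dependency evaluation costs one matrix product and one pass of hashing, with no pairwise comparisons.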
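The second step in the abstract, screening a pre-generated pool of hash functions with rough-set feature selection and then classifying by counting collisions, can be illustrated with the sketch below. It rests on several assumptions. Each pooled function's bucket id is treated as one discrete condition attribute. The pool again uses Gaussian p-stable functions. The classifier follows the general collision-counting idea associated with C2LSH: a training point becomes a candidate when it shares a bucket with the query on at least `threshold` selected functions, and the candidates then vote, with virtual rehashing omitted. The thesis's own bucketing method is not described in the abstract and is not reproduced here, and `hash_table`, `select_hashes`, `classify`, `max_hashes`, and `threshold` are hypothetical names.

```python
import numpy as np
from collections import Counter, defaultdict


def hash_table(X, A, b, w):
    """Bucket id of every sample under every pooled hash function, shape (n, m)."""
    return np.floor((X @ A.T + b) / w).astype(np.int64)


def dependency(H, y, cols):
    """Rough-set dependency of the labels on the hash columns `cols` of H."""
    classes = defaultdict(list)
    for i, row in enumerate(H[:, cols]):
        classes[tuple(row)].append(i)
    pos = sum(len(g) for g in classes.values() if len(set(y[g])) == 1)
    return pos / len(y)


def select_hashes(H, y, max_hashes):
    """Greedily keep the hash functions that most increase the dependency."""
    selected, current = [], 0.0
    max_hashes = min(max_hashes, H.shape[1])
    while len(selected) < max_hashes and current < 1.0:
        best_score, best_j = max(
            (dependency(H, y, selected + [j]), j)
            for j in range(H.shape[1]) if j not in selected
        )
        if best_score <= current:   # no remaining function helps
            break
        selected.append(best_j)
        current = best_score
    return selected


def classify(query_codes, H_sel, y, threshold):
    """Collision counting over the selected functions, then a majority vote.

    A training point is a candidate when it shares a bucket with the query
    on at least `threshold` selected functions. If nothing qualifies, the
    points with the highest collision count are used instead.
    """
    counts = (H_sel == query_codes).sum(axis=1)
    mask = counts >= threshold
    if not mask.any():
        mask = counts == counts.max()
    return Counter(y[mask].tolist()).most_common(1)[0][0]
```

A typical call builds `H = hash_table(X_train, A, b, w)` once from the pre-generated pool and keeps `selected = select_hashes(H, y_train, max_hashes)`. A query is then encoded with `hash_table(x[None, :], A, b, w)[0, selected]` and passed to `classify` together with `H[:, selected]`. Lowering `threshold` admits more candidates, trading precision for recall.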

Keywords/Search Tags:

Rough Set, Locality-Sensitive Hashing, Feature Selection, Classification, High-Dimensional Data

Related items

1	Research Of Ensemble Learning For High-dimensional And Imbalanced Data Classification
2	Hash-based Approximate Nearest Neighbor Search For High-dimensional Data
3	Research On Cost-sensitive Feature Selection Problem
4	A Study On Unsupervised Feature Selection Algorithms For High Dimensional Data
5	Research On High-dimensional Index In Large-scale Image Retrieval
6	Feature Selection Models And Methods Based On Information Measure For High Dimensional Data
7	High-dimensional Data Processing And Forecasting Based On Feature Learning
8	Cost Sensitive Feature Selection Based On Data Correlation
9	Research On Feature Selection Methods For High-Dimensional Classification
10	Research Of Approximate K-Nearest Neighbors Search Algorithm Based On Locality Sensitive Hashing