A Semi-Supervised Feature Dimension Analysis Method Based On Entropy

Posted on:2022-05-02

Degree:Master

Type:Thesis

Country:China

Candidate:M R Yang

Full Text:PDF

GTID:2518306533996019

Subject:Mathematics

Abstract/Summary:

In classification, how to measure the difference between two samples is a fundamental problem. The Euclidean distance metric treats every feature as playing the same role in determining the similarity between samples. It is independent of the data distribution and ignores how much each feature matters for classification, which is inconsistent with real data and restricts the performance of machine learning algorithms. Finding an appropriate way to measure the importance of sample features to classification has therefore become a crucial issue in machine learning. Based on the basic principles of information theory and data science, this paper proposes a semi-supervised method that measures the importance of each feature dimension to sample classification. Because this importance measure reflects the influence of a feature dimension on the classification task, it can be used to construct a distance that reflects that influence, which yields a new metric learning method. The same importance measure also yields a feature selection method, which improves the performance of the classification algorithm by retaining the feature dimensions that have a large influence on classification and deleting those that have less influence.

Numerical experiments show that these methods are suitable for small-sample problems and perform well on high-dimensional data, including handwriting images. They improve the accuracy and computational efficiency of the classification algorithm, and have good robustness and computability. This paper is the first to use information entropy in a semi-supervised way to measure the importance of feature dimensions to classification, and it provides a new metric learning method and a new feature selection method based on this measure. It thus offers a new technique for the important problem of measuring feature dimensions in data science. Compared with traditional methods, this method considers the distribution information of both the labeled samples and the whole sample, and shows good applicability in small-sample learning. It is an exploration of entropy theory applied to data science, with a sound theoretical basis and practical application value.
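The abstract does not give the formula for the importance measure, so the following Python sketch is only an illustration of the general idea, not the thesis's method. It assumes a plain information-gain score: bin edges for each feature are estimated from all samples (labeled and unlabeled), and the class entropy inside each bin is estimated from the labeled samples alone. The function names, the quantile binning, and the `n_bins` parameter are all hypothetical choices made for this example.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def feature_importance(X_all, X_lab, y_lab, n_bins=4):
    """Hypothetical importance score: information gain per feature.

    ASSUMPTION (not from the thesis): bin edges use every sample,
    class entropy within each bin uses the labeled samples only.
    """
    h_y = entropy(y_lab)
    qs = np.linspace(0, 1, n_bins + 1)[1:-1]  # interior quantiles
    scores = np.zeros(X_all.shape[1])
    for j in range(X_all.shape[1]):
        edges = np.quantile(X_all[:, j], qs)   # from the whole sample
        bins = np.digitize(X_lab[:, j], edges)  # bin the labeled rows
        h_cond = 0.0
        for b in np.unique(bins):
            mask = bins == b
            # Weight each bin's class entropy by its share of labeled rows.
            h_cond += mask.mean() * entropy(y_lab[mask])
        scores[j] = h_y - h_cond
    return scores


def weighted_distance(a, b, w):
    """Euclidean distance with per-feature weights w (e.g. the scores)."""
    return float(np.sqrt(np.sum(w * (a - b) ** 2)))


def select_features(scores, k):
    """Indices of the k highest-scoring features, in ascending order."""
    return np.sort(np.argsort(scores)[::-1][:k])
```

Under these assumptions, the scores can serve both purposes named in the abstract: passed as `w` to `weighted_distance` they give a distance that emphasizes informative dimensions, and passed to `select_features` they give a subset of dimensions to retain.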

Keywords/Search Tags:

information entropy, semi-supervised learning, distance metric, feature selection

Related items

1	Research On Adaptive Selection Of Distance Metric Functions In Semi-Supervised Classification
2	Research On Feature Selection And Semi-Supervised Classification
3	Research On Intrusion Detection Scheme Based On Semi-supervised And Feature Selection
4	Research On Multi-Label Learning Algorithms With Distance Metric Learning
5	The Research Of Facial Feature Extraction Method Based On Semi-supervised Learning
6	Semi-supervised Metric Learning Based Anchor Graph Hashing For Large Scale Image Retrieval
7	The Research On Manifold-based Semi-supervised Feature Selection Algorithms For Gene Selection
8	Coupled Distance Metric Learning Method Research And Its Application In Gait Recognition
9	Semi-supervised Active Learning For Relevance Feedback In Image Retrieval
10	Semi-supervised Feature Selection Based On Kernel Density Estimation