A Semi-Supervised Feature Dimension Analysis Method Based On Entropy

Posted on: 2022-05-02  Degree: Master  Type: Thesis
Country: China  Candidate: M R Yang
GTID: 2518306533996019  Subject: Mathematics
Abstract/Summary:
In classification, how to measure the difference between two samples is a fundamental problem. The Euclidean distance metric implicitly assumes that every feature plays the same role in determining the similarity between samples. It is independent of the data distribution and ignores how important each feature is for classification, which is inconsistent with practice and limits the performance of machine learning algorithms. Finding an appropriate way to measure the importance of sample features for classification has therefore become a crucial issue in machine learning. Based on the basic principles of information theory and data science, this thesis proposes a semi-supervised measure of the importance of each feature dimension for sample classification. Because this importance measure reflects the influence of each feature dimension on the classification task, a distance that reflects this influence can be constructed from it, yielding a new metric learning method. A feature selection method based on the same measure is also proposed; it improves the performance of the classification algorithm by retaining the feature dimensions that strongly influence classification and deleting those that have less influence. Numerical experiments show that these methods are suitable for small-sample problems and perform well on high-dimensional data, including handwriting images; they improve the accuracy and computational efficiency of the classification algorithm and show good robustness and computability.
This thesis establishes, for the first time, a method that uses information entropy in a semi-supervised way to measure the importance of feature dimensions for classification, and provides a new metric learning method and a new feature selection method based on this measure. It thus offers a new technique for the important problem of measuring feature dimensions in data science. Compared with traditional methods, this method considers the distribution information of both the labeled samples and the whole sample set, and shows good applicability in small-sample learning. It is an exploration of entropy theory applied to data science, and has both theoretical significance and application value.
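The abstract does not state the formulas behind the importance measure, so the following is only a minimal sketch of how an entropy-based, semi-supervised feature weighting could feed a weighted distance and a feature selection step. It is not the thesis's method. The information-gain score, the quantile binning, and all function names and parameters (`feature_importance`, `weighted_distance`, `select_features`, `n_bins`) are assumptions made for illustration. Bin edges are estimated from all samples, labeled and unlabeled, while label entropy is evaluated on the labeled samples only.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def feature_importance(X_all, X_lab, y_lab, n_bins=4):
    """Hypothetical score: information gain of each feature dimension.

    Assumption: bin edges are quantiles of ALL samples (labeled and
    unlabeled); class entropy is computed on the labeled samples only.
    """
    h_y = entropy(y_lab)
    scores = np.zeros(X_all.shape[1])
    for j in range(X_all.shape[1]):
        # Interior quantile edges estimated from the whole sample set.
        edges = np.quantile(X_all[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
        bins = np.digitize(X_lab[:, j], edges)
        # Conditional entropy H(Y | bin), weighted by bin frequency.
        h_cond = 0.0
        for b in np.unique(bins):
            mask = bins == b
            h_cond += mask.mean() * entropy(y_lab[mask])
        scores[j] = h_y - h_cond
    return scores

def weighted_distance(a, b, w):
    """Euclidean distance with a non-negative weight per feature dimension."""
    return float(np.sqrt(np.sum(w * (a - b) ** 2)))

def select_features(scores, k):
    """Indices of the k highest-scoring feature dimensions, in ascending order."""
    return np.sort(np.argsort(scores)[::-1][:k])
```

In this sketch, the scores can be passed directly as the weights `w` of the distance, so that dimensions carrying no label information contribute nothing, or passed to `select_features` to keep only the most informative dimensions before running a classifier.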
Keywords/Search Tags: information entropy, semi-supervised learning, distance metric, feature selection