Research on Graph-Based Semi-Supervised Learning Algorithm Based on Binary Similarity Measure

Posted on: 2022-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: J Z Miao
GTID: 2518306317493964
Subject: Computer application technology
Abstract:
One important task in data analysis is predicting the category of samples, which requires enough labeled data carrying category information to train a learner. However, labeling data demands considerable manpower and material resources, which greatly increases the cost of acquiring labeled data. By contrast, unlabeled data are relatively easy to obtain, and large amounts can be collected with simple information tools. Unfortunately, unlabeled data alone do not yield accurate classification. Semi-supervised learning was therefore proposed to reduce data-acquisition cost and improve classification accuracy by combining a small amount of labeled data with a large amount of unlabeled data. Among the many semi-supervised learning methods, graph-based semi-supervised learning is a representative one. Because it formulates the learning task rigorously as a convex optimization problem whose optimal solution can be obtained, it has attracted wide attention from researchers in recent years, and many effective graph-based semi-supervised learning algorithms have been proposed. These methods divide learning into two steps: measuring the similarity between samples, and propagating labels.

Research on graph-based semi-supervised learning focuses mainly on two goals: first, to measure the similarity between samples accurately so as to improve the accuracy of label propagation; second, to reduce the learning algorithm's demand for labeled data. With respect to these goals, most graph-based semi-supervised learning methods have four deficiencies: (1) insufficient use of label information; (2) a fixed form of distance measurement between samples; (3) failure to exploit intermediate results; and (4) no description of sample similarity from the perspective of attribute columns. To address these four deficiencies, this thesis uses labels to improve the distance measurement between sample instances, and constructs probabilistic dependence relationships between attributes to measure the similarity between different dimensions of the data space. On this basis, it studies graph-based semi-supervised learning algorithms built on a binary similarity measure, that is, similarity measured both between sample instances and between attributes. The specific contributions are as follows.

First, to address the first three deficiencies, this thesis proposes the Semi-Supervised Learning Algorithm of Graph Based on Label-Based Metric Learning, which makes full use of the small amount of label information in the data and of the intermediate results of label propagation to update and optimize the distance measure between samples. Based on the local assumption of semi-supervised learning, the algorithm uses the Mahalanobis distance to measure sample similarity, so that relationships between samples are described more accurately. In addition, information entropy is introduced into label propagation so that the algorithm can exploit intermediate propagation results effectively, reducing its demand for initially labeled data. Experiments on six real-world data sets show that the proposed algorithm achieves higher classification accuracy than three traditional graph-based semi-supervised learning algorithms in more than 95% of cases.

Second, to measure similarity between the attributes of the data space, this thesis proposes an algorithm that learns the probabilistic relationships between attributes through Bayesian network structure learning (BIC-based Node Order Learning for Improving Bayesian Network Structure Learning). It first finds, pairwise, the nodes with the strongest dependence to form a connected undirected graph; it then orients the edges of this graph through V-structure identification to obtain a base Bayesian network, whose node topological order provides unreachability constraints for subsequent network structure learning. The algorithm aims to supply the subsequent label propagation process with probabilistic similarity information about each dimension of the data space, thereby improving learning performance. Simulation experiments on nine expert-built Bayesian networks of different scales show that the algorithm identifies the probabilistic dependence relationships between attributes with high accuracy.

Third, a graph-based semi-supervised learning algorithm based on the label-based metric and the probabilistic relationships between attributes is proposed, which measures binary similarity from both the sample-instance and the attribute-column perspectives. The stability of each sample is first characterized using cluster ensembles. The two similarity measures, instance distance between samples and probabilistic relationships between attributes, are then combined by weighted fusion. The algorithm thus obtains complete binary similarity information about the data, which gives graph-based semi-supervised learning more accurate inter-sample similarities and improves classification accuracy. Comparisons with four semi-supervised learning algorithms on nine real-world data sets show that the proposed algorithm improves the classification performance of graph-based semi-supervised learning.
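The first algorithm combines a label-informed Mahalanobis distance, graph-based label propagation, and information entropy for reusing confident intermediate results. The Python sketch below shows one way these pieces can fit together; it is not the thesis's formulation. The diagonal Mahalanobis matrix (inverse pooled within-class variance per feature), the local-and-global-consistency update F <- alpha*S*F + (1 - alpha)*Y, the mean-squared-distance bandwidth, the entropy threshold and all function names are assumptions chosen for illustration.

```python
import math


def metric_weights(X, labels, eps=1e-6):
    """Diagonal Mahalanobis weights (an assumption, not the thesis's metric):
    inverse pooled within-class variance of each feature, estimated from the
    labeled or pseudo-labeled samples only (labels[i] is a class index or None),
    rescaled so that the weights sum to the number of features."""
    d = len(X[0])
    groups = {}
    for x, y in zip(X, labels):
        if y is not None:
            groups.setdefault(y, []).append(x)
    var, n = [0.0] * d, 0
    for pts in groups.values():
        for j in range(d):
            mu = sum(p[j] for p in pts) / len(pts)
            var[j] += sum((p[j] - mu) ** 2 for p in pts)
        n += len(pts)
    raw = [1.0 / (v / n + eps) for v in var]
    return [d * r / sum(raw) for r in raw]


def propagate(X, labels, n_classes, w, alpha=0.9, iters=100):
    """Local-and-global-consistency propagation F <- alpha*S*F + (1-alpha)*Y on a
    Gaussian graph built from weighted squared distances sum_k w_k (x_ik - x_jk)^2."""
    n = len(X)
    d2 = [[sum(wk * (a - b) ** 2 for wk, a, b in zip(w, X[i], X[j]))
           for j in range(n)] for i in range(n)]
    # Heuristic bandwidth: mean off-diagonal squared distance (assumption).
    sigma2 = sum(d2[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1)) or 1.0
    W = [[0.0 if i == j else math.exp(-d2[i][j] / sigma2) for j in range(n)]
         for i in range(n)]
    deg = [sum(row) + 1e-12 for row in W]
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Current labels, including pseudo-labels accepted in earlier rounds.
    Y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(n)) + (1 - alpha) * Y[i][c]
              for c in range(n_classes)] for i in range(n)]
    return F


def entropy(scores):
    """Shannon entropy (nats) of non-negative scores normalized to a distribution."""
    total = sum(scores)
    if total <= 0:
        return math.log(len(scores))  # no evidence: treat as maximally uncertain
    return -sum(s / total * math.log(s / total) for s in scores if s > 0)


def entropy_guided_ssl(X, labels, n_classes, rounds=3, max_entropy=0.3):
    """Alternate metric estimation and propagation; unlabeled samples whose label
    distribution has entropy below max_entropy become pseudo-labels."""
    labels = list(labels)
    for _ in range(rounds):
        F = propagate(X, labels, n_classes, metric_weights(X, labels))
        for i, y in enumerate(labels):
            if y is None and entropy(F[i]) < max_entropy:
                labels[i] = max(range(n_classes), key=lambda c: F[i][c])
    F = propagate(X, labels, n_classes, metric_weights(X, labels))
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(len(X))]


X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
y = [0, None, None, 1, None, None]
print(entropy_guided_ssl(X, y, 2))
```

On two well-separated groups with one labeled point each, the call should assign every point to its group's class. The metric is re-estimated every round because pseudo-labels accepted by the entropy test enlarge the labeled set, which is the sense in which intermediate results are reused here.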
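The skeleton step of the second algorithm, which links nodes by their strongest pairwise dependence into a connected undirected graph, can be illustrated with a BIC-weighted maximum spanning tree over discrete attributes. This is a hedged sketch assuming discrete data: the edge weight is the BIC gain of making one variable the parent of the other, N * I(X;Y) - (ln N / 2) * (r_X - 1) * (r_Y - 1), and the tree is built with Kruskal's algorithm in the spirit of Chow-Liu trees. The V-structure orientation step is not shown, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (nats) between two discrete columns."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * math.log(c * n / (cx[a] * cy[b])) for (a, b), c in cxy.items())


def bic_edge_gain(xs, ys):
    """BIC improvement from making one variable the parent of the other:
    N * I(X;Y) - (ln N / 2) * (r_X - 1) * (r_Y - 1), with r = number of states."""
    n = len(xs)
    rx, ry = len(set(xs)), len(set(ys))
    return n * mutual_information(xs, ys) - 0.5 * math.log(n) * (rx - 1) * (ry - 1)


def bic_skeleton(data):
    """Maximum-weight spanning tree (Kruskal with union-find) over BIC edge gains.
    data maps a column name to its list of discrete values; returns (a, b, gain)."""
    names = list(data)
    parent = {v: v for v in names}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    scored = sorted(((bic_edge_gain(data[a], data[b]), a, b)
                     for a, b in combinations(names, 2)), reverse=True)
    edges = []
    for gain, a, b in scored:
        ra, rb = find(a), find(b)
        if ra != rb:  # adding this edge does not create a cycle
            parent[ra] = rb
            edges.append((a, b, gain))
    return edges


A = [0] * 6 + [1] * 6
B = [1] + A[1:]      # B copies A except for one sample
C = B[:11] + [0]     # C copies B except for one other sample
for a, b, gain in bic_skeleton({"A": A, "B": B, "C": C}):
    print(a, "-", b, round(gain, 2))
```

On this chain-like toy data the tree keeps A-B and B-C and drops the weaker A-C link. Because both mutual information and the penalty are symmetric, the BIC gain does not depend on edge direction, which is why an undirected tree can be built before any orientation.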
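Once the edges of the base network are oriented, its node topological order can be turned into constraints for later structure learning. The sketch below assumes one common interpretation: a node may not be a parent of any node that precedes it in the order, so every edge from a later node to an earlier one is excluded, which keeps later nodes from becoming ancestors of earlier ones. Kahn's algorithm produces the order; `forbidden_edges` and the example graph are hypothetical.

```python
from collections import deque


def topological_order(nodes, directed_edges):
    """Kahn's algorithm; directed_edges is an iterable of (parent, child) pairs."""
    indegree = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for u, v in directed_edges:
        children[u].append(v)
        indegree[v] += 1
    queue = deque(v for v in nodes if indegree[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a directed cycle")
    return order


def forbidden_edges(order):
    """Hypothetical order constraint: a node may not be a parent of any node that
    precedes it, so every (later -> earlier) edge is excluded from the search."""
    position = {v: i for i, v in enumerate(order)}
    return {(u, v) for u in order for v in order if position[u] > position[v]}


nodes = ["A", "B", "C", "D"]
dag = [("A", "C"), ("B", "C"), ("C", "D")]   # V-structure A -> C <- B, then C -> D
order = topological_order(nodes, dag)
print(order, sorted(forbidden_edges(order)))
```

For the V-structure A -> C <- B followed by C -> D, the order is A, B, C, D and six of the twelve possible directed edges are excluded, which halves the edge candidates a subsequent score-based search has to consider.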
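The third algorithm characterizes sample stability with cluster ensembles and fuses instance-level and attribute-level similarities by weighting. The abstract does not give the fusion formula, so the following is only a sketch under stated assumptions: the co-association matrix is the standard evidence-accumulation construction, while the stability score, the damping factor sqrt(s_i * s_j), the convex weight `beta` and the per-feature weights fed to each kernel are hypothetical choices.

```python
import math


def co_association(clusterings):
    """Evidence-accumulation co-association matrix: C[i][j] is the fraction of base
    clusterings (lists of cluster ids) that put samples i and j in the same cluster."""
    m, n = len(clusterings), len(clusterings[0])
    return [[sum(c[i] == c[j] for c in clusterings) / m for j in range(n)]
            for i in range(n)]


def stability(C):
    """Hypothetical stability score in [0, 1]: mean of |2*C[i][j] - 1| over j != i,
    which is 1 when the ensemble always agrees on whether i and j belong together."""
    n = len(C)
    return [sum(abs(2 * C[i][j] - 1) for j in range(n) if j != i) / (n - 1)
            for i in range(n)]


def weighted_gaussian_kernel(X, weights, sigma2=1.0):
    """exp(-sum_k w_k (x_ik - x_jk)^2 / sigma2); the weights select the view
    (e.g. instance metric vs. attribute-dependence weights, both assumptions)."""
    n = len(X)
    return [[math.exp(-sum(w * (a - b) ** 2 for w, a, b in zip(weights, X[i], X[j])) / sigma2)
             for j in range(n)] for i in range(n)]


def fuse(S_inst, S_attr, stab, beta=0.5):
    """Convex combination of the two similarity views, damped by the geometric mean
    of the endpoints' stability scores (beta and the damping are assumptions)."""
    n = len(S_inst)
    return [[math.sqrt(stab[i] * stab[j]) * (beta * S_inst[i][j] + (1 - beta) * S_attr[i][j])
             for j in range(n)] for i in range(n)]


X = [[0.0, 1.0], [0.2, 0.9], [3.0, 0.0], [3.1, 0.2]]
stab = stability(co_association([[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]))
S_inst = weighted_gaussian_kernel(X, [1.0, 1.0])   # hypothetical instance-level metric
S_attr = weighted_gaussian_kernel(X, [1.5, 0.5])   # hypothetical attribute-dependence weights
W = fuse(S_inst, S_attr, stab, beta=0.6)
```

The fused matrix `W` could then take the place of a single-view affinity matrix in a label-propagation step. In this toy ensemble, sample 1 is moved to the other group by the third base clustering, so its stability is 1/3 and all of its edges are down-weighted relative to the consistently grouped samples (stability 7/9).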
Keywords: machine learning, graph-based semi-supervised learning, metric learning, binary similarity, Bayesian networks