Research On Label Clustering Algorithm Based On Similarity Of Multi-factor Labels

Posted on: 2019-06-21 | Degree: Master | Type: Thesis
Country: China | Candidate: X L Zhang
GTID: 2428330545454766 | Subject: Computer application technology
Abstract/Summary:
The 21st century is an era of explosive growth in information technology. Social tagging systems have produced a huge amount of information, and mining the tag resources in these systems can enhance the user experience of social systems. When mining potentially useful information from tags, classifying the tags is a difficult problem, and tag clustering algorithms can meet a social tagging system's requirements for tag classification.

A social tagging system is open: users can tag resources freely, and the tags they assign reflect the characteristics of the resources to some extent. Because of this openness, however, users from all walks of life and with varying levels of knowledge can tag resources, which leads to problems such as inaccurate descriptions and ambiguous tag semantics. These problems have a considerable negative impact on how well resources are classified. Current tag clustering faces two problems: (1) tag similarity is not calculated accurately; (2) the clustering algorithms themselves have limitations; for example, K-means selects its initial cluster centers at random, which tends to make the clustering unstable and less accurate. To address these problems, this thesis improves both the tag similarity calculation and the K-means algorithm, and proposes a tag clustering algorithm based on multi-factor tag similarity. The main contributions are as follows.

(1) A method for calculating multi-factor tag similarity. The method considers a user factor and a resource factor and incorporates the frequency and importance of each tag, so that the resulting similarity measures the relationship between tags more accurately.

(2) A K-means initial cluster center optimization method that combines density and distance. The method repeatedly selects the high-density object farthest from the centers already chosen as the next initial cluster center. This avoids the weaknesses of two simpler strategies: choosing centers by maximum distance alone cannot cope with noise, while choosing only high-density objects easily leads to local optima. The initial centers chosen this way are closer to the actual cluster centers, which makes the clustering more stable and more accurate.

Finally, the multi-factor tag similarity calculation is combined with the K-means algorithm with optimized initial centers to form a complete tag clustering algorithm. Experiments validate both the density-and-distance initial center optimization and the complete tag clustering algorithm. They show that, within traditional K-means clustering, the multi-factor tag similarity yields higher purity, accuracy and recall, and therefore measures the relationships between tags better. The K-means algorithm that combines density and distance achieves higher clustering accuracy and broad applicability. The complete algorithm, which fuses multi-factor tag similarity with this improved K-means, performs best at tag clustering.
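The abstract does not give the similarity formula, so the following is only a minimal Python sketch of one way a user factor and a resource factor could be combined. It assumes that each tag is represented by two feature vectors (one over users, one over resources), that tag frequency is the raw annotation count, that "importance" is an IDF-style weight favoring users or resources that carry few distinct tags, and that the two cosine similarities are blended with a hypothetical weight `alpha`. None of these choices are taken from the thesis, and the function names are illustrative.

```python
import math
from collections import defaultdict

USER, RESOURCE = 0, 2  # positions in a (user, tag, resource) triple


def tag_vectors(annotations, axis):
    """Map each tag to a sparse vector over users (axis=USER) or resources
    (axis=RESOURCE). Weight = frequency x importance, where importance is an
    ASSUMED IDF-style factor: log(1 + n_tags / distinct tags on the entity)."""
    counts = defaultdict(lambda: defaultdict(int))  # tag -> entity -> count
    tags_on = defaultdict(set)                      # entity -> distinct tags
    for ann in annotations:
        tag, entity = ann[1], ann[axis]
        counts[tag][entity] += 1
        tags_on[entity].add(tag)
    n_tags = len(counts)
    return {
        tag: {e: c * math.log(1 + n_tags / len(tags_on[e])) for e, c in row.items()}
        for tag, row in counts.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def multi_factor_similarity(annotations, t1, t2, alpha=0.5):
    """Hypothetical blend: alpha * user-factor cosine + (1 - alpha) * resource-factor cosine."""
    by_user = tag_vectors(annotations, USER)
    by_resource = tag_vectors(annotations, RESOURCE)
    empty = {}
    return (alpha * cosine(by_user.get(t1, empty), by_user.get(t2, empty))
            + (1 - alpha) * cosine(by_resource.get(t1, empty), by_resource.get(t2, empty)))


# Toy data: "python" and "programming" are used by the same users on the same resources.
annotations = [
    ("u1", "python", "r1"), ("u1", "programming", "r1"),
    ("u2", "python", "r2"), ("u2", "programming", "r2"),
    ("u3", "cooking", "r3"), ("u3", "recipe", "r3"),
]
print(multi_factor_similarity(annotations, "python", "programming"))
```

Tags that share users and resources score close to 1, while tags with no shared users or resources (here "python" and "cooking") score 0. Representing each factor as its own vector keeps the two factors separable, so `alpha` can be tuned without recomputing the vectors.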
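The density-and-distance initialization described in contribution (2) can be illustrated with a short Python sketch. The details are assumptions, not the thesis's exact procedure. Density is the number of neighbors within a radius set to a hypothetical fraction (`radius_scale`) of the mean pairwise distance. Only objects at or above average density are candidates. The densest candidate becomes the first center, and each further center is the candidate whose distance to its nearest chosen center is largest. A plain Lloyd-style K-means then runs from those centers.

```python
import math


def density_distance_init(points, k, radius_scale=0.5):
    """Pick k initial centers: high-density objects that are far from each other."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    dist = [[math.dist(p, q) for q in points] for p in points]
    mean_d = sum(map(sum, dist)) / (n * (n - 1))  # mean pairwise distance
    radius = radius_scale * mean_d               # ASSUMED neighborhood radius
    density = [sum(1 for j in range(n) if j != i and dist[i][j] <= radius)
               for i in range(n)]
    mean_density = sum(density) / n
    # Low-density objects (likely noise) are never considered as centers.
    candidates = [i for i in range(n) if density[i] >= mean_density]
    centers = [max(candidates, key=lambda i: density[i])]
    while len(centers) < k:
        remaining = [i for i in candidates if i not in centers]
        if not remaining:
            raise ValueError("not enough high-density objects for k centers")
        # Farthest candidate from its nearest already-chosen center.
        centers.append(max(remaining, key=lambda i: min(dist[i][c] for c in centers)))
    return [points[i] for i in centers]


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        new_centers = [tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centers[j]
                       for j, cl in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
              for p in points]
    return centers, labels


# Two dense groups plus one outlier at (50, 50).
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
init = density_distance_init(pts, 2)
centers, labels = kmeans(pts, init)
print(init, labels)
```

On this toy data, pure farthest-first selection starting from (0, 0) would pick the outlier (50, 50) as the second center. The density filter excludes it, so both initial centers fall inside the two real groups.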
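Purity is one of the measures reported in the experiments. Under its standard definition, purity assigns each cluster the true class that occurs most often in it and counts the fraction of objects that match: purity = (1/N) Σ_k max_j |ω_k ∩ c_j|, where ω_k are the clusters, c_j the true classes and N the number of objects. A minimal Python sketch, with hypothetical function and argument names:

```python
from collections import Counter, defaultdict


def purity(cluster_labels, true_labels):
    """Fraction of objects whose true class is the majority class of their cluster."""
    if not true_labels or len(cluster_labels) != len(true_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    members = defaultdict(Counter)  # cluster id -> Counter of true classes
    for cluster, cls in zip(cluster_labels, true_labels):
        members[cluster][cls] += 1
    majority_total = sum(c.most_common(1)[0][1] for c in members.values())
    return majority_total / len(true_labels)


# Cluster 0 holds {a, a, b}, cluster 1 holds {b, b, b}: 5 of 6 objects match.
score = purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"])
print(score)  # 5 / 6
```

Purity rises trivially as the number of clusters grows (one object per cluster gives purity 1), which is why it is usually reported alongside other measures such as accuracy and recall.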
Keywords/Search Tags: Social labeling, K-means, Label clustering, Similarity, Feature vector