Tag Clustering Method In Social Annotation Environment

Posted on: 2017-02-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Z Li
GTID: 1108330488985169
Subject: Computer application technology
Abstract/Summary:
Social tagging is a mechanism for searching, organizing, managing, and sharing resources on the Internet. It lets web users annotate resources of interest with tags they choose freely, according to their own understanding of and preferences about those resources. Social tagging systems therefore produce a large amount of annotation information relating users, resources, and tags. Because social tagging is free and open, annotations created at different times and against different backgrounds make tags semantically fuzzy, ambiguous, sparse, and redundant. These problems leave tags poorly organized and their descriptions inconsistent, and they restrict the application of social tagging systems. Tag clustering addresses these problems: it reveals the inner coherence and aggregation of tags, uncovers latent common information, concepts, and knowledge, and allows tags to be re-organized and re-applied. In this dissertation, the annotation information is treated as the fundamental data source, and other relevant information in social tagging systems, such as resource contents, is treated as an extended data source. We study several tag clustering methods that can provide a sound foundation for other studies and applications related to tag clustering. Our contributions are as follows.
(1) We propose a tag spectral clustering method based on the common co-occurrence group similarity of tags. Current tag similarity measures lose part of the semantic information of tags. To address this, we propose the common co-occurrence group similarity, which uses the ternary annotation information as a whole, from a global perspective, to measure the semantic similarity of tags.
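The idea of measuring tag similarity through shared co-occurrence can be illustrated with a small Python sketch. The abstract does not define the common co-occurrence group similarity, so the code below uses a plain cosine similarity over tag co-occurrence vectors as a stand-in, not the dissertation's measure. The triples and the function names `cooccurrence_vectors` and `cosine` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical toy annotations: (user, resource, tag) triples.
annotations = [
    ("u1", "r1", "python"),
    ("u1", "r1", "programming"),
    ("u2", "r1", "python"),
    ("u2", "r2", "programming"),
    ("u2", "r2", "code"),
    ("u3", "r3", "recipe"),
    ("u3", "r3", "cooking"),
]


def cooccurrence_vectors(triples):
    """Map each tag to counts of the tags it shares a resource with."""
    tags_by_resource = defaultdict(set)
    for _user, resource, tag in triples:
        tags_by_resource[resource].add(tag)

    vectors = defaultdict(lambda: defaultdict(int))
    for tags in tags_by_resource.values():
        for a, b in combinations(sorted(tags), 2):
            vectors[a][b] += 1
            vectors[b][a] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


vectors = cooccurrence_vectors(annotations)
# "python" and "code" never share a resource, but both co-occur with "programming".
print(cosine(vectors["python"], vectors["code"]))    # 1.0
# "python" and "recipe" have no co-occurring tags in common.
print(cosine(vectors["python"], vectors["recipe"]))  # 0.0
```

A pairwise similarity matrix built this way is the kind of input a spectral clustering step would take as its affinity matrix.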
To alleviate the irregular distribution of tag data caused by the structural complexity of the annotation data space, we present a tag spectral clustering method based on the common co-occurrence group similarity of tags. The method exploits two advantages of spectral clustering: it can handle arbitrarily distributed data, and its clustering process can converge to a globally optimal solution. Experimental results demonstrate that the proposed method achieves better clustering performance than many other clustering methods.
(2) We propose a tag clustering method based on the LDA (Latent Dirichlet Allocation) model. In ternary annotation data, the information about how users annotate and the information about how resources are annotated are semantically correlated, overlapping, and divergent. To reveal the hidden semantic structure of tags as a whole, we therefore study tag clustering from the perspective of latent topics, and propose a comprehensive LDA topic model for tags together with its clustering method. First, the ternary annotation relation among users, resources, and tags is segmented into two binary relations: one between users and tags, the other between resources and tags. Second, a user-based tag topic learning model and a resource-based tag topic learning model are constructed from the two binary relations, respectively. A re-learning model of tag topics is then constructed by combining the probability distributions of tags over the user-based latent topics and over the resource-based latent topics. Finally, the mixture topics of tags are generated by iterative learning, and clusters are decided according to these mixture topics. Through these steps, the proposed method segments, reconstructs, and recognizes clusters in the overall semantics of tags.
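The first step described above, segmenting the ternary relation into two binary relations, and the final step, deciding clusters from the mixture topics of tags, can be sketched in Python. This is only an illustration under stated assumptions: the topic learning in between is omitted, the function names `split_ternary` and `assign_clusters` are hypothetical, and the probabilities passed to `assign_clusters` are made-up placeholders rather than output of the dissertation's models.

```python
from collections import Counter, defaultdict


def split_ternary(triples):
    """Split (user, resource, tag) triples into two binary relations.

    Returns user -> tag counts and resource -> tag counts. Each bag of tags
    could then serve as one "document" for a topic model such as LDA.
    """
    user_tags = defaultdict(Counter)
    resource_tags = defaultdict(Counter)
    for user, resource, tag in triples:
        user_tags[user][tag] += 1
        resource_tags[resource][tag] += 1
    return user_tags, resource_tags


def assign_clusters(tag_topic_probs):
    """Assign each tag to its most probable topic (assumed decision rule)."""
    return {
        tag: max(range(len(probs)), key=probs.__getitem__)
        for tag, probs in tag_topic_probs.items()
    }


# Hypothetical toy annotations.
triples = [
    ("u1", "r1", "python"),
    ("u1", "r2", "python"),
    ("u2", "r1", "code"),
    ("u2", "r3", "recipe"),
]
user_tags, resource_tags = split_ternary(triples)
print(dict(user_tags["u1"]))      # {'python': 2}
print(dict(resource_tags["r1"]))  # {'python': 1, 'code': 1}

# Placeholder mixture-topic distributions over two topics (not learned here).
mixture = {"python": [0.8, 0.2], "code": [0.7, 0.3], "recipe": [0.1, 0.9]}
print(assign_clusters(mixture))   # {'python': 0, 'code': 0, 'recipe': 1}
```

Assigning each tag to its highest-probability topic is one simple decision rule; the abstract does not state which rule the dissertation uses.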
Experimental results show that the comprehensive LDA-based tag clustering method aggregates tags effectively and performs better than other tag clustering methods.
(3) We propose tag clustering methods that integrate content and link analysis. We assume that fusing multiple kinds of relevant information in a social annotation environment improves the topic identification ability and clustering quality of tags. Based on this assumption, we first propose two models, each with its own tag clustering method: a tag LDA model that fuses the social relations of users, and a joint LDA model of content and tags that fuses the reference relations among resources. Building on these two models, we then propose a comprehensive tag LDA model that integrates contents and relations, together with its tag clustering method. First, the user-weighted tag LDA model is constructed to cluster tags by modeling the social relations of users. Second, the resource-weighted joint LDA model of words and tags is built to aggregate tags by modeling the reference relations among resources. Third, the user-weighted latent topics and the resource-weighted latent topics of tags are acquired by iterative learning. A re-learning model of tag topics is then constructed from these two sets of latent topics. Finally, the mixture topics of tags are generated to cluster the tags. Experimental results show that each of the three proposed tag clustering methods achieves better clustering performance than other tag clustering methods in its applicable area.
Keywords/Search Tags: Social Tagging System, Tag Clustering, Spectral Clustering, Topic Model, Random Walk