Research On Multilingual Tags Clustering And Visualization

Posted on: 2016-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: X X Gu
Full Text: PDF
GTID: 2308330461478143
Subject: Library and Information Science
Abstract/Summary:
A collaborative tagging system, also known as a folksonomy, is a collection of annotations linking users, resources, and tags, in which a user can freely choose multiple tags to describe web resources. In general, social tags are tags assigned by users, but some websites offer an automated recommendation function that extracts tags (defined here as machine-generated tags). Research on combining the content and social attributes of tags is still not deep enough. Because tags are personalized and fuzzy, social tagging systems contain many useless, redundant, and semantically ambiguous tags. Moreover, applications of tags in web text clustering are mostly monolingual, and tags are treated only as a supplement. To address these problems, this paper takes the clustering and visualization of multilingual tags as its research goal and studies social tag extraction, clustering, visualization, and use in text clustering. The work covers the following three aspects.

Firstly, in the study of social tags' attributes, this paper divided tags into two categories: tags assigned by users and tags extracted by machines. The clustering results for combinations of the two types of tag attributes (content and social) were analyzed and discussed. Experimental results showed that combining content and social attributes in the case of user classification could improve the clustering results and meet users' need for personalized tag clustering results.

Secondly, in the study of multilingual tag clustering and visualization, this paper used a more comprehensive feature extraction method: it improved the quality of extracted tags by combining content and social attributes with user-assigned tags, optimized the final tag clustering results, mapped the multilingual tags, and visualized them. Experimental results showed that, of the two methods for clustering multilingual tags on a parallel corpus, secondary clustering of single-language tags performed better than clustering of mixed tags. The multilingual tag clustering results on the parallel corpus were better than those on the comparable corpus, which relied on bilingual dictionary mapping.

Finally, in the application of social tags, this paper focused on the shortcomings of traditional text clustering and introduced social tags into text clustering, comparing content-based and tag-based features and different weighting algorithms to analyze differences in multilingual text clustering. Experimental results showed that different feature extraction methods and different weighting methods lead to different text clustering performance. In web text clustering, combining content and social attributes could improve the clustering results, and social attributes should receive more attention in this field. An alternative method uses machine translation and secondary clustering to obtain the multilingual text clustering result.

Through these three aspects, this paper implemented the basic functions of clustering and visualizing multilingual social tags, providing a reference for the study and application of tag clustering on multilingual websites.
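
The idea of combining content and social attributes when clustering tags can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not state the similarity measures, weighting, or clustering algorithm used. The sketch assumes cosine similarity over term counts for the content attribute, Jaccard overlap of tagging users for the social attribute, a hypothetical mixing weight `alpha`, and a simple threshold-based grouping. All names and the toy data are illustrative.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Content similarity: cosine of two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def jaccard(a: set, b: set) -> float:
    """Social similarity: overlap of the user sets that applied each tag."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(tag_a, tag_b, content, users, alpha=0.5):
    """Weighted mix of content and social similarity (alpha is an assumption)."""
    return (alpha * cosine(content[tag_a], content[tag_b])
            + (1 - alpha) * jaccard(users[tag_a], users[tag_b]))


def cluster_tags(tags, content, users, alpha=0.5, threshold=0.4):
    """Group tags whose combined similarity reaches the threshold.

    Uses union-find, so clusters are the connected components of the
    "similar enough" graph (a single-link style grouping).
    """
    parent = {t: t for t in tags}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path compression
            t = parent[t]
        return t

    for i, a in enumerate(tags):
        for b in tags[i + 1:]:
            if combined_similarity(a, b, content, users, alpha) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for t in tags:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())


# Toy data (entirely made up): terms from tagged resources, and tagging users.
content = {
    "python": Counter({"code": 3, "script": 2}),
    "programming": Counter({"code": 2, "software": 2}),
    "recipe": Counter({"cook": 3, "food": 1}),
    "cooking": Counter({"cook": 2, "food": 2}),
}
users = {
    "python": {"u1", "u2"},
    "programming": {"u1", "u2", "u3"},
    "recipe": {"u4"},
    "cooking": {"u4", "u5"},
}

clusters = cluster_tags(list(content), content, users)
print(clusters)
```

Setting `alpha` to 1.0 or 0.0 reduces the sketch to content-only or social-only clustering, which is one way to compare the attribute combinations the abstract describes.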
Keywords/Search Tags: Social Tags, Tags Extraction, Tags Clustering, Visualization of Clustering Results, Web Documents Clustering