
Hashtag Generation And Its Application Based On Multilingual Microblog

Posted on: 2017-04-07  Degree: Master  Type: Thesis
Country: China  Candidate: J Shao  Full Text: PDF
GTID: 2308330488461133  Subject: Information Science
Abstract/Summary:
Hashtags are topic labels that users attach to their microblogs, and they can improve the efficiency of information organization and information retrieval on microblog platforms. Research on the basic characteristics, generation techniques, clustering, and classification of hashtags is therefore important. However, most users seldom add hashtags to their microblogs, and the small number of available hashtags significantly limits their usefulness. Few researchers have studied hashtags or proposed generation techniques to address this problem, which is the main research topic of this thesis.

The hashtag generation technique is based on the idea of k-nearest neighbors (KNN). Data from Sina microblog and Twitter are selected as the experimental corpora. For each target microblog, we find the microblogs in the corpus that are most similar to it and extract their hashtags. Three document representation methods are compared according to the hashtag generation results: latent semantic analysis, latent Dirichlet allocation, and deep learning.

Research on hashtag clustering is currently limited. Unlike long texts, microblogs are short texts, so clustering techniques for them need further study. In this thesis, we compare two hashtag clustering strategies: one based on the document-label matrix, and the other based on the combined documents of each hashtag. The K-Means, affinity propagation, and hierarchical clustering algorithms are applied under both strategies, and their results are evaluated and compared. For the strategy based on combined documents, the effects of different document representation methods (including latent semantic analysis and latent Dirichlet allocation) on clustering performance are compared, and the best document representation method and clustering algorithm for hashtag clustering are identified.

Describing the clustering results of multilingual microblog hashtags means extracting keyphrases from each cluster that convey its most important information. At present, the basic methods for describing document clustering results are automatic indexing, automatic summarization, and similar techniques; the keyword extraction used in this thesis is one form of automatic indexing. In current keyword extraction research, the features used are mainly statistical, and few studies consider grammatical features. This thesis therefore proposes dependency relation features and syntactic features for keywords, based on their characteristics. To verify these two kinds of features, classifiers based on support vector machines and logistic regression are applied to Chinese and English datasets. The results show that the F-value improves.

In the part on hashtag applications, we select the best methods from the studies above and apply them to event detection. First, hashtags are generated for microblogs that have none; then keyword extraction is performed on the hashtag clustering results; finally, the clustering results are presented visually. In addition, taking the posting time of microblogs into account, we analyze the entire H7N9 corpus together with its important clusters and important hashtags.
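The KNN-based hashtag generation step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: it swaps in a plain bag-of-words representation with cosine similarity in place of the latent semantic analysis, latent Dirichlet allocation, and deep learning representations that the thesis compares, and it weights each neighbor's hashtags by similarity, which is an assumption. The function names, the toy corpus, and the parameters `k` and `top_n` are all hypothetical. The tokenizer only handles English-like text; Chinese microblogs would need a word segmenter.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts with hashtags removed (stand-in representation)."""
    without_tags = re.sub(r"#\w+", " ", text.lower())
    return Counter(re.findall(r"[a-z0-9]+", without_tags))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_hashtags(target, corpus, k=3, top_n=2):
    """Suggest hashtags for `target` from its k most similar labeled microblogs.

    `corpus` is a list of (text, hashtags) pairs. Each neighbor votes for its
    hashtags with a weight equal to its similarity (an assumed weighting).
    """
    target_vec = vectorize(target)
    scored = sorted(
        ((cosine(target_vec, vectorize(text)), tags) for text, tags in corpus),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for similarity, tags in scored[:k]:
        if similarity <= 0:
            continue  # ignore neighbors that share no terms with the target
        for tag in tags:
            votes[tag] += similarity
    return [tag for tag, _ in votes.most_common(top_n)]


# Toy labeled corpus (invented for illustration only).
corpus = [
    ("new bird flu cases reported in shanghai hospitals", ["h7n9"]),
    ("bird flu vaccine research makes progress", ["h7n9", "health"]),
    ("football final tonight big match", ["sports"]),
    ("stock market drops after earnings report", ["finance"]),
]
target = "hospitals confirm more bird flu cases"
print(suggest_hashtags(target, corpus, k=2, top_n=1))
```

Replacing `vectorize` with a topic-model or embedding representation would leave the neighbor search and voting logic unchanged, which is the kind of comparison between document representations that the abstract describes.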
Keywords/Search Tags:Hashtag, Social tag, Hashtag generation, Hashtag clustering, Keyword extraction