Research On The Relevance Of Chinese Weibo Comments And Weibo Topics

Posted on: 2017-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Liu
GTID: 2358330488491678
Subject: Computer system architecture

Abstract/Summary:
As a representative Web 2.0 social networking service, the microblog has become a major platform for sharing and exchanging information, which has given rise to microblog marketing. Microblog marketing is a novel form of network-based marketing: merchants publish posts about their goods and rely on followers and followees as transmission channels, so that other users see the product information quickly. It is also a form of word-of-mouth marketing, and user comments carry a great deal of useful information for both merchants and users. For merchants, more comments indicate that more users are paying attention to their goods, and the content of the comments reveals the strengths and weaknesses of those goods. For users, reading other users' comments helps them make sound purchase decisions.

However, a microblog post usually has many comments, and reading them manually is time-consuming and impractical. Because commenting is unrestricted, some comments are unrelated to the topic of the post, and some commenters are unfamiliar with the topic or are new microblog users whose comments carry little authority. Mining the comments that are related to the topic of the post, and whose authors have both a preference for that topic and authority, therefore has real practical significance. The main contents of this thesis are as follows:

1. Modeling microblog posts and comments. Traditional text modeling approaches, such as the Vector Space Model (VSM), represent texts as vectors and use Term Frequency–Inverse Document Frequency (TF-IDF) to weight the words in each vector. Comments and microblog posts are short texts, for which the VSM is a poor fit, so this thesis models them with the Post Word Graph (PWG). In a PWG, vertices represent words and edges represent the relationships among words.

2. Extracting keywords from the short-text set. Because commenting is unrestricted, comments contain words that are unrelated to the topic. These words increase the amount of computation and also reduce precision. Based on the PWG, we propose Post Word Rank (PWR) to calculate the weights of the vertices (words) of the PWG. The PWR value of a word reflects its importance in the PWG, so the words with the highest PWR values are extracted as keywords. Comments that contain more keywords are more likely to be related to the topic.

3. Measuring short-text similarity. We propose the Chinese Short-Text Semantic Similarity Algorithm (CSSSA), which takes into account both the semantics of words and their part of speech (POS).

4. Scoring comments. We define the Comment Related Score (CRS), which takes into account the semantic similarity between a comment and the topic, the User Topic Preference (UTP), and the User Authority Value (UAV).

Finally, we run experiments on data from Tencent and use precision, recall, and F-score as the evaluation criteria. The experimental results show that the method is effective.
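
The graph-based keyword extraction described in items 1 and 2 can be illustrated with a minimal sketch. The abstract does not say how PWG edges are defined or how PWR is computed, so everything below is an assumption for illustration only: words are linked when they co-occur in the same text, and vertices are scored with a generic weighted PageRank-style iteration (similar in spirit to TextRank) standing in for PWR. The function names `build_word_graph`, `rank_words`, and `top_keywords` are hypothetical, and the input is assumed to be already word-segmented.

```python
from collections import defaultdict
from itertools import combinations


def build_word_graph(token_lists):
    """Build an undirected, weighted word co-occurrence graph.

    Assumption (not from the thesis): two words are linked when they occur
    in the same text, and the edge weight counts the texts they share.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in token_lists:
        for a, b in combinations(sorted(set(tokens)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def rank_words(graph, damping=0.85, iterations=50):
    """Score vertices with a weighted PageRank-style iteration.

    This is a generic stand-in for PWR, whose actual formula is not given
    in the abstract.
    """
    words = list(graph)
    if not words:
        return {}
    score = {w: 1.0 / len(words) for w in words}
    # Every vertex was created through an edge, so its total weight is > 0.
    total_weight = {w: sum(graph[w].values()) for w in words}
    for _ in range(iterations):
        new_score = {}
        for w in words:
            incoming = sum(
                score[n] * weight / total_weight[n]
                for n, weight in graph[w].items()
            )
            new_score[w] = (1.0 - damping) / len(words) + damping * incoming
        score = new_score
    return score


def top_keywords(score, k):
    """Return the k highest-scoring words, breaking ties alphabetically."""
    ordered = sorted(score.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ordered[:k]]


# Hypothetical, pre-segmented comments on a post about a phone.
comments = [
    ["phone", "battery", "good"],
    ["phone", "battery", "weak"],
    ["phone", "screen", "good"],
    ["weather", "nice"],
]
scores = rank_words(build_word_graph(comments))
keywords = top_keywords(scores, 3)
```

In this sketch, a comment could then be ranked by how many of the extracted keywords it contains, which mirrors the abstract's remark that comments with more keywords are more likely to be on topic. Real Chinese comments would first need word segmentation, which is outside the scope of this example.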
Keywords/Search Tags: Microblog, topic, comment, short text, correlation