Research On Key Technologies Of MicroBlog Search

Posted on: 2015-01-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y J Duan
GTID: 1268330428984366
Subject: Computer application technology
Abstract/Summary:
Microblog has quickly become a very important source of real-time information. There are two key problems in Microblog search: relevance ranking of tweets given a search query, and search result reorganization. Relevance ranking estimates the similarity between tweets and the query based on content and semantic correspondence. Search result reorganization overcomes the redundancy and informal writing style of tweets and restructures the tweet list in a concise order. In this thesis, I study several important problems: relevance computation, classification and summarization of search results, and comparative summarization of contrastive queries.

To compute the relevance of tweets given a search query, this thesis proposes two models, one based on Learning to Rank and one based on a Recurrent Neural Network based Language Model (RNNLM). Experiments demonstrate that the former outperforms the popular Twitter ranking approaches used by existing web services, while the latter reduces the semantic gap between the query and tweets and increases the coverage of relevant search results. The Learning to Rank based model evaluates the influence of features describing the content relevance of a tweet, Microblog writing characteristics, and user authority on relevance computation. Integrated with traditional language models, RNNLM introduces semantic similarity into relevance computation, but estimates the language model probability at a granularity smaller than a word in the given context.

To classify the search results, this thesis proposes a collective classification algorithm that integrates relationships among tweets, and defines a classification taxonomy for topics in Microblog. The algorithm outperforms the feature-based classification model by 5.38% and 4.74% on accuracy and F-score respectively. It conducts the classification collectively by exploiting context information (i.e., related tweets) and considering local features and relationships among tweets simultaneously, which diminishes the influence of data sparseness. The experimental results demonstrate that the proposed approach significantly improves precision and recall, and that the Iterative Classification Algorithm (ICA) using the relationship of sharing the same #hashtag gives the best results.

To summarize the search results, this thesis proposes a timeline-based unified mutual reinforcement summarization model. The algorithm outperforms the graph-based baseline model by 14% on ROUGE-1 on average. The search results of a given query are summarized by sub-topic along a timeline to fully capture the rapid topic evolution in Microblog. The social influence of users and the content quality of tweets are taken into consideration simultaneously, in a mutually reinforcing manner, when estimating tweet salience. Specifically, we rank and select salient and diversified tweets as the summary of each sub-topic. The experimental results show that the content quality of tweets and the social influence of users effectively improve the measurement of tweet salience.

To compare the search results of comparative topics, this thesis proposes an algorithm based on an optimization framework and integrated with relationships among tweets. It achieves a 14.7% improvement in the coverage of comparative aspects and an 11.6% improvement in the precision of comparative tweet pairs. The algorithm uses the similarity relationship between tweets and three types of Microblog-specific relationships among tweets in a graph-based algorithm to estimate the representativeness and comparativeness of tweet pairs with the PageRank and SimRank algorithms. It formalizes the task as a ranking problem that selects a fixed number of tweet pairs as the summary, maximizing comparativeness across queries while best representing the respective queries, and thereby summarizes the commonalities and differences of the search results for two comparable queries.
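
As an illustration of the collective classification idea described in the abstract (bootstrapping from local features, then iteratively re-labeling each tweet using the labels of tweets that share a hashtag with it), here is a minimal Python sketch. It is not the thesis's implementation: the keyword-counting local classifier, the function names, the `weight` parameter, and the toy data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def local_scores(text, keyword_labels):
    """Toy local classifier (an assumption): count keyword hits per label."""
    scores = Counter()
    for word in text.lower().split():
        if word in keyword_labels:
            scores[keyword_labels[word]] += 1.0
    return scores


def best_label(scores):
    """Highest-scoring label, ties broken alphabetically; None if no evidence."""
    if not scores:
        return None
    return min(scores, key=lambda lab: (-scores[lab], lab))


def iterative_classification(tweets, keyword_labels, weight=0.5, max_iters=10):
    """tweets maps id -> (text, set of hashtags); returns id -> label or None."""
    # Relationship used here: two tweets are neighbors if they share a hashtag.
    by_tag = defaultdict(set)
    for tid, (_, tags) in tweets.items():
        for tag in tags:
            by_tag[tag].add(tid)
    neighbors = {tid: set() for tid in tweets}
    for members in by_tag.values():
        for tid in members:
            neighbors[tid] |= members - {tid}

    # Bootstrap: label every tweet from its local features only.
    local = {tid: local_scores(text, keyword_labels)
             for tid, (text, _) in tweets.items()}
    labels = {tid: best_label(local[tid]) for tid in tweets}

    # Iterate: combine local scores with the neighbors' current label shares.
    for _ in range(max_iters):
        changed = False
        for tid in sorted(tweets):
            votes = Counter(labels[n] for n in neighbors[tid]
                            if labels[n] is not None)
            total = sum(votes.values())
            combined = Counter(local[tid])
            for lab, count in votes.items():
                combined[lab] += weight * count / total
            new_label = best_label(combined)
            if new_label != labels[tid]:
                labels[tid] = new_label
                changed = True
        if not changed:
            break  # labels are stable
    return labels


# Hypothetical toy data: short tweets with too little text inherit a label
# from hashtag neighbors.
tweets = {
    1: ("great goal in the match", {"#worldcup"}),
    2: ("what a night", {"#worldcup"}),
    3: ("new phone launch today", {"#tech"}),
    4: ("cannot wait", {"#tech"}),
    5: ("hello world", set()),
}
keyword_labels = {"goal": "sports", "match": "sports",
                  "phone": "technology", "launch": "technology"}
result = iterative_classification(tweets, keyword_labels)
```

In this toy run, tweets 2 and 4 contain no keywords of their own and receive their labels from hashtag neighbors, which is the data-sparseness effect the abstract mentions. Tweet 5 has neither local evidence nor neighbors and stays unlabeled.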
Keywords/Search Tags: MicroBlog Search, Relevance Computation, Search Result Reorganization, Ranking, Classification, Summarization