Research And Development Of Microblog Information Retrieval System

Posted on: 2015-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: P X Lin
GTID: 2268330428472983
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, online media have become one of the most important ways for people to access the latest news. Among them, microblogging has attracted growing attention since its introduction because of its convenience and real-time nature. Millions of microblogs are posted every second, and how to handle data at this scale has become a key research field for both industry and academia.

In a microblog network, the relationships between users resemble those between people in the real world, and users form virtual communities. Within a given time period, members of the same community are closely linked and share common interests. From the community interest we can find the microblog users related to a query word. From the user interest we can find the users and microblogs that a given user is likely to be interested in.

In this thesis, we propose a new algorithm for detecting communities in a microblog network. By combining microblog community interest with microblog user interest, we retrieve the users related to a query word and recommend users and microblogs of interest. Users who share interests with the query user are found by computing the similarity between the interest models of different users. Microblogs that the query user may be interested in are found by computing the similarity between the user's interest model and the topic model of the microblogs. Microblog retrieval is implemented with the Lucene package, and query expansion is used to improve retrieval performance.

This thesis includes the following work:

Firstly, we propose a new algorithm, the Label-Influence-Algorithm (LIA), to detect communities in microblog networks. LIA uses the relationships between people in the social network: if most of a user's friends belong to a community, that user very likely belongs to the same community. Drawing on sociological theory, LIA also considers the influence of every user in the microblog network. The follower count alone does not reflect a user's actual influence, so the number of the user's friends, comments, and @-mentions must be considered as well. In other words, we need a method to remove zombie fans from the corpus (accounts operated by machines solely to inflate follower counts).

Secondly, we build the interest model of microblog community users. Because a microblog message is limited to 140 characters, traditional topic models do not perform well on such short text. Within a given time period, a microblog user focuses on a fixed field, and users in the same community discuss the same topics. For each microblog user, we obtain the user interest model by building a user-topic model. For each microblog community, we obtain the community interest model by building a community-users-topic model.

Thirdly, we develop the microblog information retrieval system. The system has two main functions. In the query function, we first expand the query word into an expansion set using HowNet, and then retrieve the microblog messages and users related to that expansion set. In the recommendation function, we compute the similarity between the user's interest model and (a) the interest models of other users in the same community and (b) the topic model of microblog messages. This yields the users who share the user's interests and the messages the user may be interested in.

Finally, we summarize the thesis and propose directions for future research.
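The abstract does not give the details of LIA, so the following is only a minimal sketch of the general idea it describes: label propagation in which each neighbor's vote is weighted by an influence score. The function names, the log-damped influence formula, the tie-breaking rule, and the toy graph are all assumptions made for illustration, not the thesis's actual method.

```python
import math
from collections import defaultdict


def influence(followers, friends, comments, mentions):
    # Hypothetical score (not from the thesis): a log-damped sum, so that
    # an inflated follower count alone cannot dominate the result.
    return sum(math.log1p(x) for x in (followers, friends, comments, mentions))


def propagate_labels(graph, weight, max_iters=20):
    """Influence-weighted label propagation over an undirected graph.

    graph  -- dict mapping each user to a list of neighboring users
    weight -- dict mapping each user to an influence score
    """
    labels = {node: node for node in graph}  # every user starts alone
    for _ in range(max_iters):
        changed = False
        for node in sorted(graph):  # fixed order keeps the run deterministic
            votes = defaultdict(float)
            for nbr in graph[node]:
                votes[labels[nbr]] += weight[nbr]
            if not votes:
                continue  # isolated user keeps its own label
            best = max(votes.values())
            candidates = sorted(l for l, v in votes.items() if v == best)
            # Keep the current label on a tie; otherwise take the first winner.
            if labels[node] not in candidates:
                labels[node] = candidates[0]
                changed = True
        if not changed:
            break
    return labels


# Toy network: two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
hub = influence(followers=50, friends=40, comments=30, mentions=20)
ordinary = influence(followers=5, friends=5, comments=1, mentions=0)
weight = {"a": hub, "b": ordinary, "c": ordinary,
          "d": ordinary, "e": hub, "f": ordinary}

labels = propagate_labels(graph, weight)
```

On this toy network the two triangles end up with different labels, because each bridge user follows the label backed by the more influential neighbors on its own side.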
Keywords/Search Tags: microblog, information retrieval, interest model, community detection, recommendation