
Research on Network Hotspot Detection in Web 2.0 and Personalized Information Retrieval

Posted on: 2013-06-10 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: M Lu
GTID: 1228330377951885 | Subject: Signal and Information Processing
Abstract/Summary:
With the rapid progress of Web 2.0 technology, many prominent websites have emerged in recent years and changed the Internet as a whole. Web 2.0 websites emphasize that users can communicate and participate freely. Billions of people have created an enormous amount of information on this new platform, which makes it harder for people to find what actually interests them. As a result, information retrieval and search engine technology has attracted much attention and achieved considerable success. Search engines now play an important role in Web information retrieval systems. However, they still have several shortcomings: 1) content from Web 2.0 websites accounts for only a small percentage of the results; 2) currently popular information and hot topics are not reflected in the returned results; 3) the ranking and filtering of search results are unrelated to user interests. This dissertation addresses the problem of helping people find the hotspots they are genuinely interested in within the ocean of Web 2.0 information.

The dissertation covers hotspot detection in Web 2.0 social networks and personalized recommendation for a better user experience. To achieve these goals, it first proposes a research framework. It then discusses the key techniques of each important component of the system and proposes improved algorithms and models tailored to the characteristics of Web 2.0. The main contributions and innovations are as follows:

1. Considering how information is organized and hierarchically structured on Web 2.0 websites, we build an object-oriented, self-adapting, distributed, real-time vertical crawler that stays synchronized with real-time data while consuming relatively little bandwidth. It substantially improves crawler efficiency and the speed of information collection.
2. After a thorough study of the data structures and content-tagging features of Web 2.0 websites, we develop a unified tag-based information representation model by combining traditional Web object extraction algorithms with the vector space model (VSM) and named entity detection algorithms. In this model, Web objects such as pages, images, videos, and blogs are each described by a set of weighted tags and vectors.
3. Based on the unified tag-based representation model, we improve existing topic detection and tracking (TDT) algorithms so that topics can be detected at lower computational cost. We also design an effective topic popularity estimation algorithm, HotRank, which takes into account the impact of user feedback on information popularity. HotRank is used to compute the popularity of the collected topics for subsequent ranking and recommendation.
4. To address the shortcomings of current user interest models, we build a topic-based online user interest model. It automatically extracts the topics of the web pages a user visits and, as the user's interests change, updates itself at very low cost whenever necessary. The model can be applied to many kinds of personalized services. Experiments show that a personalized recommendation system based on this model achieves good performance.
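Item 2 describes Web objects (pages, images, videos, blogs) as weighted tag vectors in a vector space model. A minimal Python sketch of that idea follows. It assumes tf-idf tag weighting and cosine similarity. The weighting scheme, function names, and sample tags are illustrative assumptions, not the dissertation's actual model, which also incorporates Web object extraction and named entity detection.

```python
import math
from collections import Counter

def build_doc_freq(tag_lists):
    """Count how many objects each tag appears in."""
    df = Counter()
    for tags in tag_lists:
        df.update(set(tags))
    return df

def tag_vector(tags, doc_freq, n_objects):
    """Map one object's tags to tf-idf weights (smoothed idf, an assumed scheme)."""
    tf = Counter(tags)
    return {
        tag: count * (math.log((1 + n_objects) / (1 + doc_freq[tag])) + 1.0)
        for tag, count in tf.items()
    }

def cosine(u, v):
    """Cosine similarity between two sparse {tag: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical objects of different media types, each reduced to its tags.
objects = {
    "blog_post": ["olympics", "london", "swimming"],
    "video_clip": ["olympics", "swimming", "highlights"],
    "photo": ["cat", "funny"],
}
df = build_doc_freq(objects.values())
vectors = {name: tag_vector(tags, df, len(objects)) for name, tags in objects.items()}

print(cosine(vectors["blog_post"], vectors["video_clip"]))  # positive: shared tags
print(cosine(vectors["blog_post"], vectors["photo"]))       # 0.0: no shared tags
```

Because every object type ends up in the same sparse tag space, a blog post and a video clip can be compared directly. This shared space is the practical benefit a unified representation offers.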
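Item 3 mentions improved TDT algorithms but gives no details. As a point of reference, the sketch below shows single-pass incremental clustering, a common TDT baseline and not the dissertation's improved algorithm. Each incoming story vector joins the most similar existing topic if the cosine similarity clears a threshold. Otherwise it opens a new topic. The class name, threshold value, and toy stories are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse {tag: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

class SinglePassTDT:
    """Single-pass clustering: each story joins its closest topic or starts a new one."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # hypothetical similarity cut-off
        self.centroids = []         # per-topic running sum of story vectors
        self.members = []           # per-topic list of story ids

    def add(self, story_id, vec):
        """Assign one story and return the index of its topic."""
        best, best_sim = None, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is None or best_sim < self.threshold:
            # No sufficiently similar topic: open a new one.
            self.centroids.append(dict(vec))
            self.members.append([story_id])
            return len(self.centroids) - 1
        # Fold the story into the best-matching topic's centroid.
        centroid = self.centroids[best]
        for tag, w in vec.items():
            centroid[tag] = centroid.get(tag, 0.0) + w
        self.members[best].append(story_id)
        return best

# Toy stream of (story id, tag vector) pairs.
stream = [
    ("s1", {"earthquake": 1.0, "rescue": 1.0}),
    ("s2", {"earthquake": 1.0, "aftershock": 1.0}),
    ("s3", {"smartphone": 1.0, "launch": 1.0}),
]
tdt = SinglePassTDT(threshold=0.3)
topic_ids = [tdt.add(sid, vec) for sid, vec in stream]
print(topic_ids)  # [0, 0, 1]
```

Keeping each centroid as a running sum rather than a mean is safe here because cosine similarity ignores vector length. This also avoids a renormalization step on every update.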
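Item 4 describes a topic-based user interest model that updates cheaply as interests change. The abstract does not specify the update rule. One way to get that behavior, sketched below under assumed parameters, works as follows. Keep a topic-weight profile and decay it by a fixed factor on every page view. Then add the visited page's topic weights and rank candidate items by their dot product with the profile. The decay factor, pruning threshold, and class name are hypothetical.

```python
class OnlineInterestModel:
    """Topic-weight user profile with exponential forgetting (assumed update rule)."""

    def __init__(self, decay=0.9, min_weight=1e-3):
        self.decay = decay            # hypothetical forgetting factor per page view
        self.min_weight = min_weight  # topics below this weight are dropped
        self.profile = {}             # {topic_id: interest weight}

    def observe(self, page_topics):
        """Update the profile with the {topic_id: weight} of one visited page."""
        for topic in list(self.profile):
            self.profile[topic] *= self.decay
            if self.profile[topic] < self.min_weight:
                del self.profile[topic]
        for topic, w in page_topics.items():
            self.profile[topic] = self.profile.get(topic, 0.0) + w

    def score(self, item_topics):
        """Dot product between an item's topic weights and the profile."""
        return sum(w * self.profile.get(t, 0.0) for t, w in item_topics.items())

    def recommend(self, candidates, k=3):
        """Return the ids of the k highest-scoring candidate items."""
        ranked = sorted(candidates, key=lambda i: self.score(candidates[i]), reverse=True)
        return ranked[:k]

model = OnlineInterestModel(decay=0.5)
model.observe({"sports": 1.0})
model.observe({"tech": 1.0})   # older "sports" interest decays to 0.5
candidates = {
    "a": {"sports": 1.0},
    "b": {"tech": 1.0},
    "c": {"cooking": 1.0},
}
print(model.recommend(candidates, k=2))  # ['b', 'a']
```

Each update touches only the topics already in the profile plus those on the new page. This keeps per-view cost small and lets older interests fade without retraining.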
Keywords/Search Tags: Web IR, TDT, Crawler, Personalized Recommendation