
Personalized Information Retrieval Analysis And Modeling Based On Sogou Log

Posted on: 2011-11-17
Degree: Master
Type: Thesis
Country: China
Candidate: C Song
Full Text: PDF
GTID: 2178330338479967
Subject: Computer Science and Technology
Abstract/Summary:
As the web has developed, web resources have grown rapidly, which makes it inconvenient for users to find what they need. Search engines emerged in response. However, users with different backgrounds, different purposes, and at different times often have different information needs, yet search engines always return the same results to different users. This clearly does not meet users' information needs well. Personalized information retrieval (PIR) is considered an important technology for solving this problem. PIR returns retrieval results according to the user's interests and can therefore satisfy the user's information needs well. This thesis focuses on the following three aspects.

1. Personalized potential. Different users often have different information needs, even for the same query. We treat the difference in users' needs as the personalized potential of a query. We apply Kappa to measure the personalized potential of a query and analyze the distribution of personalized potential across queries. Experimental results show that Kappa is a good measure of a query's personalized potential and that the personalized potential of the majority of queries is large, which indicates the urgency of PIR research.

2. Experimental data processing algorithm. A major obstacle to PIR research is the lack of real and effective data. We obtain the experimental data from the web according to the clickthrough data of the Sogou search engine. Although web resources are rich, they contain a great deal of spam, which must be filtered out. We present an active learning algorithm and a Co-training algorithm to process the web data. Experiments show that the Co-training algorithm based on rules and logistic regression not only achieves the best performance but also reduces the human workload. We applied this method to the data processing and obtained a data set to support PIR.

3. Personalized information retrieval modeling based on online learning. A user's interests change over time. We propose an online learning algorithm to track the user's interests in time. Once the user's interests change, the online algorithm learns the change from the user's clicks, which ensures that the modeled interests are up to date and represent the user's current information needs. We present two PIR models based on online logistic regression and one PIR model based on online SVM. Experiments show that all three methods improve retrieval performance.
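The abstract does not give the features or the update rule used by the online models, so the following is only a minimal sketch of the general idea behind online logistic regression driven by clicks, not the thesis's actual method. The two-dimensional topic features, the learning rate, and the names `OnlineLogisticRegression`, `rerank`, `SPORTS`, and `TECH` are all hypothetical and chosen purely for illustration.

```python
import math


class OnlineLogisticRegression:
    """Logistic regression trained one click event at a time (plain SGD)."""

    def __init__(self, n_features, learning_rate=0.5):
        # Hypothetical hyperparameters; the thesis abstract does not specify them.
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, features):
        """Estimated probability that the user clicks a result with these features."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, clicked):
        """One gradient step on the log loss for a single observed result."""
        error = self.predict_proba(features) - (1.0 if clicked else 0.0)
        for i, x in enumerate(features):
            self.weights[i] -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error


def rerank(model, candidates):
    """Order (doc_id, features) pairs by predicted click probability, highest first."""
    return sorted(candidates, key=lambda c: model.predict_proba(c[1]), reverse=True)


# Assumed toy features: [is_sports_page, is_tech_page].
SPORTS = [1.0, 0.0]
TECH = [0.0, 1.0]
candidates = [("tech_doc", TECH), ("sports_doc", SPORTS)]

model = OnlineLogisticRegression(n_features=2)

# Phase 1: the user clicks sports results and skips tech results.
for _ in range(50):
    model.update(SPORTS, clicked=True)
    model.update(TECH, clicked=False)
early_order = [doc for doc, _ in rerank(model, candidates)]

# Phase 2: the user's interest shifts to tech; the same model keeps learning.
for _ in range(200):
    model.update(SPORTS, clicked=False)
    model.update(TECH, clicked=True)
late_order = [doc for doc, _ in rerank(model, candidates)]

print(early_order, late_order)
```

In this toy run the ranking puts the sports page first after the first phase and the tech page first after the second, which illustrates how per-click updates let a model follow a change in interest without retraining from scratch.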
Keywords/Search Tags: Personalized Information Retrieval, Online learning, Logistic regression, SVM, Co-training