
Personal Search Topic Analysis Based On Web Search Query Logs

Posted on: 2019-08-31
Degree: Master
Type: Thesis
Country: China
Candidate: S Zhang
Full Text: PDF
GTID: 2428330545995928
Subject: Computer software and theory
Abstract/Summary:
With the development of the internet, search engine usage keeps rising, and web search query logs are growing explosively. These logs contain a great deal of valuable information, such as search topics, which play an important role in optimizing search engines and analyzing user behavior. Most existing work on search topics focuses on the characteristics common to all search engine users; how to design a model that captures the unique characteristics of an individual user has rarely been studied. In addition, existing methods for segmenting query sessions are not accurate enough to meet the requirements of search query analysis models such as the search topic model.

To address these problems, we extend previous studies in two directions. To improve session partitioning, we extract three feature attributes (the time interval of the session, the semantic similarity of the queries, and the words added or removed between consecutive queries) and apply the Naive Bayes method to partition web search query logs into sessions with high precision. To analyze personal search topics, we combine the characteristics of web search query logs with the phenomenon of burstiness and propose two personal search topic models, the Topic Independence Model (TIM) and the Topic Dependence Model (TDM). We also use the Beta distribution to describe the trend of each topic.

The innovations in this paper are as follows. Firstly, we propose a search session partition method based on Naive Bayes that divides web search query logs into sessions with high accuracy. The method transforms session segmentation into the problem of deciding whether each query is a session boundary, and then classifies each query with the Naive Bayes algorithm. Each query is characterized by three attributes: the time interval of the session, the semantic similarity of the queries, and the words added or removed between consecutive queries. To make these feature attributes more reliable, we propose the Query2 Vector model, which uses word embeddings from deep learning to calculate the semantic similarity of queries: each query is represented as a vector, and the cosine similarity between vectors is computed. Experiments show that the proposed session partitioning method outperforms the commonly used methods.

Secondly, we create a new model that captures the differences among personal search topics, drawing on the phenomenon of word burstiness studied in natural language processing and on existing search topic models. The model is based on the burstiness of query words and URLs in web search logs. The query logs are divided into documents by user id, so that burstiness can be captured within each document and thus reflects the differences in search topics between users. We construct two search topic models: the Topic Independence Model (TIM), which assumes that query terms and URLs are generated independently with respect to topic, and the Topic Dependence Model (TDM), which assumes that queries and URLs are topically coupled. The trend of each topic is described by the Beta distribution. We give the generative process, the derivation, and the parameter estimation method for the models. Finally, the experimental results show that the proposed search topic models effectively find the differences between personal search topics and generalize clearly better than the other search topic models compared.
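The session-partition idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The averaging of word vectors into a query vector, the two thresholds, the binarization of the three features, the shared-word test standing in for the added/removed-word attribute, and every name in the code (`query_vector`, `boundary_features`, `BernoulliNaiveBayes`) are assumptions made for illustration only; the abstract specifies just the three feature attributes, cosine similarity over query vectors, and a Naive Bayes classifier.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def query_vector(query, embeddings, dim):
    """Assumed composition rule: average the vectors of the words that have one."""
    vecs = [embeddings[w] for w in query.split() if w in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def boundary_features(prev, curr, embeddings, dim,
                      gap_threshold=1800.0, sim_threshold=0.5):
    """Binary features for the pair (previous query, current query).

    prev and curr are (timestamp_in_seconds, query_text) tuples.
    Both thresholds are illustrative assumptions, not values from the thesis.
    """
    (t0, q0), (t1, q1) = prev, curr
    long_gap = (t1 - t0) > gap_threshold
    sim = cosine_similarity(query_vector(q0, embeddings, dim),
                            query_vector(q1, embeddings, dim))
    low_similarity = sim < sim_threshold
    # Simplified stand-in for the added/removed-word attribute:
    # a reformulation usually keeps at least one word of the previous query.
    no_shared_words = not (set(q0.split()) & set(q1.split()))
    return (long_gap, low_similarity, no_shared_words)


class BernoulliNaiveBayes:
    """Naive Bayes over binary features with add-one (Laplace) smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior = {}
        self.p_true = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(y))
            # Smoothed estimate of P(feature_j is True | class c).
            self.p_true[c] = [
                (sum(1 for r in rows if r[j]) + 1) / (len(rows) + 2)
                for j in range(len(X[0]))
            ]
        return self

    def predict(self, x):
        def log_score(c):
            score = self.log_prior[c]
            for value, p in zip(x, self.p_true[c]):
                score += math.log(p if value else 1.0 - p)
            return score
        return max(self.classes, key=log_score)
```

In use, each pair of consecutive queries from one user would be turned into a feature tuple with `boundary_features`, a classifier would be fitted on pairs labeled as boundary or non-boundary, and `predict` would then mark where a new session starts. Binary features keep the sketch short; continuous features with a Gaussian likelihood would be an equally plausible reading of the abstract.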
Keywords/Search Tags: search topic model, session partition, burstiness, web search query logs, word embedding