Query-doc Relation Mining Based On Markov Random Walk Model

Posted on:2015-03-27

Degree:Master

Type:Thesis

Country:China

Candidate:L Zhu

Full Text:PDF

GTID:2268330428998001

Subject:Computer software and theory

Abstract/Summary:

The development of the World Wide Web has brought explosive growth of information, and search engines have become inseparable from people's daily lives. After ten years of development, Google, Baidu, and other general-purpose search engines have matured in function and performance, and their search accuracy continues to rise. Despite this commercial success, the relevance of search results still needs improvement: most users must adjust their search terms several times to find the information they really need, and results are not personalized. How to expand the recall of relevant search results, improve the relevance between results and queries, and provide users with a more reasonable dynamic ranking has become one of the current problems to be solved.
The relationship between queries and docs is a valuable type of information that search engines hope to obtain. Accurate correlation analysis between queries and docs not only helps rank search results, but also builds a bridge between queries and docs that allows information to transfer between related queries and docs, which supports a deeper understanding of both and enables a series of applications. This paper presents a query-doc relation mining algorithm based on user search behavior. First, we collect and analyze users' search log data to build a bipartite graph between queries and docs. We then model the bipartite graph with a Markov random walk model and mine the click-through data and session data it encodes. Finally, we obtain docs that users did not click in the click-through data and predict the implied relationships between queries and docs. The same algorithm can also uncover potential relationships between queries.
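The pipeline described above (click log, bipartite graph, random walk, scores for unclicked docs) can be illustrated with a minimal sketch. It is not the thesis's implementation: the `(query, doc, clicks)` log format, the function names, the toy data, and the choice of click-count-normalized transition probabilities are all assumptions made for illustration, and session data is not modeled.

```python
from collections import defaultdict


def normalize(counts):
    """Turn nested click counts into row-stochastic transition probabilities."""
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: c / total for dst, c in row.items()}
    return probs


def build_transitions(click_log):
    """Build query->doc and doc->query transitions from (query, doc, clicks) triples.

    The triple format is a hypothetical log layout, not one taken from the thesis.
    """
    q2d = defaultdict(lambda: defaultdict(float))
    d2q = defaultdict(lambda: defaultdict(float))
    for query, doc, clicks in click_log:
        q2d[query][doc] += clicks
        d2q[doc][query] += clicks
    return normalize(q2d), normalize(d2q)


def step(dist, transitions):
    """Advance a probability distribution one hop across the bipartite graph."""
    nxt = defaultdict(float)
    for node, p in dist.items():
        for neighbor, t in transitions.get(node, {}).items():
            nxt[neighbor] += p * t
    return dict(nxt)


def walk_to_docs(query, q2d, d2q, round_trips=1):
    """Doc distribution after query -> doc plus `round_trips` doc -> query -> doc hops."""
    dist = step({query: 1.0}, q2d)
    for _ in range(round_trips):
        dist = step(step(dist, d2q), q2d)
    return dist


# Toy click log (invented data): "jaguar car" was never clicked through to d3.
click_log = [
    ("jaguar car", "d1", 3),
    ("jaguar car", "d2", 1),
    ("jaguar price", "d2", 2),
    ("jaguar price", "d3", 2),
]
q2d, d2q = build_transitions(click_log)
scores = walk_to_docs("jaguar car", q2d, d2q, round_trips=1)
```

In this toy log, `d3` receives a nonzero score for "jaguar car" even though no user clicked it for that query, because the walk reaches it through the shared doc `d2` and the neighboring query "jaguar price". That is the kind of implied query-doc relationship the abstract refers to.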
The motivation of this paper is to make use of user click information, and it presents a relation mining algorithm based on user click logs. By mining the click data and session data in the click log, the method obtains docs that users did not click and predicts the implied relationships between queries and docs. It can also uncover potential relationships between queries.
The innovation of this paper is that user click information is fully integrated into a calculation model based on a bipartite graph, which is used to calculate the correlation between queries. The proposed method completes these calculations by considering the interaction history between the user and the search engine system. An important application of the algorithm is query recommendation. Previous studies have accumulated many query recommendation methods, but we have not found a similar recommendation method in prior research. The recommendation process calculates the correlation between different queries associated with the same documents. Put another way, when users click the same docs for different queries, the algorithm treats those queries as relevant to one another and uses them to complete query recommendation.
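The query recommendation idea above, relating queries that lead to clicks on the same docs, can be sketched as a two-hop walk (query to doc, then doc to query). This is an illustrative sketch only: the function name `related_queries`, the `(query, doc, clicks)` log format, the scoring by two-hop transition probability, and the sample data are assumptions, not details taken from the thesis.

```python
from collections import defaultdict


def related_queries(click_log, query, top_k=3):
    """Rank other queries by the probability of reaching them in two hops.

    Hop 1 goes from `query` to a clicked doc, hop 2 from that doc to a query
    that also clicked it. Both hops use click-count-normalized probabilities.
    """
    q_clicks = defaultdict(dict)  # query -> {doc: clicks}
    d_clicks = defaultdict(dict)  # doc -> {query: clicks}
    for q, d, c in click_log:
        q_clicks[q][d] = q_clicks[q].get(d, 0) + c
        d_clicks[d][q] = d_clicks[d].get(q, 0) + c

    if query not in q_clicks:
        return []

    q_total = sum(q_clicks[query].values())
    scores = defaultdict(float)
    for doc, c in q_clicks[query].items():
        d_total = sum(d_clicks[doc].values())
        for other, c2 in d_clicks[doc].items():
            if other != query:
                scores[other] += (c / q_total) * (c2 / d_total)

    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Invented sample log: "hotel deals" shares no clicked doc with "cheap flights".
sample_log = [
    ("cheap flights", "d1", 4),
    ("cheap flights", "d2", 1),
    ("low cost airfare", "d1", 2),
    ("airline baggage rules", "d2", 3),
    ("hotel deals", "d3", 5),
]
recommendations = related_queries(sample_log, "cheap flights")
```

Queries that share no clicked doc with the input query never appear in the result, so a longer walk (or the session data mentioned in the abstract) would be needed to relate them.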
In terms of retrieval ranking, the algorithm proposed in this paper can calculate the implied relationship between a query and a doc, and this relationship can be used directly as a reference factor in a Learning to Rank model to achieve a more user-friendly dynamic ranking.
In summary, the query-doc relation mining algorithm proposed in this paper has both theoretical support and important practical value. The relationship between query and doc can serve as a reference factor in a Learning to Rank model, and the query-doc mining results can also be used directly as reference factors in search engine ranking: on top of the traditional query-doc ranking, a doc that ranks low but performs outstandingly in query-doc mining can have its weight adjusted to improve its rank. Query-doc relation mining is therefore significant for dynamically ranking relevant results and expanding the recall of a search engine. We also construct a complete log data mining system, which shows outstanding performance in many respects across a large number of comparative experiments. In the experimental part of this paper, we compare the performance of the results produced by the proposed algorithm, including query-query results for query recommendation, query-query results for query clustering, and query-doc results for dynamic ranking of retrieval results. The results show that the approach increases relevance up to 71.23%, which indicates that the theory and algorithms proposed in this paper can effectively solve the problem of mining implicit relationships between queries and docs. Our approach provides a good basis for increasing the recall of search results, optimizing query recommendation, and clustering retrieved results.
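The weight adjustment described above, where a doc with a low traditional rank but a strong mined query-doc relationship is promoted, can be illustrated with a simple linear blend. The abstract does not say how the thesis combines the two signals, so the blending formula, the `alpha` parameter, the function name `rerank`, and the scores below are all hypothetical.

```python
def rerank(base_scores, mined_scores, alpha=0.3):
    """Re-rank docs by blending a base ranking score with a mined relation score.

    `alpha` is a hypothetical mixing weight: 0 keeps the base ranking unchanged,
    1 ranks purely by the mined query-doc score.
    """
    blended = {
        doc: (1 - alpha) * score + alpha * mined_scores.get(doc, 0.0)
        for doc, score in base_scores.items()
    }
    return sorted(blended, key=blended.get, reverse=True)


# Invented scores for one query: d3 ranks last on the base score
# but has the strongest mined query-doc relationship.
base = {"d1": 0.9, "d2": 0.6, "d3": 0.5}
mined = {"d1": 0.1, "d2": 0.2, "d3": 0.9}

unchanged = rerank(base, mined, alpha=0.0)
promoted = rerank(base, mined, alpha=0.5)
```

In a Learning to Rank setting, the mined score would more likely be supplied as one input feature and its weight learned from data rather than fixed by hand as it is here.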

Keywords/Search Tags:

association relation, search behavior, Markov Random Walk Model, query recommendation, clustering of retrieved results

Related items

1	Research And Improve On Clustering Method Of The Search Engine's Retrieved Results
2	Research And Implementation Of Clustering Systems Of Web Search Results
3	Research Of Query Expansion And Search Results Clustering For Web Information Retrieval
4	Random Walk Learning On Graph
5	A Random Walk Based Win/Loss Graph Aggregation Algorithm For News Metasearch Engine
6	The Study On Web Search Results' Clustering
7	Research On Complex Network Clustering Algorithm Based On Random Walk
8	Extracting Opinion Relation From Web Text Based On Word Alignment Model
9	Research On Semantics-Based Search Results Clustering Methods
10	Chinese Search Results Clustering Research Based On Improved STC