
Query Correction System For Search Engine Based On Statistical Model

Posted on: 2018-12-17; Degree: Master; Type: Thesis
Country: China; Candidate: J Shen
GTID: 2348330536960855; Subject: Software engineering
Abstract/Summary:
The search engine is one of the most commonly used tools in people's daily lives. Inaccurate input and queries containing errors are common when people use a search engine, and analysis of search logs reveals many different types of query errors. Erroneous queries account for more than 10% of all queries. Query correction processes an erroneous query to recover the query the user actually intended, so that correct and relevant results can be returned. A good query correction method can therefore improve the user experience as well as the search engine's fault tolerance and ease of use. To handle the common types of query errors and improve correction accuracy, this paper first studies a correction method based on the Hidden Markov Model (HMM) and the N-gram model, and describes in detail how the N-gram model is constructed and trained. A large amount of character-frequency and word-frequency information obtained from log data is incorporated into the N-gram statistical language model. Unlike approaches that rely on a single language model, this paper recasts the problem of filtering the candidate set as the problem of finding the hidden state sequence of a Hidden Markov Model. The Viterbi algorithm is used to obtain the optimal candidate, and the final correction result is chosen by comparing it with the original query and with other query results. Second, the search engine's query log is used as the data set for query correction, model training, and the experiments. Because the method is trained on logs, it is also easier to port to search engines in other domains. Finally, the query error types are classified by analyzing user search data. The paper also gives a full analysis of the characteristics of the traditional dictionary-set matching method and the edit distance method, together with the statistical model. It then integrates the proposed statistical model with these modules in an appropriate way to form a complete query correction method, and implements a query correction system based on this method. Experiments show that the query correction system implemented in this paper corrects errors well. It handles the common query errors found in search engines with high precision and recall.
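The abstract names the components but does not give the model details. The following is only a minimal sketch, under assumptions of our own, of how those components can fit together. The sketch makes these assumptions:

- A word bigram language model with add-one smoothing is counted from a toy query log. The log is hypothetical data.
- Candidate words are vocabulary entries within edit distance 1 of each observed token, plus the token itself.
- An emission score subtracts a fixed, hypothetical `edit_penalty` per edit.
- Viterbi decoding finds the best hidden word sequence.
- A final check keeps the original query unless the correction beats it by a hypothetical `margin`.

The thesis works on real search logs with character and word frequencies. Word-level English is used here only for readability, and none of the names below come from the thesis.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertion, deletion, substitution)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


class BigramModel:
    """Hypothetical word bigram LM with add-one smoothing, counted from log lines."""

    def __init__(self, log_lines):
        self.contexts = Counter()  # how often each token precedes another token
        self.bigrams = Counter()
        for line in log_lines:
            tokens = ["<s>"] + line.lower().split()
            self.contexts.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab = {w for _, w in self.bigrams}  # every word seen in the log

    def log_prob(self, prev: str, word: str) -> float:
        # The +1 on the vocabulary size reserves probability mass for unseen words.
        v = len(self.vocab) + 1
        return math.log((self.bigrams[(prev, word)] + 1) / (self.contexts[prev] + v))

    def score(self, words) -> float:
        """Log probability of a whole word sequence, starting from <s>."""
        return sum(self.log_prob(p, w) for p, w in zip(["<s>"] + words, words))


def correct(query: str, lm: BigramModel, max_dist: int = 1,
            edit_penalty: float = 1.0, margin: float = 0.5) -> str:
    """Viterbi search over edit-distance candidates (all parameters are assumptions)."""
    observed = query.lower().split()
    # best maps each hidden state (candidate word) to (log score, best path ending there).
    best = {"<s>": (0.0, [])}
    for obs in observed:
        # Hidden states at this position: the observed token plus close vocabulary words.
        cands = {w for w in lm.vocab if levenshtein(w, obs) <= max_dist} | {obs}
        nxt = {}
        for c in sorted(cands):
            emit = -edit_penalty * levenshtein(c, obs)  # stands in for log P(observed | intended)
            score, path = max(((s + lm.log_prob(p, c), pth) for p, (s, pth) in best.items()),
                              key=lambda t: t[0])
            nxt[c] = (score + emit, path + [c])
        best = nxt
    score, path = max(best.values(), key=lambda t: t[0])
    # Compare with the unchanged query and keep it unless the correction is clearly better.
    if score - lm.score(observed) < margin:
        return " ".join(observed)
    return " ".join(path)


# Hypothetical toy query log; a real system would count frequencies from search logs.
LOG = [
    "cheap flights to london",
    "cheap flights to london",
    "cheap flights to paris",
    "flights to london",
    "cheap hotels in london",
    "weather in london",
]
lm = BigramModel(LOG)
print(correct("cheap fligts to londn", lm))  # cheap flights to london
```

The original query is always one of the Viterbi candidates, so the decoded path never scores below it. The `margin` check is one simple way to make "compare with the original query" conservative: a query that is already correct, such as `cheap flights to london`, is returned unchanged.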
Keywords/Search Tags: Search Engine, Query Correction, Web Log, Statistical Model