
Research On Query Correction Method Based On Multiple Characteristics Mining

Posted on: 2017-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: X L Guan
Full Text: PDF
GTID: 2308330482490750
Subject: Computer Science and Technology
Abstract/Summary:
Query correction is an important search engine function for improving retrieval efficiency and the user experience. The correction function analyzes the query string that a user submits to the search engine; if the query string contains an error, the engine offers an alternative form similar to the query string and returns a large number of results that satisfy the user, thereby improving the usability and fault tolerance of the search engine as well as the user's search experience.

Currently, there are two common approaches to query correction for Chinese search engines: the dictionary-based approach and the statistical approach based on text information. The dictionary-based approach does not consider the context of the query string, and the correction strategy of the statistical approach is too simplistic. Moreover, in the era of big data, neither error detection nor error correction takes into account the analysis of massive search engine logs, so the great value contained in those logs is left unmined.

To solve these problems, this thesis builds a query correction model that uses search engine query logs as its corpus and combines statistical and feature information of the query string. The search engine logs are then mined and analyzed to optimize the parameters of the query correction model.

The first part is a correction model based on a combination of statistics and features. Candidate entries are established for each word of the query string, which yields candidate query strings. Structural and statistical features of the query string, including the N-gram model, click frequency, word shape similarity, and Levenshtein distance, are combined to build a ranking model for the confusion set. The model selects the best entry from the confusion set and compares it with the original string to perform the correction.

The second part, the Bad Case Mining model, supplements and optimizes the correction model. Search engine logs are analyzed to mine Bad Cases of the correction process: a mathematical model is built to mine these Bad Cases automatically, and the mined Bad Cases are used to optimize the parameters of the correction model, improving precision and recall.

This thesis makes two contributions. First, it proposes a correction model based on multiple characteristics. The model jointly considers structural and statistical features of the query string, such as the N-gram model, click frequency, word shape similarity, and Levenshtein distance, improving precision and recall. Second, it proposes a Bad Case mining model, which analyzes search engine logs to mine Bad Cases of the correction process and uses them to optimize the parameters of the correction model, further improving precision and recall.

Experiments indicate that the model is effective for query retrieval. Precision and recall reach 92.2% and 95% on a 110k test set. Compared with the N-gram model, these are increases of 13.6% and 8.3%, respectively, which improves the user search experience.
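
The confusion-set ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `Candidate` fields, the linear scoring form, and the values in `WEIGHTS` are all hypothetical, chosen only to show how an N-gram score, click frequency, shape similarity, and Levenshtein distance might be combined to pick the best entry from a confusion set. Only the Levenshtein distance is computed here; the other feature values are assumed to be supplied by models not shown.

```python
from dataclasses import dataclass
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


@dataclass
class Candidate:
    """One entry of a confusion set (all feature fields are hypothetical)."""
    text: str
    ngram_logprob: float  # assumed language-model score, higher is better
    click_freq: float     # assumed click frequency normalized to [0, 1]
    shape_sim: float      # assumed shape similarity to the query in [0, 1]


# Hypothetical weights; the thesis tunes its parameters with mined Bad Cases.
WEIGHTS = {"ngram": 1.0, "click": 2.0, "shape": 1.0, "edit": 0.5}


def score(query: str, cand: Candidate) -> float:
    """Weighted sum of features; edit distance counts as a penalty."""
    return (WEIGHTS["ngram"] * cand.ngram_logprob
            + WEIGHTS["click"] * cand.click_freq
            + WEIGHTS["shape"] * cand.shape_sim
            - WEIGHTS["edit"] * levenshtein(query, cand.text))


def best_candidate(query: str, confusion_set: Sequence[Candidate]) -> str:
    """Return the highest-scoring entry of the confusion set."""
    return max(confusion_set, key=lambda c: score(query, c)).text


if __name__ == "__main__":
    query = "serch engine"
    confusion_set = [
        Candidate("serch engine", ngram_logprob=-9.0, click_freq=0.01, shape_sim=1.0),
        Candidate("search engine", ngram_logprob=-3.0, click_freq=0.9, shape_sim=0.8),
    ]
    best = best_candidate(query, confusion_set)
    # A correction is suggested only when the best entry differs from the query.
    print(best if best != query else "no correction needed")
```

Comparing the best entry with the original string, as in the last lines, mirrors the final step of the correction model: a query that already ranks first in its own confusion set is left unchanged.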
Keywords/Search Tags: Query correction, Confusion sets, N-gram model, Bad Case Mining