Research On Query Correction Method Based On Multiple Characteristics Mining

Posted on:2017-05-21

Degree:Master

Type:Thesis

Country:China

Candidate:X L Guan

Full Text:PDF

GTID:2308330482490750

Subject:Computer Science and Technology

Abstract/Summary:

Query error correction is an important search engine function for improving retrieval efficiency and user experience. The correction function analyzes the query string a user submits. If the query contains an error, the engine suggests a similar, corrected form and returns results that satisfy the user, which improves the usability and fault tolerance of the search engine as well as the user's search experience.

Chinese search engines currently rely on two common query correction methods: dictionary-based approaches and statistical methods based on text information. Dictionary-based approaches ignore the context of the query string, while the correction strategies of statistical methods are too limited. Moreover, in the era of big data, existing error detection and correction methods do not exploit the massive logs that search engines collect, so the value that log mining could provide goes unused.

To address these problems, this thesis builds a query correction model that uses search engine query logs as its corpus and combines statistical and feature information of the query string. The logs are also mined and analyzed to optimize the model's parameters.

The first part is a correction model based on a combination of statistics and features. Candidate entries are generated for each word of the query string, yielding candidate query strings. A confusion set ranking model is then built by combining structural and statistical features of the query string, including N-gram model scores, click frequency, character shape similarity, and Levenshtein distance. The ranking model selects the best entry from the confusion set and compares it with the original string to produce the correction.

The second part, a Bad Case mining model, supplements and optimizes the correction model by analyzing search engine logs to mine Bad Cases from the correction process.
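The confusion set ranking step of the first part can be illustrated with a small sketch. The abstract names the features but not the scoring formula, so this assumes a linear weighted sum of four of them: a character bigram log-probability (add-one smoothing) trained on logged queries, log click frequency, character shape similarity from a hypothetical lookup table, and Levenshtein distance as a penalty. Only substitution candidates are generated, and the confusion sets, shape scores, click counts, and weights are toy values, not the thesis's data.

```python
import itertools
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance with unit insert/delete/substitute costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


class CharBigramLM:
    """Character bigram model with add-one smoothing, trained on logged queries."""

    def __init__(self, corpus):
        self.history = Counter()  # how often each token appears as a bigram history
        self.bigrams = Counter()
        tokens = set()
        for q in corpus:
            chars = ["<s>"] + list(q)
            tokens.update(chars)
            self.history.update(chars[:-1])
            self.bigrams.update(zip(chars, chars[1:]))
        self.vocab = len(tokens)

    def logprob(self, q: str) -> float:
        chars = ["<s>"] + list(q)
        return sum(math.log((self.bigrams[(a, b)] + 1) / (self.history[a] + self.vocab))
                   for a, b in zip(chars, chars[1:]))


def candidates(query, confusion):
    """Cartesian product of per-character confusion sets; the original character is always an option."""
    options = [[c] + [x for x in confusion.get(c, []) if x != c] for c in query]
    return ["".join(p) for p in itertools.product(*options)]


def features(cand, query, lm, clicks, shape_sim):
    subs = [(o, c) for o, c in zip(query, cand) if o != c]  # substitution-only candidates
    shape = sum(shape_sim.get(s, 0.0) for s in subs) / len(subs) if subs else 1.0
    return {
        "lm": lm.logprob(cand),                    # fluency under the log-trained model
        "click": math.log1p(clicks.get(cand, 0)),  # popularity in the click log
        "shape": shape,                            # mean glyph similarity of substitutions
        "edit": levenshtein(query, cand),          # distance from what the user typed
    }


def rank(query, confusion, lm, clicks, shape_sim, weights):
    """Score every candidate with a weighted sum; return (score, candidate) pairs best-first."""
    scored = []
    for cand in candidates(query, confusion):
        f = features(cand, query, lm, clicks, shape_sim)
        score = (weights["lm"] * f["lm"] + weights["click"] * f["click"]
                 + weights["shape"] * f["shape"] - weights["edit"] * f["edit"])
        scored.append((score, cand))
    return sorted(scored, reverse=True)


# Toy data (all hypothetical).
corpus = ["天安门广场", "北京天安门", "天安门", "北京大学"]
confusion = {"们": ["门"], "京": ["景"]}
shape_sim = {("们", "门"): 0.8, ("京", "景"): 0.6}
clicks = {"北京天安门": 120}
weights = {"lm": 1.0, "click": 0.5, "shape": 1.0, "edit": 0.5}

lm = CharBigramLM(corpus)
ranking = rank("北京天安们", confusion, lm, clicks, shape_sim, weights)
print(ranking[0][1])  # 北京天安门
```

With these toy values, 北京天安门 scores best on both the language model and the click feature. Because an unchanged query gets the maximum shape score and zero edit distance, the original string is kept unless the other features outweigh those advantages; the abstract states that such model parameters are tuned through Bad Case mining.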
Bad Case mining is formalized as a mathematical model so that Bad Cases can be mined automatically; the mined Bad Cases are then used to tune the parameters of the correction model, improving precision and recall.

This thesis makes two contributions. First, it proposes a correction model based on multiple features, which jointly considers structural and statistical features of the query string, such as N-gram model scores, click frequency, character shape similarity, and Levenshtein distance, to improve precision and recall. Second, it proposes a Bad Case mining model, which analyzes search engine logs to mine Bad Cases in the correction process and uses them to optimize the correction model's parameters, further improving precision and recall.

Experiments show that the model performs well on query retrieval. On a test set of size 110k, precision and recall reach 92.2% and 95%, respectively, which are 13.6% and 8.3% higher than those of the N-gram model, improving the user's search experience.
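The abstract does not define Bad Cases, the precision and recall formulas, or which parameters are tuned. The sketch below assumes the usual definitions (precision = correct corrections / corrections made; recall = correct corrections / queries that needed correction), treats any query whose final output differs from its reference as a Bad Case, and tunes a single hypothetical parameter, a confidence threshold for applying a suggestion, by grid search on F1. The thesis mines Bad Cases from search logs; here the labeled cases are simply given as toy data.

```python
from dataclasses import dataclass


@dataclass
class Case:
    query: str         # what the user typed
    gold: str          # reference correction (equal to query when no fix is needed)
    suggestion: str    # top candidate from the ranking model
    confidence: float  # hypothetical score margin of the suggestion over the query


def corrected(case: Case, threshold: float) -> str:
    """Rewrite the query only when the model is confident enough."""
    return case.suggestion if case.confidence >= threshold else case.query


def evaluate(cases, threshold):
    """Return (precision, recall, bad_cases) at a given threshold."""
    made = needed = correct = 0
    bad_cases = []
    for c in cases:
        out = corrected(c, threshold)
        changed = out != c.query
        made += int(changed)
        needed += int(c.gold != c.query)
        if changed and out == c.gold:
            correct += 1
        elif out != c.gold:
            # Either a wrong rewrite (output is not the reference)
            # or a missed error (an erroneous query left unchanged).
            bad_cases.append((c.query, out, c.gold))
    precision = correct / made if made else 0.0
    recall = correct / needed if needed else 0.0
    return precision, recall, bad_cases


def f1(cases, threshold):
    p, r, _ = evaluate(cases, threshold)
    return 2 * p * r / (p + r) if p + r else 0.0


def tune_threshold(cases, grid):
    """Grid search: keep the threshold with the best F1 (the first one wins ties)."""
    return max(grid, key=lambda t: f1(cases, t))


cases = [
    Case("北京天安们", "北京天安门", "北京天安门", 2.5),  # real error, confident fix
    Case("天安门广场", "天安门广场", "天安们广场", 0.4),  # correct query, weak bad suggestion
    Case("清华大字", "清华大学", "清华大学", 1.2),        # real error, moderate confidence
    Case("北京景点", "北京景点", "北京京点", 0.9),        # correct query, wrong suggestion
]
grid = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
best = tune_threshold(cases, grid)
print(best, evaluate(cases, best))  # 1.0 (1.0, 1.0, [])
```

A threshold of 0.0 accepts every suggestion and yields two wrong rewrites, while 1.5 misses the 清华大字 error; the Bad Case list at each setting shows which direction the parameter should move.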

Keywords/Search Tags:

Query correction, Confusion sets, N-gram model, Bad Case Mining

Related items

1	Research On Chinese Text Real-Word Error Automatic Detection And Correction Algorithm
2	Search Engine Error Correction Algorithm And Error Correction Bad Case Mining
3	Research On Word Error Correction Methods Of Chinese Text
4	Research On The Key Technology Of Search Engine Query Error Correction
5	Research On Text Proofreading Method Based On The Analysis Of The Mongolian Syllable
6	Incorporating Confusion Set Knowledge In Chinese Grammar Error Correction
7	Research On Chinese Real-word Error Automatic Detection And Correction
8	Research And Implementation Of Grammar Error Correction Model Based On Deep Learning
9	Grammatical Error Correction Based On N-Gram Model And Parsing
10	Research On Error Handling Of The Speech Recognition Post-Processing