Applications Of Spelling Correction Techniques In Information Retrieval And Text Processing

Posted on:2008-03-08

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhang

Full Text:PDF

GTID:2178360245991810

Subject:Computer software and theory

Abstract/Summary:

Spelling correction is an active topic in recent natural language processing research. With the pervasive use of information retrieval and text processing applications, spelling errors are unavoidable in human-typed documents, and dealing with misspellings wastes time and money.

After conducting a thorough survey of state-of-the-art spelling correction techniques, we compared how they apply to web search and to text processing, and implemented a system for each of these two fields. From an analysis of a large volume of query log data, we found that a misspelling shares the most similar context with its intended correction, whereas its context is less similar to those of the other candidates. Based on this finding, we first employed the noisy channel model, improving its error model component with distributional similarity. Next, we used distributional similarity as a feature in a discriminative maximum entropy model, alongside edit distance, phonetic similarity, and language model features. Both models are evaluated in the experimental results.

To correct misspellings in text processing applications, we proposed a novel method based on a discriminative reranking framework. For the first time, we formulated spelling correction as a ranking problem rather than the traditional classification problem. The method uses Ranking SVM to rerank the output of an existing spelling corrector, Aspell. By employing cutting-edge spelling correction techniques as features, it greatly improved Aspell's performance. It also outperformed several off-the-shelf spelling correctors, such as the one used in Microsoft Word 2003. To reduce the high cost of human annotation when acquiring training pairs, we also presented a new method that automatically extracts training pairs from web query log chains. The performance of the model trained on query chain pairs is comparable to that of the model trained on human-annotated pairs.

In the last section, we gave some suggestions on testing spelling correction systems and raised some problems that need further research.
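
As a rough illustration of the first approach, the sketch below ranks correction candidates with a noisy channel score, log P(c) + log P(q|c), where the error model P(q|c) mixes an edit-distance term with the cosine similarity of context vectors. The abstract does not give the thesis's actual formula, so the interpolation weight `lam`, the `exp(-distance)` term, the function names, and the toy counts are all assumptions made for this example.

```python
import math
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def cosine(u, v):
    """Cosine similarity between two sparse context-count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def score(query, candidate, prior, contexts, lam=0.5):
    """Noisy channel score: log P(candidate) + log P(query | candidate).

    The error model is an assumed interpolation of an edit-distance
    term and distributional similarity, not the thesis's own model.
    """
    edit_term = math.exp(-edit_distance(query, candidate))
    sim_term = cosine(contexts.get(query, Counter()),
                      contexts.get(candidate, Counter()))
    error_model = lam * edit_term + (1 - lam) * sim_term
    return math.log(prior[candidate]) + math.log(error_model + 1e-9)


def correct(query, prior, contexts):
    """Return the candidate with the highest noisy channel score."""
    return max(prior, key=lambda c: score(query, c, prior, contexts))


# Toy data (hypothetical): words co-occurring with each term in a query log.
contexts = {
    "britny": Counter({"spears": 5, "songs": 2}),
    "britney": Counter({"spears": 50, "songs": 20, "album": 10}),
    "briton": Counter({"history": 8, "celtic": 4}),
}
prior = {"britney": 0.6, "briton": 0.4}  # assumed candidate probabilities

print(correct("britny", prior, contexts))
```

In this toy setup the misspelling shares its context words with only one candidate, so the similarity term reinforces the edit-distance evidence. In a real system the context vectors and priors would be estimated from query logs.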

Keywords/Search Tags:

Spelling correction, machine learning, distributional similarity, Ranking SVM, query log chain

Related items

1	Research On Query Error Correction Based On Transfer Learning
2	Erron: A phrase-based machine translation approach to customized spelling correction
3	Chinese Spelling Correction Research In Search Engines Based On Statistical Model
4	Research Of Web Database Approximate Query Based On Semantic Similarity Computing
5	Research On Chinese Spelling Correction In Question And Answer System
6	Optimization And Implementation Of Chinese Spelling Error Detection And Correction Algorithm
7	Search Engine Error Correction Algorithm And Error Correction Bad Case Mining
8	Studies On Key Technologies Of Flexible Query For Web Databases
9	Research On Chinese Spelling Check Technology Based On Machine Learning
10	Personalized Search Framework With Joint Learning Of Document Ranking And Query Suggestion