Research On Query Error Correction Based On Transfer Learning

Posted on:2020-04-06

Degree:Master

Type:Thesis

Country:China

Candidate:T X Ji

Full Text:PDF

GTID:2428330575967957

Subject:Computer technology

Abstract/Summary:

Search engines have become an indispensable part of people's lives. Improving search engine efficiency and helping users find the information they want is an important task. However, if the query entered by the user contains errors, it is difficult to obtain a satisfactory result. Because users often search for content in unfamiliar areas, input errors are more likely in search queries than in other kinds of text input.

Many traditional error correction models have drawbacks. They cannot model the various error types uniformly; instead, they typically correct each error type separately and combine the correctors in a pipeline, which often leads to error accumulation. Correction works well when a query contains only one type of error, but results are poor when it contains several. In addition, many data-driven learning algorithms rely heavily on annotated data, and few data sets exist in the query error correction field.

In response to these problems, this thesis uses a single model to handle multiple error types uniformly, applies transfer learning with unlabeled data to improve the model's performance, and proposes a method for automatically generating an annotated corpus. The work covers the following two aspects:

1. A character-based end-to-end model is proposed. It uses a Seq2Seq model with an attention mechanism and incorporates a neural network language model trained on an unlabeled corpus. The end-to-end model can model different types of errors uniformly. Experiments show that the model achieves good results on a public data set, and that both the attention mechanism and the neural network language model effectively improve the performance of the error correction model.

2. For Chinese query error correction, this thesis proposes a method for automatically generating a training corpus that combines rules with automatic speech recognition. Users' errors are simulated based on their input habits and pronunciation characteristics, producing a variety of error categories. Experiments show that the proposed method also achieves good results on Chinese, and that higher data quality further improves the performance of the model.
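The rule-based side of automatic corpus generation can be illustrated with a minimal sketch. It is not the thesis's method: the thesis targets Chinese and combines rules with automatic speech recognition, whereas this example only applies simple character-level typing errors to Latin-script queries. The `NEIGHBORS` table and the function names `corrupt` and `make_pairs` are hypothetical, chosen for illustration only.

```python
import random

# Hypothetical, deliberately tiny keyboard-adjacency table (an assumption).
# A Chinese system would instead need pinyin or homophone confusion sets.
NEIGHBORS = {
    "a": "qsz", "e": "wrd", "i": "uok", "o": "ipl", "n": "bm", "t": "ry",
}


def corrupt(query: str, rng: random.Random) -> str:
    """Apply one random character-level error to a clean query.

    The result may occasionally equal the input, e.g. when the chosen
    character has no entry in NEIGHBORS or two equal characters are swapped.
    """
    if len(query) < 2:
        return query
    op = rng.choice(["delete", "insert", "substitute", "transpose"])
    i = rng.randrange(len(query) - 1)  # leaves room for position i + 1
    if op == "delete":
        return query[:i] + query[i + 1:]
    if op == "insert":
        # Simulate a repeated keystroke by duplicating the character at i.
        return query[:i] + query[i] + query[i:]
    if op == "substitute":
        options = NEIGHBORS.get(query[i])
        if options is None:
            return query
        return query[:i] + rng.choice(options) + query[i + 1:]
    # transpose: swap the characters at positions i and i + 1
    return query[:i] + query[i + 1] + query[i] + query[i + 2:]


def make_pairs(clean_queries, n_per_query=2, seed=0):
    """Build (noisy, clean) training pairs from clean queries."""
    rng = random.Random(seed)
    pairs = []
    for clean in clean_queries:
        for _ in range(n_per_query):
            pairs.append((corrupt(clean, rng), clean))
    return pairs


if __name__ == "__main__":
    for noisy, clean in make_pairs(["transfer learning", "search engine"]):
        print(f"{noisy!r} -> {clean!r}")
```

Each pair uses the corrupted query as the source sequence and the clean query as the target, which is the form a character-level Seq2Seq corrector would train on. Fixing the seed keeps the generated corpus reproducible.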

Keywords/Search Tags:

query spelling correction, transfer learning, Seq2Seq, automatic corpus generation

Related items

1	Applications Of Spelling Correction Techniques In Information Retrieval And Text Processing
2	Deep Learning Based Automatic Grammar Error Correction
3	Universal Text Generation Transfer Learning Based On Advanced Semantics
4	Improved Attentional Seq2seq With Maxout Fusion Layer For Dialogue Generation
5	Research And Implementation Of Automatic Abstract Generation System Based On Deep Learning
6	Chinese Spelling Correction Research In Search Engines Based On Statistical Model
7	Research On Automatic Signature Generation Algorithms For Polymorphic Worms Based On Transfer Learning
8	Research On Chinese Spelling Correction In Question And Answer System
9	Research On Automatic Generation Technology Of SQL Query Statement
10	Optimization And Implementation Of Chinese Spelling Error Detection And Correction Algorithm