Research On Query Error Correction Based On Transfer Learning

Posted on: 2020-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: T X Ji
GTID: 2428330575967957
Subject: Computer technology
Abstract/Summary:
Search engines have become an indispensable part of people's lives, and improving their efficiency so that users can find the information they want is an important task. However, if the query entered by the user contains errors, it is difficult to obtain satisfactory results. Because users often search for content in unfamiliar domains, input errors are more likely in search queries than in other kinds of text input.

Many traditional error correction models have drawbacks. They cannot model the various error types uniformly; instead, they correct each error type separately and combine the correctors in a pipeline, which often leads to error accumulation. Such models correct well when a query contains only one type of error, but perform poorly when several types occur together. In addition, many data-driven learning algorithms rely heavily on annotated data, yet few datasets exist in the field of query error correction.

In response to these problems, this paper uses a single model to handle multiple error types uniformly, and applies transfer learning from unlabeled data, which effectively improves the model's performance. Finally, a method for automatically generating an annotated corpus is proposed. The paper covers the following two aspects:

1. A character-based end-to-end model is proposed. It uses an attention-based Seq2Seq model and incorporates a neural network language model trained on an unlabeled corpus. The end-to-end model can uniformly model different types of errors. Experiments show that the model achieves good results on the public dataset, and that both the attention mechanism and the neural network language model effectively improve the performance of the error correction model.

2. For Chinese query error correction, this paper proposes a method for automatically generating a training corpus that combines rules with automatic speech recognition. Users' errors are simulated based on their input habits and pronunciation characteristics, producing a variety of error categories. Experiments show that the proposed method also achieves good results on Chinese, and that higher data quality further improves the model's performance.
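
The rule-based half of the corpus generation idea in item 2 can be illustrated with a minimal sketch. This is not the thesis's actual procedure: the speech recognition component is omitted, the four edit rules are generic character-level operations, and the names `CONFUSIONS`, `corrupt`, and `make_pairs` are hypothetical. For Chinese queries, the confusion table would presumably map characters to homophones or similar-sounding characters; the entries below are made up for illustration.

```python
import random

# Hypothetical confusion table (an assumption, not taken from the thesis).
# For Chinese, this would map a character to similar-sounding characters.
CONFUSIONS = {"a": "s", "e": "r", "i": "o", "o": "p", "n": "m"}


def corrupt(query, rng):
    """Return a copy of `query` with one simulated input error applied."""
    if len(query) < 2:
        return query  # too short to corrupt safely
    op = rng.choice(["delete", "insert", "substitute", "transpose"])
    i = rng.randrange(len(query) - 1)  # guarantees that i + 1 is a valid index
    if op == "delete":
        return query[:i] + query[i + 1:]
    if op == "insert":
        # Duplicate the character at position i, a common typing slip.
        return query[:i] + query[i] + query[i:]
    if op == "substitute":
        # Fall back to the original character if no confusion is listed.
        replacement = CONFUSIONS.get(query[i], query[i])
        return query[:i] + replacement + query[i + 1:]
    # Transpose two adjacent characters.
    return query[:i] + query[i + 1] + query[i] + query[i + 2:]


def make_pairs(clean_queries, n_per_query=2, seed=0):
    """Build (noisy, clean) training pairs from a list of clean queries."""
    rng = random.Random(seed)  # seeded so the corpus is reproducible
    return [(corrupt(q, rng), q) for q in clean_queries for _ in range(n_per_query)]


pairs = make_pairs(["search engine", "transfer learning"], n_per_query=3, seed=1)
for noisy, clean in pairs:
    print(f"{noisy!r} -> {clean!r}")
```

Each pair places the corrupted query on the source side and the clean query on the target side, which is the shape of data a Seq2Seq corrector would be trained on. Some pairs may be identical (for example, when a substitution finds no entry in the table), which a real pipeline might filter out or keep as "no correction needed" examples.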
Keywords/Search Tags: query spelling correction, transfer learning, Seq2Seq, automatic corpus generation