A Study of Error Correction in Domain-Oriented Dialogue Text After ASR Conversion

Posted on: 2020-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Zhang
GTID: 2428330590961160
Subject: Engineering
Abstract/Summary:
With the development of the information age, the volume of information people generate every day has grown explosively, and this information contains valuable data waiting to be explored. For example, companies in traditional industries such as banking and insurance receive a large number of customer service calls every day, and they urgently need to assess the quality of these conversations and mine user intent from the resulting mass of dialogue data. The first difficulty in analyzing this data is that most of it is converted into text by automatic speech recognition (ASR). Because of noise, speaker accents, and other interference, the recognized text contains many errors, which make it harder to analyze. Correcting this text with natural language processing methods that exploit the characteristics of dialogue therefore both improves the accuracy of the transcribed dialogue and benefits further analysis of the text, helping to extract as much value from the data as possible. Although text correction has been studied for a long time, most work targets formal, open-domain text (newspapers, books, and periodicals); the correction of highly colloquial text has rarely been studied and remains a major challenge.

For error detection, this thesis proposes combining an N-gram model with a Bi-LSTM language model to score sentences, which improves the accuracy of error detection. Once an error has been located, it must be corrected. For error correction, this thesis proposes generating candidate sets with multiple strategies, using different methods for different error types.

The first strategy generates a candidate set from a domain ontology knowledge base and a prefix tree over Pinyin strings. Because proper nouns are especially prone to recognition errors, the Pinyin prefix tree is used to look up the corresponding candidate words quickly. After the candidate set has been generated from the ontology knowledge base, the support of each candidate word is calculated from the ontology knowledge base together with the dialogue context, and the top-ranked (TOP-1) candidate is taken as the correction.

The second strategy uses a domain linguistic knowledge base: it queries word collocations and combines them with features such as Pinyin similarity to obtain a candidate set for correcting the error. If the domain linguistic knowledge base fails to produce a valid candidate set, candidates are generated from a general-domain linguistic knowledge base instead. Finally, each candidate in the set is substituted for the original word, the probability of the resulting sentence is computed with the Bi-LSTM language model, and the TOP-1 candidate is taken as the correction.

For the construction of the error-correction knowledge base, this thesis proposes building the domain linguistic knowledge base adaptively using dependency syntax, so that the algorithm can learn the words and collocations of a new domain on its own. Finally, this thesis designs and implements an error-correction framework for financial-domain dialogue text, which corrects errors in financial-domain dialogue and transfers well to other domains.
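As an illustration of the candidate-generation and TOP-1 selection steps described in the abstract, the following is a minimal Python sketch and not the thesis's implementation. Everything in it is an assumption made for illustration: the class and function names (`PinyinTrie`, `BigramLM`, `rank_candidates`), the toy lexicon, and the toy corpus are hypothetical. An add-one-smoothed bigram model stands in for the Bi-LSTM language model, and the ontology-based support calculation is not modelled; candidates found through the Pinyin prefix tree are simply ranked by sentence probability.

```python
import math
from collections import Counter


class PinyinTrie:
    """Prefix tree keyed by toneless Pinyin syllables (hypothetical helper)."""

    def __init__(self):
        self.children = {}
        self.terms = []  # domain terms whose Pinyin ends at this node

    def insert(self, syllables, term):
        node = self
        for syllable in syllables:
            node = node.children.setdefault(syllable, PinyinTrie())
        node.terms.append(term)

    def lookup(self, syllables):
        """Return the domain terms whose Pinyin matches `syllables` exactly."""
        node = self
        for syllable in syllables:
            node = node.children.get(syllable)
            if node is None:
                return []
        return list(node.terms)


class BigramLM:
    """Add-one-smoothed bigram model; a stand-in for the Bi-LSTM language model."""

    def __init__(self, corpus):
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        vocab = set()
        for sentence in corpus:
            tokens = ["<s>"] + sentence + ["</s>"]
            vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1
        self.vocab_size = len(vocab)

    def log_prob(self, sentence):
        tokens = ["<s>"] + sentence + ["</s>"]
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            numerator = self.bigram_counts[(prev, cur)] + 1
            denominator = self.context_counts[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def rank_candidates(tokens, error_index, error_pinyin, trie, lm):
    """Substitute each candidate for the flagged word and sort by sentence score."""
    # Fall back to the original word when the trie yields no candidate.
    candidates = trie.lookup(error_pinyin) or [tokens[error_index]]

    def sentence_score(word):
        replaced = tokens[:error_index] + [word] + tokens[error_index + 1:]
        return lm.log_prob(replaced)

    return sorted(candidates, key=sentence_score, reverse=True)


# Toy finance lexicon: 股价 (share price) and 估价 (valuation) share the Pinyin "gu jia".
trie = PinyinTrie()
trie.insert(["gu", "jia"], "股价")
trie.insert(["gu", "jia"], "估价")
trie.insert(["ji", "jin"], "基金")

# Toy segmented corpus used to train the stand-in language model.
lm = BigramLM([
    ["股价", "上涨", "了"],
    ["今天", "股价", "下跌"],
    ["资产", "估价", "完成"],
    ["基金", "净值", "上涨"],
])

# Suppose error detection flagged 顾家 (also "gu jia") at index 1.
ranked = rank_candidates(["今天", "顾家", "上涨"], 1, ["gu", "jia"], trie, lm)
print(ranked)  # ['股价', '估价']; ranked[0] is the TOP-1 correction
```

In this sketch the prefix tree narrows the search to domain terms that sound like the flagged word, and the language model then decides among homophones using context. A real system would derive the Pinyin from the ASR output automatically and would also need fuzzy matching for near-homophones, which the exact lookup here does not attempt.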
Keywords/Search Tags:Text correction, Language model, N-gram, Bi-LSTM, Processing after ASR conversion, Language knowledge base construction