Spoken Language Understanding Research Based On Conditional Random Fields

Posted on: 2017-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: L H Cheng
Full Text: PDF
GTID: 2308330503484339
Subject: Engineering, information and communication engineering
Abstract/Summary:
With the rapid development of Internet technology, man-machine dialogue systems have become a popular research topic. Spoken language understanding (SLU) is the key technology behind such systems, so its performance plays an important role in their development. The recognition rate of automatic speech recognition has improved greatly in recent years, driven by deep neural network acoustic models, but the recognition process is still likely to produce errors, and spoken language often does not conform to the rules of grammar. This thesis proposes methods to improve the robustness of spoken language understanding based on the conditional random field (CRF) model. The main work and innovations are as follows:

1. The performance of natural language understanding is often degraded by speech recognition errors and ill-formed spoken input. A new method for robust spoken language understanding based on conditional random fields is proposed, in which erroneous texts are artificially added to the training data used to estimate the model parameters. Experiments are carried out in the domain of a traffic information query system. The results show that the proposed method improves the robustness of spoken language understanding: significant improvements in precision, recall and F1-score are obtained compared with a model trained on a clean spoken text database.

2. The relationship between the words of a sentence and their word vectors has some influence on the performance of spoken language understanding. In view of this, a method for improving spoken language understanding based on word embeddings is proposed. Sentences are first vectorized with word2vec, which yields a vector for every word. The similarity between words is then calculated from the similarity between their vectors, and clustering produces an initial fuzzy classification. This initial fuzzy classification is used as a feature, either alone or combined with other features, in the CRF training for spoken language understanding, from which the final classification results are obtained.

Finally, experiments in the domain of Chinese traffic queries are carried out to verify the proposed method. The results show that the method outperforms the existing rule-based method and performs close to other recent data-driven methods, while greatly reducing development costs.
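The word-embedding step described in the abstract (word vectors, pairwise similarity, clustering, cluster label used as a CRF feature) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the toy three-dimensional vectors stand in for real word2vec output, cosine similarity is assumed as the similarity measure, the greedy threshold clustering stands in for whatever clustering the author used, and the names `cluster_words` and `token_features` are hypothetical.

```python
import math

# Toy 3-dimensional vectors standing in for word2vec output (hypothetical values).
TOY_VECTORS = {
    "beijing":  [0.9, 0.1, 0.0],
    "shanghai": [0.8, 0.2, 0.1],
    "bus":      [0.1, 0.9, 0.1],
    "subway":   [0.2, 0.8, 0.0],
    "to":       [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_words(vectors, threshold=0.9):
    """Greedy clustering (an assumed stand-in for the thesis's clustering step).

    Each word joins the first cluster whose seed vector is at least
    `threshold` similar; otherwise it starts a new cluster.
    """
    seeds = []        # list of (cluster_id, seed_vector)
    assignment = {}   # word -> cluster_id
    for word, vec in vectors.items():
        for cluster_id, seed in seeds:
            if cosine(vec, seed) >= threshold:
                assignment[word] = cluster_id
                break
        else:
            cluster_id = len(seeds)
            seeds.append((cluster_id, vec))
            assignment[word] = cluster_id
    return assignment


def token_features(tokens, assignment):
    """Build one feature dict per token, adding the cluster label as a feature."""
    features = []
    for i, token in enumerate(tokens):
        feats = {"word": token, "cluster": str(assignment.get(token, "UNK"))}
        if i > 0:
            feats["prev_cluster"] = str(assignment.get(tokens[i - 1], "UNK"))
        features.append(feats)
    return features


CLUSTERS = cluster_words(TOY_VECTORS)
FEATURES = token_features(["beijing", "to", "shanghai"], CLUSTERS)
```

Per-token feature dictionaries of this shape are a common input format for CRF toolkits, so the cluster label can be used alone or alongside other features such as the word itself, as the abstract describes.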
Keywords/Search Tags: Man-machine dialogue system, Conditional random fields, Spoken language understanding, Word embedding