Design And Implementation Of Domain-oriented Intention Recognition System

Posted on: 2021-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Li
GTID: 2518306047486324
Subject: Master of Engineering
Abstract/Summary:
A human-machine dialogue system consists mainly of five parts: automatic speech recognition (ASR), spoken language understanding (SLU), dialogue management [1], dialogue generation [2], and text-to-speech (TTS). Among these, SLU plays an essential role. SLU is composed mainly of domain classification, intent classification, and slot filling modules. In particular, intent classification is one of the most difficult and crucial problems in human-machine dialogue systems. Current approaches to intent classification fall into four groups: rule-based methods, traditional machine learning methods, deep learning methods, and fusion-based deep learning methods. Because no single intent classification module can suit every situation, this thesis develops an intent classification system restricted to the human-computer dialogue domain. The design first recasts the complicated intent classification problem as a general multi-class classification problem, and then optimizes the user-intent classification module by exploiting the short-text nature of human-computer dialogue and the verb characteristics of user intent. In this way, the system developed in this project is expected to perform well at intent recognition.

This project studies intent classification with four algorithms: Support Vector Machine (SVM), fastText, Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN). In addition, a web-based intent classification system is built so that intent classification can be tested easily and its results displayed. The raw data are downloaded from the official datasets of Snips, an open-source voice platform. All of the raw data are generated from human-computer dialogue and are divided into 7 categories according to intent. Data preprocessing mainly includes removing punctuation marks and special symbols, splitting text into words, lowercasing all letters, extracting keywords from sentences, reducing words to their base forms, and reducing dimensionality with a feature selection algorithm based on Shannon entropy.

We build the user-intent classification module with a traditional machine learning method, a neural network method, and deep learning methods in turn, and compare their performance. The workflow is as follows. For the traditional machine learning method, we use the SVM algorithm. The input to this module is the term frequency-inverse document frequency (TF-IDF) matrix computed from the raw text datasets. Note that the verb and the object in each text are given extra weight, according to intent analysis, in order to highlight intent information. For the neural network method, we use fastText, a very fast text classification algorithm created by Facebook's AI Research. fastText trains the intent classification module with a simple three-layer neural network. Its input is the text features extracted with bag-of-words (BoW) and N-gram algorithms. Compared with the other modules, the fastText module needs to train only one word-vector weight matrix, and it performs better than the deep learning algorithms.

For the deep learning methods, this project uses the CNN and RNN algorithms in turn. First, in the CNN module, we use word vectors pre-trained on large datasets as input and take the characteristics of short text into account when tuning parameters such as convolution kernel size, number of convolution kernels, and learning rate. We then optimize the module with the Cross Entropy Error Function (CEEF) and the Stochastic Gradient Descent (SGD) algorithm, and obtain a user-intent classification module with excellent overall performance. Second, for the RNN module, we test its classification accuracy and compare it with that of the CNN. The accuracy increases only slightly (0.5%), indicating that the RNN and CNN user-intent classification modules perform similarly. The reason is that the proportion of time-series features captured by the RNN is approximately equal to the proportion of features extracted by the convolution kernels in the CNN.
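
The abstract mentions dimensionality reduction through feature selection based on Shannon entropy but does not name the exact criterion. As a minimal sketch, assuming information gain (one common entropy-based criterion, not necessarily the one used in the thesis), terms could be ranked as follows. The function names are hypothetical, and each utterance is assumed to be a set of preprocessed tokens.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Reduction in label entropy from knowing whether `term` occurs.

    docs: list of token sets, one per utterance (assumed input format).
    labels: intent label of each utterance.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Keep the k terms with the highest information gain."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(
        vocab, key=lambda t: information_gain(docs, labels, t), reverse=True
    )
    return ranked[:k]
```

On a toy corpus where "play" appears only in one intent and "the" appears equally in all intents, "play" receives a high score and "the" a score near zero, so a small `k` keeps the intent-bearing words and drops the uninformative ones.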
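
The extra weighting of verbs and objects in the SVM input can be illustrated with a small sketch. It assumes that the input matrix is a TF-IDF matrix, that intent-bearing tokens are already known (in practice they would come from a part-of-speech tagger), and that a fixed multiplicative boost is applied. The function name, the `boosted_terms` argument, the boost value of 2.0, and the smoothed IDF variant are all illustrative assumptions rather than details from the thesis.

```python
import math
from collections import Counter


def weighted_tfidf(docs, boosted_terms, boost=2.0):
    """Return one {term: weight} dict per utterance.

    docs: list of token lists (already tokenized and lowercased).
    boosted_terms: tokens treated as intent-bearing verbs or objects
                   (hypothetical; normally produced by a POS tagger).
    boost: multiplicative factor for those tokens (assumed value).
    """
    n_docs = len(docs)
    # Document frequency: in how many utterances each term occurs.
    doc_freq = Counter(t for d in docs for t in set(d))
    vectors = []
    for d in docs:
        term_counts = Counter(d)
        vec = {}
        for term, count in term_counts.items():
            # One common smoothed IDF variant.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            weight = (count / len(d)) * idf
            if term in boosted_terms:
                weight *= boost  # highlight intent information
            vec[term] = weight
        vectors.append(vec)
    return vectors
```

Without the boost, a frequent verb such as "play" receives a lower TF-IDF weight than a rare noun, simply because it occurs in more utterances. With the boost, the verb dominates the vector, which is the effect the weighting step aims for.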
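
The optimization step, cross-entropy loss minimized with SGD, can be sketched on a much simpler model than the CNN described in the abstract. The example below swaps in a linear softmax classifier purely to keep the code short; the function names, the feature vector, and the learning rate are assumptions for illustration.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Cross-entropy loss for a single example with true class `target`."""
    return -math.log(probs[target])


def sgd_step(weights, x, target, lr):
    """One SGD update of a linear softmax classifier, in place.

    weights: list of rows, one weight vector per intent class.
    x: feature vector of a single utterance.
    Returns the loss measured before the update.
    """
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    probs = softmax(scores)
    loss = cross_entropy(probs, target)
    for k, row in enumerate(weights):
        # Gradient of the loss w.r.t. class score k: p_k - 1[k == target].
        grad_k = probs[k] - (1.0 if k == target else 0.0)
        for j, xi in enumerate(x):
            row[j] -= lr * grad_k * xi
    return loss
```

Calling `sgd_step` repeatedly on the same example lowers its loss step by step. In a CNN the same gradient signal is propagated back through the convolution and embedding layers instead of a single weight matrix.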
Keywords/Search Tags: intent classification, short text dataset, support vector machine, convolutional neural network