Chinese Question Classification Based On Deep Learning

Posted on: 2022-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: Q X Yuan
GTID: 2518306494471484
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, the volume of online data and information is growing exponentially, which challenges existing retrieval systems. How to retrieve accurate answers from such complex data has attracted increasing attention. As an advanced form of retrieval system, a question answering (Q&A) system can analyze and retrieve massive text data on the Internet and return concise, accurate answers to users, and it has become a research hotspot. Question classification is a basic task of a question answering system, and its accuracy directly affects the quality of the system.

The question classification task currently faces the following problems. First, the data sets contain few questions, and the semantic feature information is insufficient. Second, the data to be classified is mostly in the form of natural language questions, which are colloquial and ambiguous, so trained models classify them poorly. Third, existing research on question classification mostly focuses on English, and the accuracy of Chinese question classification rarely reaches a satisfactory level, which leaves considerable room for research. Based on this, this paper conducts research on the following two aspects.

First, because question text is limited to a small number of words and carries insufficient semantic feature information, it is difficult to express deep semantic information at the word embedding stage. To address this problem, text representation methods are studied, and a text information expansion representation mechanism based on a translation module is proposed. This text representation structure uses the Google Translation API to expand the text through translation, and uses the pre-trained models ERNIE and BERT to represent the Chinese corpus and the translated corpus, respectively. The classification accuracy of this method is higher than that of the traditional word2vec method, and higher than that of ERNIE or BERT used alone.

Second, to address the difficult training and poor classification performance of current classification methods, this paper uses a Chinese question classification method based on feature fusion. On the one hand, adding the highway network "control gate" structure to a convolutional neural network and a bidirectional long short-term memory (BiLSTM) model alleviates the low classification accuracy that arises when a deep model makes feature extraction difficult. On the other hand, adding the DCU (Dilated Compositional Units) structure reduces the space and time consumed at runtime. Comparison with the baseline model verifies the effectiveness of this method.

Building on this exploration of the model, a prototype Chinese question classification system is built, and the method proposed in this paper is applied to a practical system. Automatic classification of Chinese questions is realized through the system's Chinese question classification function module.
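The highway network "control gate" mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard highway formulation y = T(x) * H(x) + (1 - T(x)) * x, where T is a sigmoid gate. The abstract does not give the thesis's actual layer sizes, activations, or initialization, so the tanh transform, the gate bias of -2.0, the helper names, and the toy feature vector below are all assumptions made for illustration only.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, x):
    """Return W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """One highway layer: y = T(x) * H(x) + (1 - T(x)) * x."""
    h = [math.tanh(v) for v in affine(w_h, b_h, x)]  # candidate transform H(x)
    t = [sigmoid(v) for v in affine(w_t, b_t, x)]    # gate T(x), each value in (0, 1)
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


def init_layer(dim, gate_bias=-2.0, seed=0):
    """Hypothetical initializer: small random weights, negative gate bias."""
    rng = random.Random(seed)
    w_h = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]
    w_t = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]
    return w_h, [0.0] * dim, w_t, [gate_bias] * dim


# Stand-in for a feature vector produced by a CNN or BiLSTM encoder.
features = [0.5, -1.0, 0.25, 2.0]
w_h, b_h, w_t, b_t = init_layer(len(features))
output = highway_layer(features, w_h, b_h, w_t, b_t)
```

When the gate is close to 0, the layer passes its input through almost unchanged, which is what lets gradients and features flow through deep stacks; when the gate is close to 1, the layer behaves like an ordinary nonlinear transform.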
Keywords/Search Tags: Chinese question classification, word embedding, highway network, feature fusion