Font Size: a A A

Research On Short Question Classification Based On Automatic Question And Answering

Posted on:2016-12-02Degree:MasterType:Thesis
Country:ChinaCandidate:A J WuFull Text:PDF
GTID:2308330461469484Subject:Software engineering
Abstract/Summary:PDF Full Text Request
With the rapid advance of science and technology and the continuous development of the network, the information grow fastly and explosively. How to quickly and accurately obtain the useful information from the vast amounts of information gradually evolved into a more and more important thing. As a special search engine, Automatic Question Answering System gets back to the research focus. It is different from the traditional search engine. And it understands the users’colloquial expression and gets the right answer back to the users. Automatic question answering system is mainly divided into three steps, question classification stands the first place among these three steps, it provides semantic restrictions and constraints for the continued steps question understand and the answers extraction.From the view of information theory, the information in the data can be quantified. If the increase of information reducts the event’s uncertainty, it is associated with the event; otherwise, it is irrelevant with the event. In fact, question classification is generally used by means of text classification. Compared to a text, a short question has less information. The category is determined by the information in short questions. As a total, it has many challenges for question classification including following aspects:one is the large dimension of feature vector space model and the low correlation features in the vector space; the other is the sparse feature vector space that is formed by the relatively short questions.In order to overcome the above two problems, this thesis focuses on the semantic of words, and builds the knowledge data base which supports the semantic. It also uses the deep learning to learn feature. A kind of feature learning method and question classification method both base on semantic information are implemented. The specific studies are as follows:First of all, this thesis calculates the words semantic relevance using mutual information and encyclopedia information. This method takes Wikipedia entry as nodes in the graph, the words’clustering results obtain through the links between them. This method uses the mutual information to calculate Wikipedia entry semantic relevance. And the experiment builds semantic relevance knowledge base with several of the largest results.Secondly, this thesis analyzes the common way in feature selection. It implements a method to obtain the semantic categories knowledge base in the specific corpus. It uses the semantic relevance knowledge base to expand the words in the questions semantically. To generalize the words in the questions semantically, it makes full use of the semantic categories knowledge base.Again, this thesis implements a kind of feature learning algorithms based on deep learning. Due to take words as features, the dimension is high. In order to reduce the dimension of vector space, the semantic expansion and semantic generalization method are used to deal with the questions. According to these two methods, the feature dimension numbers will become to less than 5000, so the follwing steps are feature learning and classification.Finally, this thesis implements a method based on semantic information for question classification. It does comparative experiments on the dataset among different processes, such as:among different feature selection methods、different feature dimension methods and different classification methods. At last, it finds a suitable method for the short question classification.The experimental data set is mobile voice data and it verifies the validity of the methods. The results show that the semantic knowledge base provides semantic support. The method which bases on semantic question classification solves these problems which are high dimension space and the low correlation between features and the sparse vectors. The methods which are feature learning and classification using softmax function are feasible, and the experiment gets good results at last.
Keywords/Search Tags:Question Classification, Automatic Question Answering, Deep Learning, The Semantic Knowledge Base, Feature Selection, Semantic Expansion
PDF Full Text Request
Related items