
Research And Design Of Intelligent Question Answering System In Restricted Domain

Posted on: 2019-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: J Gao
GTID: 2428330563490356
Subject: Computer technology
Abstract/Summary:
Traditional search engines have many shortcomings in serving the vast number of Internet users, including cumbersome result lists and poor answer quality. As Internet data grows sharply, the accuracy of the answers returned to users cannot be guaranteed. How to understand the user's intention accurately, return answers quickly and concisely, reduce the user's search time, and improve answer accuracy is a problem that deserves in-depth study. This thesis takes the medical field as the restricted domain and studies an intelligent question answering system and its main components. For question classification, a TFIDF feature weight calculation algorithm based on inter-class and intra-class distribution is proposed to improve classification accuracy. For question retrieval, a retrieval model based on similarity calculation with the LDA topic model is designed to improve retrieval precision. Finally, a prototype intelligent question answering system for the medical field is implemented. The main contents of this thesis are as follows:

(1) Data acquisition and preprocessing. A web crawler is used to collect health care questions from the Sina IAsk website and related data from 39 Health Network, and a domain dictionary is built by analyzing the data. With the help of this medical dictionary, the NLPIR word segmentation system segments the questions and tags their parts of speech. A stoplist is then applied to remove particles, interjections and similar words from the segmentation results, yielding the final collection of word documents and completing the preprocessing.

(2) Feature weight calculation and question classification. First, the feature words are expanded using the Tongyici Cilin synonym thesaurus ("synonym forest"); then the feature weight calculation method is studied. Because traditional TFIDF ignores the relationship between words and categories, the concepts of mutual information and information entropy are introduced, and a TFIDF feature weight algorithm based on inter-class and intra-class distribution is proposed. Next, a question classification taxonomy is defined according to practical needs. The weighted feature words of the training set are used to train an SVM classifier whose parameters are optimized with the particle swarm optimization (PSO) algorithm, and the test set is fed to the resulting model to classify questions. Comparison experiments between traditional feature weight calculation methods and the improved method are conducted and their results analyzed.

(3) Question retrieval model based on LDA topic model similarity. Focusing on the LDA topic model, this thesis designs a question retrieval model based on similarity computed from LDA topics, uses Gibbs sampling to estimate the model parameters, and analyzes comparison experiments against the LSA and VSM models.

(4) Design and implementation of a prototype intelligent question answering system for the medical field. The system implements question classification, information retrieval and answer display, and its interface is presented.
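Step (1) removes particles and interjections after NLPIR has segmented and POS-tagged each question. A minimal sketch of that filtering step, assuming the segmenter's output is already available as (word, tag) pairs and that tags follow the ICTCLAS/NLPIR convention in which tags starting with `u` mark auxiliary particles, `y` modal particles, `e` interjections and `w` punctuation. The dropped tag prefixes, the stoplist contents and the function name `filter_tokens` are illustrative assumptions, not the thesis's code.

```python
from typing import Iterable

# Assumed NLPIR/ICTCLAS-style POS tag prefixes to discard:
# u = auxiliary particle, y = modal particle, e = interjection, w = punctuation.
DROP_POS_PREFIXES = ("u", "y", "e", "w")


def filter_tokens(tagged: Iterable[tuple[str, str]], stoplist: set[str]) -> list[str]:
    """Keep words that are neither in the stoplist nor carry a dropped POS tag."""
    kept = []
    for word, pos in tagged:
        if word in stoplist:
            continue
        if pos.startswith(DROP_POS_PREFIXES):  # str.startswith accepts a tuple
            continue
        kept.append(word)
    return kept
```

For example, `filter_tokens([("感冒", "n"), ("了", "ule"), ("啊", "y")], set())` keeps only `"感冒"`. Filtering on tag prefixes rather than listing every particle lets a short stoplist handle only the words the tagger does not catch.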
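Step (2) adds class information to TFIDF, but the abstract does not give the exact formula, so the following is only one plausible way to fold inter-class and intra-class distribution into the weight, not the thesis's algorithm. Assumed here: w(t, d) = tf(t, d) · idf(t) · inter(t) · intra(t, c), where inter(t) = 1 − H(t)/log K, with H(t) the entropy of term t's document-frequency distribution over the K classes (a term concentrated in one class scores near 1, a term spread evenly scores 0), and intra(t, c) is the share of class-c documents that contain t. The sketch uses only the entropy idea; the mutual-information component mentioned in the abstract is not modeled. The function name `class_aware_tfidf` is hypothetical.

```python
import math
from collections import Counter


def class_aware_tfidf(docs, labels):
    """Return one {term: weight} dict per document (hypothetical weighting).

    w(t, d)     = tf(t, d) * idf(t) * inter(t) * intra(t, c_d)
    inter(t)    = 1 - H(t) / log(K)   (H: entropy of t's class distribution)
    intra(t, c) = df_c(t) / N_c       (share of class-c docs containing t)
    """
    classes = sorted(set(labels))
    k = len(classes)
    n = len(docs)
    class_size = Counter(labels)
    df = Counter()     # t -> number of documents containing t
    df_c = Counter()   # (t, c) -> number of class-c documents containing t
    for doc, c in zip(docs, labels):
        for t in set(doc):
            df[t] += 1
            df_c[(t, c)] += 1

    def inter(t):
        if k < 2:
            return 1.0
        probs = [df_c[(t, c)] / df[t] for c in classes if df_c[(t, c)]]
        entropy = -sum(p * math.log(p) for p in probs)
        return 1.0 - entropy / math.log(k)

    weights = []
    for doc, c in zip(docs, labels):
        tf = Counter(doc)
        w = {}
        for t, count in tf.items():
            idf = math.log(n / df[t])
            intra = df_c[(t, c)] / class_size[c]
            w[t] = (count / len(doc)) * idf * inter(t) * intra
        weights.append(w)
    return weights
```

With this scheme a word such as "cough" that appears equally often in two classes gets inter(t) = 0 and therefore zero weight, whereas plain TFIDF would still rank it like any other word of the same document frequency; that is the kind of word-category relationship the abstract says traditional TFIDF ignores.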
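Step (2) also tunes the SVM's parameters with particle swarm optimization. A minimal global-best PSO sketch in plain Python, assuming the search is over (log2 C, log2 γ) of an RBF-kernel SVM and that the real objective would be cross-validated classification error; a stand-in quadratic objective keeps the example self-contained. The swarm size, inertia weight w = 0.7 and acceleration coefficients c1 = c2 = 1.5 are common textbook choices rather than values from the thesis, and `pso_minimize` and `toy_error` are hypothetical names.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over a box; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best position

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_error(p):
    """Stand-in for cross-validated SVM error at (log2 C, log2 gamma)."""
    log2_c, log2_gamma = p
    return (log2_c - 3.0) ** 2 + (log2_gamma + 5.0) ** 2
```

`pso_minimize(toy_error, [(-5.0, 15.0), (-15.0, 3.0)])` searches log2 C in [-5, 15] and log2 γ in [-15, 3], the ranges used for grid search in the LIBSVM practical guide. In a real run, `toy_error` would be replaced by a function that trains an SVM (for example scikit-learn's `SVC(C=2**log2_c, gamma=2**log2_gamma)`) and returns 1 minus its mean cross-validation accuracy. Searching in log space lets the swarm cover several orders of magnitude with the same step sizes.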
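Step (3) estimates LDA parameters by Gibbs sampling and retrieves stored questions by topic similarity. A compact pure-Python sketch of collapsed Gibbs sampling followed by similarity ranking. The abstract does not name the similarity measure, so the Jensen-Shannon-based score here is an assumption, as are the hyperparameters (alpha = 0.1, beta = 0.01, 200 sweeps) and the names `lda_gibbs`, `js_similarity` and `rank_questions`. For brevity the user's question is sampled together with the stored questions; a deployed system would train once and infer topic mixtures for new questions.

```python
import math
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic mixture (theta) per doc."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    n_vocab = len(vocab)
    ndk = [[0] * n_topics for _ in docs]            # tokens of topic k in doc d
    nkw = [[0] * n_vocab for _ in range(n_topics)]  # count of word v under topic k
    nk = [0] * n_topics                             # total tokens under topic k
    z = []                                          # topic assignment of each token

    def update(d, k, v, delta):
        ndk[d][k] += delta
        nkw[k][v] += delta
        nk[k] += delta

    for d, doc in enumerate(docs):                  # random initial assignment
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            update(d, k, vocab[w], 1)

    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = vocab[w]
                update(d, z[d][i], v, -1)           # remove the current token
                # Full conditional p(z_i = k | all other assignments), unnormalised.
                weights = [(ndk[d][k] + alpha) * (nkw[k][v] + beta)
                           / (nk[k] + n_vocab * beta) for k in topics]
                z[d][i] = rng.choices(topics, weights=weights)[0]
                update(d, z[d][i], v, 1)            # add it back under the new topic

    return [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
            for d, doc in enumerate(docs)]


def js_similarity(p, q):
    """1 - JS(p, q) / ln 2: identical mixtures score 1, disjoint ones 0."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 1.0 - (0.5 * kl(p, m) + 0.5 * kl(q, m)) / math.log(2)


def rank_questions(query_idx, theta):
    """Indices of the other questions, most topically similar first."""
    q = theta[query_idx]
    scores = [(js_similarity(q, t), i) for i, t in enumerate(theta) if i != query_idx]
    return [i for _, i in sorted(scores, reverse=True)]
```

Appending the segmented user question as the last document and calling `rank_questions(len(docs) - 1, lda_gibbs(docs, n_topics=2))` returns the stored questions ordered by topic similarity. The LSA and VSM baselines mentioned in step (3) would instead compare SVD-reduced or raw term vectors, typically by cosine similarity.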
Keywords/Search Tags: intelligent question answering system, feature weight calculation, question classification, question retrieval, LDA topic model