
Research On Key Technologies Of Chinese Question And Answer System

Posted on: 2021-01-15; Degree: Master; Type: Thesis
Country: China; Candidate: Y H Shu; Full Text: PDF
GTID: 2428330614469068; Subject: Computer software and theory
Abstract/Summary:
With the rapid popularization of the Internet, information search has become an urgent demand. Although search engines such as Baidu and Sohu have brought great convenience, users often cannot find the content they need on the pages returned by keyword search engines. To meet the demand for faster and more accurate access to information, especially professional information, the Chinese Question Answering System (QAS), as a new information retrieval method, has become a research hotspot in the industry. A Chinese question answering system draws on natural language processing technology to let users ask questions in natural language and then returns accurate answers. Its effectiveness depends on the performance of its main components: word segmentation, part-of-speech tagging, dependency syntax analysis, and the related semantic calculation. To improve the performance of existing question answering systems, this paper addresses the shortcomings of three of these components: the Chinese word segmentation system based on neural networks, the part-of-speech tagging model and algorithm based on the hidden Markov model (HMM) and the Viterbi algorithm, and the semantic computing model based on word frequency distribution features. The main work of this paper is as follows:

(1) A new ensemble learning segmentation algorithm is proposed. To address the weakness of existing segmentation algorithms in discovering new words in a single corpus, the algorithm integrates a neural network, mutual information, and branch entropy for word segmentation. The segmentation results of the neural network are corrected using mutual information and branch entropy so that new words are identified effectively. Experiments show that the new segmentation algorithm improves the accuracy of word segmentation.

(2) A part-of-speech tagging algorithm based on an optimized probability model is proposed. It recasts the parameter estimation of the HMM as a system optimization problem described by multivariate functions. The optimal HMM parameters are estimated by an improved genetic algorithm, and the improved HMM is combined with the Viterbi algorithm for part-of-speech tagging. Experiments show that the algorithm tags parts of speech more accurately.

(3) A semantic computing model based on Term Frequency (TF), Inverse Document Frequency (IDF), Information Gain (IG), and Latent Dirichlet Allocation (LDA) is proposed, namely TF-IDF-IG-LDA. Gensim is used to calculate the semantic RI (Relevant Information) of word vectors to obtain the degree of semantic relevance. The classified documents are then retrieved by similarity to locate the text that contains the answer. Experimental results show that the new model improves the accuracy of text classification compared with TF-IDF and TF-IDF-IG.
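
The decoding step in item (2) can be illustrated with a minimal sketch of Viterbi decoding over an HMM. This is not the thesis's implementation: the two tags (`N`, `V`), the three pre-segmented tokens, and all probabilities below are hypothetical values set by hand, whereas the thesis estimates the HMM parameters with an improved genetic algorithm. The function name `viterbi` and the dictionary layout are likewise assumptions made for illustration.

```python
import math


def _log(p):
    """Log of a probability, mapping 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for a list of tokens.

    start_p[s]    : probability that a sentence starts in tag s
    trans_p[p][s] : probability of moving from tag p to tag s
    emit_p[s][w]  : probability that tag s emits token w (missing = 0)
    """
    if not observations:
        return []

    # best[t][s]: log-probability of the best path that ends in tag s at step t
    best = [{s: _log(start_p[s]) + _log(emit_p[s].get(observations[0], 0.0))
             for s in states}]
    # back[t][s]: predecessor tag on that best path
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + _log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + _log(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev

    # Trace back from the best final tag to recover the full sequence.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: N = noun, V = verb. All numbers are illustrative only.
STATES = ["N", "V"]
START_P = {"N": 0.8, "V": 0.2}
TRANS_P = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
EMIT_P = {
    "N": {"系统": 0.5, "问题": 0.4, "回答": 0.1},
    "V": {"回答": 0.9, "问题": 0.1},
}

tokens = ["系统", "回答", "问题"]  # "system", "answers", "question"
tags = viterbi(tokens, STATES, START_P, TRANS_P, EMIT_P)
print(list(zip(tokens, tags)))  # [('系统', 'N'), ('回答', 'V'), ('问题', 'N')]
```

Working in log space avoids numerical underflow on long sentences. In a setup like the one the abstract describes, the hand-set tables would be replaced by the parameters found by the optimization step, while the decoding routine itself would stay the same.
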
Keywords/Search Tags: Ensemble learning, Dynamic Viterbi, HMM, Part of Speech Tagging, Semantic Computing