Research On Key Technologies Of Chinese Question And Answer System

Posted on:2021-01-15

Degree:Master

Type:Thesis

Country:China

Candidate:Y H Shu

Full Text:PDF

GTID:2428330614469068

Subject:Computer software and theory

Abstract/Summary:

With the rapid popularization of the Internet, information search has become an urgent demand. Although search engines such as Baidu and Sohu have brought great convenience, users often cannot find the content they need on the pages returned by keyword search. To meet the demand for faster and more accurate access to information, especially professional information, the Chinese Question Answering System (QAS) has become a research hotspot as a new information retrieval method. A Chinese QAS uses natural language processing technology to let users ask questions in natural language and then returns accurate answers. Its working efficiency depends on the performance of its main components: word segmentation, part-of-speech tagging, dependency syntax analysis, and semantic relevance calculation. To improve the performance of existing question answering systems, this thesis addresses the shortcomings of three of these components: the existing neural-network-based Chinese word segmentation system, the part-of-speech tagging model and algorithm based on the hidden Markov model (HMM) and the Viterbi algorithm, and the semantic computing model based on word frequency distribution features. The main work of this thesis is as follows:

(1) A new ensemble learning segmentation algorithm is proposed. Existing segmentation algorithms are weak at discovering new words when trained on a single corpus, so the proposed algorithm integrates a neural network with mutual information and branch entropy. The segmentation results of the neural network are corrected using mutual information and branch entropy so that new words are identified effectively. Experiments show that the new segmentation algorithm improves the accuracy of word segmentation.

(2) A part-of-speech tagging algorithm based on an optimized probability model is proposed. It recasts the parameter estimation of the HMM as a system optimization problem described by multivariate functions, and the optimal HMM parameters are estimated by an improved genetic algorithm. The improved HMM is then combined with the Viterbi algorithm for part-of-speech tagging. Experiments show that the algorithm tags parts of speech more accurately.

(3) A semantic computing model, TF-IDF-IG-LDA, is proposed, based on Term Frequency (TF), Inverse Document Frequency (IDF), Information Gain (IG), and Latent Dirichlet Allocation (LDA). Gensim is used to calculate the semantic RI (Relevant Information) of word vectors to obtain the degree of semantic relevance. The classified documents are then retrieved according to their similarity to locate the text that contains the answer. Experimental results show that the new model improves the accuracy of text classification compared with TF-IDF and TF-IDF-IG.
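
As an illustration of the decoding step named in item (2), the following is a minimal sketch of standard Viterbi decoding over an HMM. It is not the thesis's implementation: the function name `viterbi`, the two-tag set, the English toy words, and every probability below are hypothetical values chosen only to make the example runnable. In the thesis the HMM parameters would come from the improved genetic algorithm rather than being written by hand.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under an HMM.

    Probabilities are combined in log space; zero or missing
    probabilities are clamped to `floor` so that log() is defined.
    """
    if not words:
        return []

    def lg(p):
        return math.log(max(p, floor))

    # scores[t][tag]: best log-probability of any path ending in `tag` at position t
    scores = [{t: lg(start_p[t]) + lg(emit_p[t].get(words[0], 0.0)) for t in tags}]
    back = []  # back[t-1][tag]: best predecessor of `tag` at position t

    for word in words[1:]:
        prev, cur, ptr = scores[-1], {}, {}
        for tag in tags:
            best_prev = max(tags, key=lambda p: prev[p] + lg(trans_p[p][tag]))
            cur[tag] = (prev[best_prev] + lg(trans_p[best_prev][tag])
                        + lg(emit_p[tag].get(word, 0.0)))
            ptr[tag] = best_prev
        scores.append(cur)
        back.append(ptr)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: scores[-1][t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

# Hypothetical toy model: two tags (noun, verb) and hand-picked probabilities.
TAGS = ["N", "V"]
START = {"N": 0.8, "V": 0.2}
TRANS = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
EMIT = {"N": {"dogs": 0.9, "bark": 0.1}, "V": {"dogs": 0.1, "bark": 0.9}}

print(viterbi(["dogs", "bark"], TAGS, START, TRANS, EMIT))  # ['N', 'V']
```

Working in log space avoids numeric underflow on long sentences, and the `floor` clamp is one simple way to handle words that a tag never emitted in training; other smoothing schemes would serve equally well.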

Keywords/Search Tags:

Ensemble learning, Dynamic Viterbi, HMM, Part of Speech Tagging, Semantic Computing

Related items

1	Statistical Based Mongolian Part-of-Speech Tagging Study And Realization
2	Research On Lao Language Part-of-speech Tagging With Multiple Features
3	Research On Laodian Participle And Part-of-speech Tagging Method
4	Research On Part-of-Speech Tagging Algorithms Of Mathematical Corpus Based On Deep Learning
5	HMM-based Chinese Part-of-Speech Tagging And Improvement
6	Research And Implementation Of Modify Chinese Part-of-Speech Tagging Based On FST Technology
7	Study Of Kazak Part-of-Speech Tagging Based Upon HMM
8	Statistics-based Chinese Pos Tagging Method
9	Research On The Construction Method Of Burmese Part-of-speech Tagging Corpus
10	A Research On Lao Language Part-of-speech Tagging With Multi-feature Fusion