The Method Of New Words Discovery And Answers Ranking In Finance Question Answering

Posted on: 2018-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhang
GTID: 2428330566998573
Subject: Computer Science and Technology
Abstract/Summary:
As automatic question answering (QA) systems are studied in greater depth and applied more widely, they are becoming increasingly localized and specialized. Unlike general open-domain QA, a specific field contains many new words that are absent from a word segmenter's lexicon. A segmenter with an incomplete word list may split a new word into several parts, so segmenting domain sentences with a general-purpose segmenter can degrade word vector training. Questions in a specific field often mix structured-data questions with free-text questions, and each question type requires a different answer extraction method. When selecting candidate answers, previous QA systems usually retrieve and rank candidates by vector distance in a vector space, but this approach considers only word-to-word distance and ignores both the weight of each word in the sentence and the calculation bias caused by differences in part of speech. To address these problems, this thesis proposes a method for new word discovery and answer ranking in financial automatic QA that combines a statistical approach with machine learning algorithms such as convolutional neural networks. The main contents are as follows.

(1) Discovery and extraction of new words in the financial field. To better calculate the correlation between a user's question and candidate question-answer pairs, this thesis proposes a statistical method that extracts new words from a large-scale domain corpus by combining independent word probability with information entropy. The method raises the accuracy of new word extraction to more than 90%, and the extracted new words raise the MRR value by 0.03 or more in the answer ranking experiments.

(2) Ranking and extraction of candidate answers. This thesis presents a method that ranks candidate answers by combining two relevance scores: the relevance between the user's question and the candidate questions, and the relevance between the user's question and the candidate answers. The former is calculated from word vectors in vector space; the latter is calculated by a convolutional neural network model. The experiments show that the cosine similarity calculation based on sentence length performs best, and that a sentence's keywords and their part-of-speech weights, extracted by TF-IDF, also strongly influence the correlation calculation. The experiments further indicate that combining the two relevance values improves, to a certain extent, the accuracy of matching a user's question to candidate question-answer pairs.

(3) Construction of a financial QA corpus and QA system platform. This thesis builds several knowledge bases for the financial field and a question answering robot system that integrates everyday QA, structured financial time-series data, free-form financial questions, and auxiliary customer service. On 200 test questions the system performs well: the Top1 value exceeds 105 and the MRR (mean reciprocal rank) reaches 0.63.
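The two evaluation figures quoted in the abstract, Top1 and MRR, can be illustrated with a minimal sketch. It assumes that each test question is scored by the 1-based rank of the first correct answer in the ranked candidate list (or None if no correct answer is returned), and that "Top1" counts the questions whose correct answer is ranked first. The function names and the sample ranks are hypothetical and are not taken from the thesis.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all questions.

    Each entry is the 1-based rank of the first correct answer,
    or None when no correct answer appears (contributes 0).
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    total = sum(0.0 if r is None else 1.0 / r for r in ranks)
    return total / len(ranks)


def top1_count(ranks: Sequence[Optional[int]]) -> int:
    """Number of questions whose correct answer is ranked first
    (assumed reading of the 'Top1 value' in the abstract)."""
    return sum(1 for r in ranks if r == 1)


# Hypothetical ranks for four test questions (illustrative only).
sample_ranks = [1, 2, None, 4]
mrr = mean_reciprocal_rank(sample_ranks)   # (1 + 0.5 + 0 + 0.25) / 4
hits = top1_count(sample_ranks)
```

Under this reading, an MRR of 0.63 means the reciprocal rank of the first correct answer averages 0.63 across the test questions.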
Keywords/Search Tags: finance question answering, new words discovery, answers ranking, convolutional neural network