Research On Question Classification And Similarity Calculation In Agricultural Question Answering

Posted on: 2019-07-24  Degree: Master  Type: Thesis
Country: China  Candidate: S Gao
GTID: 2428330542999212  Subject: Control Science and Engineering
Abstract/Summary:
At present, a huge amount of agricultural data has accumulated on the Internet, and making effective use of this agricultural big data has become an urgent problem. An agricultural question answering system analyzes and interprets the questions posed by agricultural users, finds similar questions in its corpus, and returns the answers those users need. Compared with searching the Internet directly, a question answering system is better suited to using agricultural big data to help farmers acquire knowledge about crop production. This thesis works with question-and-answer data from agricultural community websites and studies two components of the question answering system: question classification and similarity calculation.

(1) Question classification. The thesis analyzes the drawbacks of existing feature selection methods used in classification, in particular their shortcomings on unbalanced data. Taking into account the characteristics of questions in the agricultural domain, it proposes a mutual information feature selection method based on the probability distribution of each word across classes. Using the frequency of each word in each class both addresses the unbalanced distribution of data among classes and reduces the weight of words that occur with low frequency. Three measures of this inter-class distribution are considered: the variance, the range, and the difference between the largest and second-largest values. Comparative experiments are used to select the most suitable of the three word-frequency distribution measures, which is then combined with mutual information to form the new feature selection method.

(2) Similarity calculation. To address problems in some existing algorithms, the thesis proposes a similarity calculation method based on a word2vec-LSI model. In this method, the feature words in the text are clustered before the similarity is calculated, and the center word of each feature-word cluster is determined. The original text is then converted into a word-question matrix using these center words, where each element is the TF-IDF value of the center word at the corresponding position. Next, the words in each question are converted into word vectors and averaged, and the resulting mean vector is concatenated end to end with the word-question matrix to form a new text representation. After this representation is constructed, the LSI model is used to reduce the dimensionality of the matrix and extract topics, and the similarity is then calculated.

Experimental results show that the improved feature selection method for question classification and the improved similarity calculation raise the accuracy of question classification and similarity calculation in the agricultural question answering system, demonstrating that the proposed methods are effective.
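The feature-scoring idea in (1) can be sketched in Python under stated assumptions. Questions are assumed to be already segmented into word lists; MI is the standard document-level estimate log(A·N / ((A+C)(A+B))), where A counts documents of class c containing the term, B counts documents outside c containing it, and A+C is the size of c; the three distribution measures are computed over per-class relative frequencies f(t, c). The abstract does not say exactly how the chosen measure is combined with MI, so multiplying the maximum MI over classes by the measure, and the names `class_term_freqs`, `spread`, `mutual_information`, and `feature_score`, are hypothetical.

```python
import math
from collections import Counter


def class_term_freqs(docs, labels):
    """f(t, c): occurrences of t in class c divided by all tokens in class c.

    Normalizing by class size keeps large classes from dominating the
    distribution (the unbalanced-data issue)."""
    counts = {}
    for tokens, c in zip(docs, labels):
        counts.setdefault(c, Counter()).update(tokens)
    freqs = {}
    for c, cnt in counts.items():
        total = sum(cnt.values())
        freqs[c] = {t: n / total for t, n in cnt.items()}
    return freqs


def spread(values, kind):
    """The three inter-class distribution measures named in the abstract."""
    if kind == "variance":
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)
    if kind == "range":
        return max(values) - min(values)
    if kind == "max_gap":  # largest value minus the second-largest value
        top = sorted(values, reverse=True)
        return top[0] - top[1] if len(top) > 1 else top[0]
    raise ValueError(f"unknown kind: {kind}")


def mutual_information(term, docs, labels, c):
    """Document-level MI(t, c) = log(A*N / ((A+C)(A+B)))."""
    n = len(docs)
    a = sum(1 for d, l in zip(docs, labels) if l == c and term in d)
    b = sum(1 for d, l in zip(docs, labels) if l != c and term in d)
    class_size = sum(1 for l in labels if l == c)  # A + C
    if a == 0:
        return 0.0  # term never occurs in c: no evidence, avoid log(0)
    return math.log(a * n / (class_size * (a + b)))


def feature_score(term, docs, labels, freqs, kind="variance"):
    """Hypothetical combination: max MI over classes times the spread of f(t, c)."""
    classes = sorted(set(labels))
    mi = max(mutual_information(term, docs, labels, c) for c in classes)
    dist = [freqs[c].get(term, 0.0) for c in classes]
    return mi * spread(dist, kind)


if __name__ == "__main__":
    docs = [["aphid", "leaf", "spray"], ["aphid", "spray"], ["leaf", "yellow"],
            ["nitrogen", "leaf"], ["nitrogen", "yellow"]]
    labels = ["pest", "pest", "pest", "fert", "fert"]
    freqs = class_term_freqs(docs, labels)
    for kind in ("variance", "range", "max_gap"):
        ranked = sorted({t for d in docs for t in d},
                        key=lambda t: feature_score(t, docs, labels, freqs, kind),
                        reverse=True)
        print(kind, ranked)
```

Keeping the top-ranked terms would give the reduced feature set. Because rare words have small per-class frequencies, their spread, and therefore their score, stays low, while a word concentrated in one class (here `aphid`) outranks a word spread across classes (`leaf`).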
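The word2vec-LSI representation in (2) can be illustrated with NumPy alone. This is a sketch under assumptions, not the thesis implementation: the toy word vectors stand in for a trained word2vec model, k-means with a deterministic farthest-point initialization stands in for the unspecified clustering step, the "center word" is taken to be the cluster member nearest its centroid, TF-IDF uses tf = count / question length and idf = log(N / df), and LSI is a truncated SVD followed by cosine similarity. The names `kmeans`, `question_vectors`, and `cosine_matrix` are hypothetical.

```python
import numpy as np


def kmeans(X, k, iters=20):
    """Plain k-means with a deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        # Distance from each point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
    return labels, centers


def question_vectors(questions, word_vecs, k, r):
    """Return an (n_questions x r) matrix of LSI topic coordinates."""
    words = sorted(word_vecs)  # assumes every question token has a vector
    X = np.array([word_vecs[w] for w in words], dtype=float)
    labels, centers = kmeans(X, k)

    # Center word of each non-empty cluster: the member nearest its centroid.
    center_of = {}
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size:
            nearest = idx[np.argmin(np.linalg.norm(X[idx] - centers[j], axis=1))]
            center_of[j] = words[nearest]
    to_center = {w: center_of[int(labels[i])] for i, w in enumerate(words)}

    # Word-question matrix: rows are center words, entries are TF-IDF values.
    rows = sorted(set(to_center.values()))
    n = len(questions)
    tf = np.zeros((len(rows), n))
    for q, tokens in enumerate(questions):
        for t in tokens:
            tf[rows.index(to_center[t]), q] += 1.0 / len(tokens)
    df = np.count_nonzero(tf, axis=1)
    tfidf = tf * np.log(n / np.maximum(df, 1))[:, None]

    # Mean word vector of each question, appended below its TF-IDF column.
    means = np.array([np.mean([word_vecs[t] for t in tokens], axis=0)
                      for tokens in questions]).T
    M = np.vstack([tfidf, means])

    # LSI: truncated SVD, keeping the top r latent topics.
    _, S, Vt = np.linalg.svd(M, full_matrices=False)
    return (S[:r, None] * Vt[:r]).T


def cosine_matrix(V):
    """Pairwise cosine similarity between the rows of V."""
    unit = V / np.linalg.norm(V, axis=1, keepdims=True)
    return unit @ unit.T


if __name__ == "__main__":
    vecs = {"aphid": [1.0, 0.1], "mite": [0.9, 0.2], "spray": [0.8, 0.0],
            "nitrogen": [0.0, 1.0], "urea": [0.1, 0.9], "fertilizer": [0.2, 1.0]}
    questions = [["aphid", "spray"], ["mite", "spray"],
                 ["nitrogen", "fertilizer"], ["urea", "fertilizer"]]
    print(np.round(cosine_matrix(question_vectors(questions, vecs, k=2, r=2)), 3))
```

Mapping every word to its cluster's center word before computing TF-IDF lets related words (here `aphid` and `mite`) share one matrix row, and the appended mean word vector adds dense semantic information that the sparse TF-IDF rows lack; the SVD then compresses both parts into r topics before similarity is measured.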
Keywords/Search Tags: Question answering system, question classification, similarity calculation, word vector model, LSI model