
The Research Of Question Analysis Based On Ontology And Architecture Design For Question Answering System In Agriculture

Posted on: 2014-02-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D P Hu
GTID: 1268330401478558
Subject: Information Technology and Digital Agriculture
Abstract/Summary:
In the last two decades, telecommunication networks have spread into the countryside in China, and some farmers now use personal computers to access the Internet. How to accommodate users' specific interests in agricultural information, and how to disseminate agricultural technology information accurately, have become critical challenges for information technology in agriculture.

A Question Answering (QA) system is a hierarchical, comprehensive system whose research branches draw on Artificial Intelligence (AI), Information Retrieval (IR), Information Extraction (IE), and Natural Language Processing (NLP). Applying QA to meet the needs of agricultural users by retrieving, extracting, and mining information from the Internet is a feasible solution. This thesis focuses on the key problems of QA. The main work is as follows:

1. First, this thesis introduces the foundational concepts of NLP, IR, IE, and ontology, and outlines their development. Then, building on existing research on QA systems, it analyzes the logical structure of QA over free text, focusing on the research methods and the basic framework of QA. The development of agricultural information technology with Chinese characteristics is briefly introduced, including the application of QA systems in agriculture.

2. A novel semi-supervised method for learning domain ontology relations is proposed. The key problem is how to enrich the relations between concepts. Based on the analysis of text information, this thesis proposes a method for extracting ontology relations with a mutual information algorithm.

3. Semantic analysis of a question is the key to capturing the user's requirement. To describe the relationships between concepts, this thesis proposes the concept-feature as a representation of domain-specific concepts. A novel algorithm based on the hidden Markov model (HMM) is proposed for extracting concept-feature words, and the key points of learning the model structure and estimating its parameters are analyzed. During processing, the algorithm makes full use of format information, such as list separators and special labels, to segment the text, and obtains the extraction information for specific fields with the hidden Markov model.

4. IR is one of the main parts of QA. The research in this thesis focuses mainly on the information retrieval model. An ontology-based information retrieval model is introduced, which is based on computing the equivalence classes of ontology individuals. The ontology is generated using a basic description logic, which offers a suitable tradeoff between the expressivity of knowledge and the complexity of reasoning problems.

5. Answer extraction is the key problem of QA. This thesis proposes an answer extraction algorithm based on Latent Dirichlet Allocation (LDA). The main steps are as follows. First, the topic-word and document-topic distributions are inferred by Gibbs sampling, and an LDA model of the text is built from them. Second, the text is segmented using LDA models of the corpus and of the texts; clarity is taken as the metric for the similarity of blocks, and segmentation points are identified at local minima. Third, the topic words of each segment are extracted according to Shannon information. Words that do not appear distinctly in the analyzed text can still be included to express the topics, with the help of word clustering over the background corpus and topic-word association, in an attempt to uncover the meaning behind the words. Last, the similarity between the question and each paragraph is calculated, and the paragraph with the highest similarity is taken as the answer.

6. The architecture of the QA system, which is built on Hadoop and HBase, is described in detail. The principles and usage of the open-source distributed file system, Hadoop, and the non-relational database, HBase, are introduced. A method for developing a QA system based on Hadoop and HBase is proposed. The function of each part of the QA system is presented, and an integrated performance analysis of the system is given.

7. The experimental methods and data models for the QA system are designed, including an analysis of the evaluation criteria. First, the experimental results for concept-feature word extraction and question classification are analyzed. Then the recall of the ontology-based information retrieval experiments is described and compared with that of the keyword method. Last, the accuracy of answer extraction based on the LDA model is analyzed, mainly for agriculture-based question classification. The experimental results demonstrate that the methods proposed in this thesis can enhance the performance of a Question Answering system in agriculture.
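As an illustration of the mutual-information idea mentioned in item 2, the following is a minimal sketch, not the thesis's actual algorithm. It assumes pointwise mutual information (PMI) computed over sentence-level co-occurrence of known concept terms; the function name `pmi_scores`, the toy sentences, and the concept set are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(sentences, concepts):
    """Return PMI for every concept pair that co-occurs in at least one sentence.

    Assumption: probabilities are estimated as the fraction of sentences
    containing a concept (or a pair of concepts).
    """
    n = len(sentences)
    single = Counter()  # number of sentences containing each concept
    pair = Counter()    # number of sentences containing each concept pair
    for tokens in sentences:
        present = sorted(set(tokens) & concepts)
        single.update(present)
        pair.update(combinations(present, 2))
    scores = {}
    for (a, b), n_ab in pair.items():
        # PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) )
        scores[(a, b)] = math.log2((n_ab / n) / ((single[a] / n) * (single[b] / n)))
    return scores


# Hypothetical toy corpus of tokenized sentences.
sentences = [
    ["rice", "blast", "fungicide"],
    ["rice", "blast"],
    ["wheat", "rust"],
    ["wheat", "fungicide"],
]
concepts = {"rice", "blast", "wheat", "rust", "fungicide"}
scores = pmi_scores(sentences, concepts)

# Pairs with high PMI are candidates for an ontology relation.
for concept_pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(concept_pair, round(score, 3))
```

In a semi-supervised setting, pairs scoring above some threshold could be proposed as candidate relations and then confirmed against a small set of seed relations; that selection step is left out of this sketch.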
Keywords/Search Tags: agriculture information, domain ontology, Information Extraction, Question Answering system (QA)