
Forest Pests And Disease Entity Recognition Based On Initial Clustering

Posted on: 2015-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: L Mao
GTID: 2268330431459462
Subject: Computer application technology
Abstract/Summary:
Information extraction is a natural language processing technique that extracts structured information from unstructured text. It includes Named Entity Recognition (NER), Relation Extraction (RE), Entity Attribute Extraction (EAE), and related tasks. NER, the basis of information extraction, is the automatic recognition of entities with specific meanings in unstructured text, such as the names of persons, locations, and organizations. This thesis studies the recognition of entities related to forest diseases and pests. First, a corpus covering forest disease and pest entities was built by crawling and pre-processing websites on forest disease and pest control, followed by manual annotation. Second, a new entity recognition method based on initial clustering is presented, intended to improve the convergence and generalization of the model and to reduce the manual annotation effort. Starting from an empty initial training set and using a conditional random field (CRF) model, the unannotated samples are clustered, as many samples are selected as there are clusters, and the training set is then updated by combining active learning with semi-supervised learning. In this way the selection reflects the distribution of the dataset and avoids the effect of a random data distribution. Third, for the active learning step, a query strategy combining N-best and RNN (Reverse Nearest Neighbors) is proposed, based on the statistical model and the characteristics of the corpus. Finally, experiments validate the feasibility of the query strategy and the efficiency of the clustering-based method.
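The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names, under stated assumptions: samples are assumed to be already represented as numeric feature vectors, plain k-means stands in for the unspecified clustering algorithm, the sample closest to each cluster centre is taken as that cluster's seed, and the N-best information is summarized as the entropy of N-best sequence probabilities supplied by some CRF. All function names (`kmeans`, `initial_seeds`, `rnn_counts`, `nbest_entropy`, `query_score`) and the way the two scores are multiplied are hypothetical illustrations, not the thesis's method.

```python
from math import dist, log


def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic farthest-first initialisation."""
    centres = [points[0]]
    while len(centres) < k:
        # Add the point that is farthest from all centres chosen so far.
        centres.append(max(points, key=lambda p: min(dist(p, c) for c in centres)))

    def assign(p):
        # Index of the centre closest to p.
        return min(range(k), key=lambda j: dist(p, centres[j]))

    for _ in range(iters):
        labels = [assign(p) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centres[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return [assign(p) for p in points], centres


def initial_seeds(points, k):
    """Pick one sample per cluster (the one nearest the centre) to annotate first."""
    labels, centres = kmeans(points, k)
    seeds = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            seeds.append(min(members, key=lambda i: dist(points[i], centres[j])))
    return seeds


def rnn_counts(points):
    """Reverse nearest neighbours: how many samples have sample i as their nearest neighbour."""
    counts = [0] * len(points)
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        nearest = min(others, key=lambda j: dist(p, points[j]))
        counts[nearest] += 1
    return counts


def nbest_entropy(nbest_probs):
    """Entropy of the normalised N-best sequence probabilities (higher = less certain)."""
    total = sum(nbest_probs)
    return -sum((p / total) * log(p / total) for p in nbest_probs if p > 0)


def query_score(nbest_probs, rnn_count):
    """Assumed combination: uncertain samples that represent many neighbours score highest."""
    return nbest_entropy(nbest_probs) * (1 + rnn_count)
```

In an active learning loop built on this sketch, the samples returned by `initial_seeds` would be annotated first, and later rounds would query the unlabeled samples with the highest `query_score`.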
Keywords/Search Tags: Named entity recognition, Text clustering, Forest disease and insect pest entity, Active learning, Semi-supervised learning