Research On Segmented Semantic Annotation Method In A Weak Label Environment

Posted on: 2021-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: L T An
GTID: 2518306047481284
Subject: Master of Engineering

Abstract/Summary:
Traditional text classification algorithms require relatively complete and accurately labeled text data. In practice, however, complete labeled data is difficult to obtain in large quantities. This thesis therefore studies and validates a domain-ontology-based method for semantic annotation and sentence group division. The method addresses weakly labeled text data: it makes the text semantically usable in applications such as information retrieval, information extraction, and automatic summarization, and it divides the text into relatively independent sentence groups so that unstructured text data acquires an explicit structure.

To perform semantic annotation in a weakly labeled environment, this thesis proposes a domain-ontology-based method for semantic annotation of text. Starting from a set of initial concepts, the method automatically obtains a set of structured semantic concepts. It then assigns semantic labels to paragraphs based on the obtained categories, entities, and relations, together with the frequency, position, and relations of extended words in the text, and thereby mines the subtopic information of the text. It also marks key words that the text refers to implicitly but does not mention explicitly, which supports finer-grained paragraph division. Experimental results show that, for annotating text in specific domains, the method reaches a precision of 93%, a recall of 78%, and an F-value of 83%. This annotation quality meets practical application requirements and is better than that of existing text annotation methods that rely on a training corpus.

To divide the semantically annotated content into paragraphs more effectively, this thesis uses the annotation results of text resources already annotated with the domain ontology to train a bidirectional LSTM neural network with an attention mechanism, which divides the natural paragraphs of text resources into sentence groups. This partly solves the problem that the division into natural paragraphs lacks a systematic structure. The data show that the method can effectively recognize the boundaries of sentence groups and achieve automatic sentence group segmentation.
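The annotation step described above (expanding initial concepts into extended words, then labeling paragraphs by the frequency and position of those words) can be illustrated with a minimal Python sketch. The ontology, its extended-word sets, the first-sentence position bonus, and the labeling threshold are all hypothetical; the thesis's actual scoring also uses categories, entities, and relations, which are omitted here.

```python
from collections import defaultdict

# Hypothetical toy ontology: each concept maps to its extended words
# (synonyms, sub-concepts, related entities). In the thesis these are
# obtained automatically from initial concepts; here they are hand-written.
ONTOLOGY = {
    "irrigation": {"irrigation", "drip", "sprinkler", "water"},
    "pest_control": {"pest", "insecticide", "aphid", "spray"},
}


def score_paragraph(paragraph, ontology, position_bonus=0.5):
    """Score each concept for one paragraph by extended-word frequency,
    weighting matches in the paragraph's first sentence more heavily
    (an assumed stand-in for the thesis's position feature)."""
    sentences = [s for s in paragraph.lower().split(".") if s.strip()]
    scores = defaultdict(float)
    for idx, sentence in enumerate(sentences):
        weight = 1.0 + (position_bonus if idx == 0 else 0.0)
        for token in sentence.split():
            token = token.strip(",;:!?")
            for concept, words in ontology.items():
                if token in words:
                    scores[concept] += weight
    return dict(scores)


def annotate(paragraphs, ontology, threshold=1.0):
    """Assign each paragraph its highest-scoring concept, or None when
    no concept reaches the (hypothetical) threshold."""
    labels = []
    for paragraph in paragraphs:
        scores = score_paragraph(paragraph, ontology)
        best = max(scores, key=scores.get) if scores else None
        if best is not None and scores[best] >= threshold:
            labels.append(best)
        else:
            labels.append(None)
    return labels


paras = [
    "Drip irrigation saves water. It is common.",
    "Aphid damage hurts crops. Spray insecticide early.",
    "The weather was mild.",
]
print(annotate(paras, ONTOLOGY))
```

Matching here is exact token lookup, so inflected forms such as plurals are missed; a practical system would normalize words before comparing them with the ontology.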
Keywords/Search Tags: weak label, domain ontology, semantic annotation, sentence group division, attention mechanism
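The abstract does not state how sentence group boundaries are encoded for the bidirectional LSTM with attention. A common formulation, assumed here, treats segmentation as sequence labeling: each sentence gets tag 1 if it starts a new sentence group and 0 otherwise. The sketch below shows only that encoding, the decoding of per-sentence boundary probabilities (which such a tagger would output) into groups, and a boundary-level precision/recall/F1. The network itself is not shown, and the function names and 0.5 threshold are illustrative.

```python
from typing import List, Tuple


def groups_to_tags(group_sizes: List[int]) -> List[int]:
    """Encode a segmentation as one tag per sentence: 1 marks the first
    sentence of a sentence group, 0 a continuation. Sizes must be >= 1."""
    tags: List[int] = []
    for size in group_sizes:
        tags.extend([1] + [0] * (size - 1))
    return tags


def decode_boundaries(probs: List[float], threshold: float = 0.5) -> List[List[int]]:
    """Turn per-sentence boundary probabilities into sentence groups,
    returned as lists of sentence indices."""
    groups: List[List[int]] = []
    for i, p in enumerate(probs):
        if i == 0 or p >= threshold:
            groups.append([i])    # start a new sentence group
        else:
            groups[-1].append(i)  # continue the current group
    return groups


def boundary_prf(pred: List[int], gold: List[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over boundary positions (tag == 1)."""
    pred_set = {i for i, t in enumerate(pred) if t == 1}
    gold_set = {i for i, t in enumerate(gold) if t == 1}
    tp = len(pred_set & gold_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


print(groups_to_tags([2, 3, 1]))
print(decode_boundaries([0.9, 0.1, 0.7, 0.2, 0.3, 0.8]))
```

Decoding always opens a group at the first sentence, so position 0 is a boundary by construction; some evaluations exclude it when scoring.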