Entity Relation Extraction For Free Text

Posted on: 2022-10-30
Degree: Master
Type: Thesis
Country: China
Candidate: D Zhang
GTID: 2518306350495494
Subject: Software engineering
Abstract/Summary:
With the development of Internet technology, the amount of text on the Internet is increasing exponentially. How to quickly and accurately obtain valuable information from large volumes of unstructured text has become a focus of research. Organizing various types of information from heterogeneous, multi-source data is a common challenge in the information field, and information extraction has emerged as a critical technology for solving such problems. Information extraction mainly comprises named entity recognition and relation extraction, and it mainly deals with unstructured free text. Free text has a loose structure and changeable content, which makes it challenging to recognize. Information extraction aims to filter and recognize free text to meet users' needs for high-quality information. This thesis addresses two main problems: named entity recognition and relation extraction.

In named entity recognition, entity boundaries and word segmentation information are crucial determinants of the accuracy of Chinese named entity recognition. Word-based methods rely heavily on segmentation tools, so incorrect word segmentation causes error propagation. Character-based methods lack word-level information and word boundary information, which leads to missing information. In response to these problems, a novel neural entity recognition model fused with self-matched lexical word features is proposed for Chinese NER tasks. The embedding layer uses the attention mechanism to extract the semantic features of word sequences and uses statistical character frequency to extract syntactic features. The model obtains F1 scores of 90.76% and 95.58% on the MSRA and Resume datasets, with the Resume result being state of the art, and F1 scores of 73.96% and 57.65% on the OntoNotes 4 and Weibo datasets.

Relation extraction is a fundamental task in natural language processing, and the entity pair in a sentence is an essential factor affecting its performance. Using entity features directly in the model causes missing information, and the same entity carries different semantics in different contexts. In response to these problems, this thesis proposes a relation extraction model based on sememe knowledge. In the input layer, the relative and absolute position information of the entities enhances the position feature representation of the head and tail entities. Sememe knowledge is used to represent the multiple semantics of each entity and so enhance the entities' feature representation. The self-attention mechanism is used to extract context information and enhance the representation of character vectors. In the output layer, the attention mechanism is used to extract multiple integrated features, and the Softmax function is used for encoding. The new method shows significant improvements on the SemEval-2010 Task 8 relation extraction dataset, obtaining an F1 value of 84.00%.
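The abstract does not say how the relative and absolute position information of the head and tail entities is encoded. As an illustration only, the sketch below uses a scheme common in relation extraction work: each token gets its absolute index plus a clipped signed distance to each entity span, shifted to a non-negative ID suitable for an embedding lookup. The function names, the `max_dist` parameter, and the example sentence are assumptions made for this sketch, not details taken from the thesis.

```python
from typing import List, Tuple


def relative_positions(seq_len: int, span: Tuple[int, int]) -> List[int]:
    """Signed distance from each token to an entity span.

    `span` is (start, end), both inclusive. Tokens inside the span get 0,
    tokens to the left get negative values, tokens to the right positive.
    """
    start, end = span
    distances = []
    for i in range(seq_len):
        if i < start:
            distances.append(i - start)
        elif i > end:
            distances.append(i - end)
        else:
            distances.append(0)
    return distances


def position_features(
    tokens: List[str],
    head_span: Tuple[int, int],
    tail_span: Tuple[int, int],
    max_dist: int = 50,  # hypothetical clipping window, not from the thesis
) -> List[Tuple[int, int, int]]:
    """Return (absolute index, head distance ID, tail distance ID) per token.

    Distances are clipped to [-max_dist, max_dist] and shifted by max_dist,
    so every ID lies in [0, 2 * max_dist].
    """
    n = len(tokens)
    head_rel = relative_positions(n, head_span)
    tail_rel = relative_positions(n, tail_span)

    def to_id(d: int) -> int:
        return max(-max_dist, min(max_dist, d)) + max_dist

    return [(i, to_id(h), to_id(t)) for i, (h, t) in enumerate(zip(head_rel, tail_rel))]


# Illustrative sentence: head entity "burst" (index 1), tail entity "pressure" (index 5).
tokens = ["The", "burst", "was", "caused", "by", "pressure"]
features = position_features(tokens, head_span=(1, 1), tail_span=(5, 5), max_dist=3)
```

In a model along these lines, the two distance IDs would typically index separate learned position embedding tables whose vectors are concatenated with the token embedding in the input layer.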
Keywords/Search Tags: Information Extraction, Named Entity Recognition, Relation Extraction, Attention Mechanism, Sememe