Research On Named Entity Recognition Methods For Unstructured Text

Posted on: 2024-03-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z W Yang
GTID: 1528307064474324
Subject: Computer software and theory
Abstract/Summary:
With the arrival of the 5G Internet era, the vast amount of unstructured text on the web makes information readily available to users, but it also brings information redundancy and overload. Faced with rapidly growing volumes of unstructured text, such as free text with no fixed format, information extraction techniques based on deep learning have attracted wide attention from academia and industry worldwide. The aim is to process and analyze unstructured text automatically, so as to obtain useful information, support efficient decision-making, and thereby improve the service capability of artificial intelligence. Among these techniques, named entity recognition (NER), also called entity extraction, is one of the most fundamental tasks in information extraction. It aims to automatically extract key information from text, such as person names, place names, and organization names, and it is widely used in knowledge graph construction, knowledge management, information retrieval, text summarization, and other applications. However, compared with structured and semi-structured data such as tables, unstructured text is typically irregular in expression, complex in content, and costly to annotate, which greatly limits the performance and applicability of NER. Effective entity extraction from unstructured text is therefore important both for distilling structured knowledge from massive data and for supporting downstream mining tasks.

Although recent work has addressed NER for unstructured text, three pressing issues remain:
1) Implicit features are extracted incompletely: how can the multi-scale linguistic features of text be integrated into a unified entity recognition model?
2) Representation learning for nested entities is difficult: how can all potential entities in unstructured text be identified effectively?
3) Labeled data are scarce: how can entity extraction perform well in low-resource scenarios?

To address these challenges, the main contributions of this thesis are as follows.

First, a multi-scale attentive feature-enhanced NER method is proposed. Most text mining methods ignore the multi-scale features of unstructured text, e.g., character-level and word-level features, and therefore perform poorly on NER. To address the incomplete extraction of hidden contextual features in unstructured text, the proposed framework uses attention mechanisms to extract word-level and character-level implicit features of the current context from four perspectives: global character features, local character features, global word features, and local word features. It then integrates the multi-scale features of other contexts to enhance the representation of the current text, so as to characterize entity features comprehensively. Experimental results show that the method effectively improves the performance and robustness of entity extraction.

Second, a nested NER method based on hierarchical representation learning is proposed. Nesting relationships between entities greatly hinder entity representation learning and limit extraction performance. To address representation learning for nested entities in complex text, the method decomposes the complex task of nested NER into multiple layers of conventional NER problems, reducing task complexity. To learn the nested semantic features of complex text, it captures dependencies between adjacent candidate entities with convolutional neural networks and uses an attention mechanism to enhance the representations of candidate fragments. Finally, it integrates the feature representations from these two stages to infer entity categories layer by layer, which greatly alleviates error propagation. Experimental results demonstrate that the method outperforms state-of-the-art algorithms on nested NER and has clear advantages in modeling hidden nested dependencies and learning effective fine-grained entity representations.

Third, a self-training data-augmentation method for low-resource NER is proposed. In low-resource scenarios, annotation is costly and training samples are limited, which makes it difficult to ensure the performance of entity extraction. To address the scarcity of annotated data in low-resource domains, the method uses the entities in annotated examples to heuristically retrieve data related to those entities, and then generates weak labels for the augmented data with a novel iterative mining-verification mechanism. In this mechanism, a recognition teacher mines potential entities from non-entity text, and a second, prompt-based discrimination teacher verifies the entity labels. The framework can thus iteratively refine the weak labels in a divide-and-conquer manner and improve model training. Experimental results show that the method performs well on low-resource NER, outperforms state-of-the-art baselines, and effectively mitigates data scarcity in low-resource domains.

Overall, this thesis studies NER from three angles: diverse implicit features, nested entity structure, and sparse annotated samples. It proposes a series of effective methods and techniques that offer useful reference for this task. Future work will continue to explore NER methods for unstructured text and further improve their practicality and generality.
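The second contribution's central idea, recasting nested NER as several layers of conventional (flat) NER, can be illustrated with a small sketch. It assumes one simple layering rule that is not taken from the thesis: an entity's layer is one plus the highest layer of any entity it contains, so innermost entities sit in layer 0 and every layer can be tagged with ordinary BIO labels. It also assumes spans are either disjoint or properly nested (no partial overlaps, no duplicate boundaries). The CNN and attention components described in the abstract are not modeled, and the names `assign_layers` and `layer_bio_tags` are hypothetical.

```python
from typing import List, Tuple

# A labelled entity span: (start token, end token exclusive, entity type).
Span = Tuple[int, int, str]


def contains(outer: Span, inner: Span) -> bool:
    """True if `outer` covers `inner` and the two spans have different boundaries."""
    same = (outer[0], outer[1]) == (inner[0], inner[1])
    return outer[0] <= inner[0] and inner[1] <= outer[1] and not same


def assign_layers(spans: List[Span]) -> List[int]:
    """Assign each span a layer: 0 for innermost entities, otherwise
    1 + the highest layer among the spans it contains (assumed rule)."""
    layers = [0] * len(spans)
    # Shorter spans first, so every contained span is labelled before its container.
    order = sorted(range(len(spans)), key=lambda i: spans[i][1] - spans[i][0])
    for i in order:
        inner = [layers[j] for j in range(len(spans)) if contains(spans[i], spans[j])]
        layers[i] = 1 + max(inner) if inner else 0
    return layers


def layer_bio_tags(n_tokens: int, spans: List[Span]) -> List[List[str]]:
    """Turn nested spans into one flat BIO tag sequence per layer."""
    layers = assign_layers(spans)
    depth = max(layers) + 1 if layers else 0
    tags = [["O"] * n_tokens for _ in range(depth)]
    for (start, end, label), layer in zip(spans, layers):
        tags[layer][start] = f"B-{label}"
        for k in range(start + 1, end):
            tags[layer][k] = f"I-{label}"
    return tags


if __name__ == "__main__":
    tokens = ["The", "Bank", "of", "China", "opened"]
    # "China" (LOC) is nested inside "Bank of China" (ORG).
    spans = [(1, 4, "ORG"), (3, 4, "LOC")]
    for layer, row in enumerate(layer_bio_tags(len(tokens), spans)):
        print(layer, row)
```

Running the example prints layer 0 with `B-LOC` on "China" only, and layer 1 with `B-ORG I-ORG I-ORG` over "Bank of China". Each row can then be handled by a standard flat sequence tagger; how the thesis actually orders or couples its layers is not specified in this abstract.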
Keywords/Search Tags: Unstructured Text, Named Entity Recognition, Information Extraction, Feature Selection, Representation Learning