
Research And Implementation Of Chinese Entity Relationship Extraction Based On Deep Learning

Posted on: 2024-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Pan
Full Text: PDF
GTID: 2568307112976579
Subject: Electronic information
Abstract/Summary:
With the advent of the Internet era and the information age, more and more data exists on the Internet in a wide variety of forms. How to extract valuable information from this huge amount of data has become a current research hotspot and is an important goal of information extraction. In the field of information extraction, named entity recognition and relation extraction are two of the most fundamental tasks; they underpin downstream tasks such as knowledge graph construction, machine translation and automatic question answering, and are therefore widely used in natural language processing and artificial intelligence. This paper focuses on named entity recognition and relation extraction, aiming to improve model performance by incorporating deep learning and other algorithms.

First, in the named entity recognition module, we propose a LatticeLSTM-self-attention model based on the self-attention mechanism. The model uses the LatticeLSTM network to encode every legitimate input character sequence together with all words that can be matched against a lexicon, so the network embeds not only character information but also word-sequence information obtained by building a lexicon. Experiments in this paper demonstrate the effectiveness of the LatticeLSTM model for Chinese named entity recognition. However, since LatticeLSTM is still a variant of the long short-term memory network, it suffers from the long-distance dependency problem. This paper therefore combines it with the self-attention mechanism, which distinguishes the importance of different words by assigning each word a different weight, alleviating the long-distance dependency problem to a certain extent. Because public datasets in the disciplinary domain are lacking, this paper constructs a disciplinary-domain dataset by collecting a large-scale disciplinary corpus and combining self-training and data augmentation methods, then conducts extensive experiments and analyses on this dataset to fully demonstrate the effectiveness of the LatticeLSTM model incorporating the self-attention mechanism in Chinese named entity recognition.

In the relation extraction module, the paper uses a BERT-BiGRU model built on top of BERT, and a pipeline approach links the two tasks of named entity recognition and relation extraction to achieve information extraction: the named entity recognition module identifies the knowledge meta-entities in the text, and the relation extraction module then classifies the relations between the identified entities. Extensive comparative experiments on the constructed knowledge metadata set demonstrate the effectiveness of the BERT-BiGRU model in Chinese relation extraction. Finally, these two modules extract knowledge meta-entity relation triples from massive unstructured text; using the Neo4j graph database as the storage medium, this paper constructs disciplinary knowledge graphs and designs and develops a knowledge visualisation platform with functions such as knowledge-point query and knowledge-base management.
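The self-attention idea described above can be sketched in a few lines: every position attends to every other position in one step, so information flows across arbitrary distances without passing through the LSTM's recurrent chain. This is a minimal single-head, scaled dot-product sketch; the function name and the single-head form are illustrative assumptions, not the thesis's exact configuration.

```python
import numpy as np

def self_attention(H):
    """Single-head self-attention over hidden states H of shape (seq_len, d).

    Each row of the returned matrix is a weighted mix of ALL positions,
    with weights given by a row-wise softmax over scaled dot-product
    similarities -- the property used to ease long-distance dependency.
    """
    d = H.shape[-1]
    scores = H @ H.T / np.sqrt(d)                 # (seq_len, seq_len) similarities
    scores -= scores.max(axis=-1, keepdims=True)  # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ H                            # context-mixed representations

# toy usage: output keeps the input's shape
H = np.random.randn(5, 8)
out = self_attention(H)
```

In a real model the queries, keys and values would come from separate learned projections of H; here H plays all three roles to keep the mechanism itself visible.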
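The pipeline approach described above can be sketched as a simple composition: run NER first, then classify a relation for each ordered pair of recognised entities. The two model objects and their method names (`predict`, `classify`) are placeholders, not the thesis's actual API.

```python
from typing import List, Optional, Tuple

def pipeline_extract(text: str, ner_model, re_model) -> List[Tuple[str, str, str]]:
    """Pipeline information extraction: NER module feeds the RE module.

    Returns (head, relation, tail) triples; pairs the RE model labels as
    having no relation (None) are dropped.
    """
    entities = ner_model.predict(text)        # step 1: recognise entities
    triples = []
    for head in entities:                     # step 2: classify each pair
        for tail in entities:
            if head == tail:
                continue
            relation: Optional[str] = re_model.classify(text, head, tail)
            if relation is not None:
                triples.append((head, relation, tail))
    return triples
```

A known trade-off of this design, which the pipeline approach accepts for its simplicity, is that NER errors propagate into the relation extraction stage.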
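For the final storage step, each extracted triple must be loaded into Neo4j. A minimal sketch of rendering one (head, relation, tail) triple as a Cypher MERGE statement follows; the `Entity` label, property name and example triple are hypothetical, and a production loader would use the Neo4j driver with parameterised queries rather than string formatting.

```python
def triple_to_cypher(head: str, relation: str, tail: str) -> str:
    """Render one knowledge triple as an idempotent Cypher statement.

    MERGE creates the node or relationship only if it does not already
    exist, so re-running extraction does not duplicate graph elements.
    """
    return (
        f"MERGE (h:Entity {{name: '{head}'}}) "
        f"MERGE (t:Entity {{name: '{tail}'}}) "
        f"MERGE (h)-[:`{relation}`]->(t)"
    )

stmt = triple_to_cypher("binary tree", "subtopic_of", "data structures")
```

Statements like this would be executed through a Neo4j session to build the disciplinary knowledge graph that the visualisation platform then queries.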
Keywords/Search Tags:Information extraction, Named entity recognition, Self-attention mechanism, Knowledge graph