Research And Implementation Of Entity Relation Extraction Algorithm In News Field Based On Distant Supervision And Sequence Labeling

Posted on: 2022-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: C C Peng
GTID: 2518306338970049
Subject: Computer Science and Technology
Abstract/Summary:
With the development of Internet technology and the advent of the big data era, the scale of text data is growing explosively. News data usually contains rich, useful information, but it is difficult for users to obtain valuable in-depth information from it efficiently. Information extraction filters redundant information out of unstructured text and retains high-value, structured, usable data. Relation extraction, an important subtask of information extraction, extracts the relations between entities from unstructured text. Its results can also be applied to downstream tasks such as the construction of social networks and knowledge graphs. To mine the potential relationships between entities contained in news, this thesis designs a relation extraction algorithm based on distant supervision and an open-domain relation extraction algorithm based on sequence labeling, and uses the proposed algorithms to build a visual prototype system for relation extraction. The thesis makes the following three contributions:

1) This thesis proposes BGSGA, a distant supervision relation extraction algorithm that combines a syntactic graph attention mechanism with sequence information. The algorithm fuses the contextual sequence information and the syntactic information of each word to obtain a deep representation of the sentence, capturing complementary semantic and syntactic structure information. It designs a syntactic graph attention mechanism that updates each word representation using the representations of its syntactic neighbor words, and thereby obtains the structural importance of each word in the syntactic dependency graph. The experimental results show that BGSGA outperforms several baseline algorithms on benchmark datasets.

2) This thesis proposes STDP, an open-domain relation extraction algorithm based on sequence labeling and dependency parsing. The algorithm designs a sentence reorganization strategy that splits and reorganizes complex sentences containing coordinate relationships. This simplifies the sentence structure and reduces the difficulty of labeling the relation-indicating phrase while preserving the sentence semantics as far as possible. The algorithm is also able to capture overlapping triplets. It uses sequence labeling to label the relation-indicating phrase in the sentence and designs a post-processing strategy to obtain the complete entity relation triplet. The experimental results show that STDP performs better than the other baseline algorithms.

3) This thesis constructs a visual prototype system for relation extraction whose core components are the proposed BGSGA and STDP algorithms. The system uses news data to mine the entities and relationships related to a core person in order to meet users' query needs. The user enters the person to be queried in the front end; the back end queries a Neo4j graph database, returns the entities and relationships related to that person, and displays the results as a graph and a table.
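The syntactic graph attention step described in 1), where a word representation is updated from the representations of its syntactic neighbor words, can be illustrated with a minimal sketch. This is not the BGSGA implementation: the function name `syntactic_graph_attention`, the toy vectors, and the unparameterised dot-product scoring are all assumptions made for illustration, and a trained model would use learned weights and a real dependency parser.

```python
import math


def syntactic_graph_attention(vectors, edges):
    """One round of attention over dependency-graph neighbours (illustrative only).

    vectors: list of word vectors (lists of floats), one per token.
    edges:   list of (head, dependent) index pairs from a dependency parse.
    Returns a new list of vectors; each is a softmax-weighted average of the
    token's own vector and the vectors of its syntactic neighbours.
    """
    n = len(vectors)
    # Assumption: treat the dependency graph as undirected, with a self-loop per token.
    neighbours = [{i} for i in range(n)]
    for head, dep in edges:
        neighbours[head].add(dep)
        neighbours[dep].add(head)

    updated = []
    for i in range(n):
        ids = sorted(neighbours[i])
        # Assumption: plain dot-product scores stand in for learned attention weights.
        scores = [sum(a * b for a, b in zip(vectors[i], vectors[j])) for j in ids]
        top = max(scores)  # subtract the maximum for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        dim = len(vectors[i])
        updated.append(
            [sum(w * vectors[j][d] for w, j in zip(weights, ids)) for d in range(dim)]
        )
    return updated


# Toy sentence "Alice founded Acme": "founded" is the head of both nouns.
tokens = ["Alice", "founded", "Acme"]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical 2-d embeddings
edges = [(1, 0), (1, 2)]
new_vectors = syntactic_graph_attention(vectors, edges)
```

In this sketch the attention weights themselves indicate how much each syntactic neighbor contributes to a word, which is one simple way to read the "structural importance" mentioned in the abstract; the thesis may define it differently.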
Keywords/Search Tags: relation extraction, distant supervision, graph neural network, dependency parsing, sequence labeling