Research and Application of Domain-Oriented Entity Relationship Extraction Technology

Posted on: 2024-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: H Yuan
GTID: 2558307079472244
Subject: Electronic information
Abstract/Summary:
With the advent of the big data era, data in specific fields is growing explosively and producing a large number of domain terms. Building a domain terminology knowledge graph organizes these terms effectively and turns information into knowledge. Entity relationship extraction is a key step in constructing such a knowledge graph. Current rule-based entity relationship extraction methods designed for the general domain lack flexibility and generalize poorly to the diverse expressions found in domain data. Extraction models based on deep learning cannot adapt to the language patterns of domain data and struggle to fully learn the semantic information it contains. In response to these issues, this thesis focuses on the following three parts of research.

(1) A term vector self-calibration method based on domain knowledge, KGSR, is proposed for unlabeled domain data. The framework uses the domain terminology knowledge graph to correct and optimize the word embeddings that a pre-trained language model produces for domain data. This addresses the problem that the pre-trained language model, affected by the long-tail distribution of its training data, has difficulty fully learning the semantic information and term association information of domain terminology. In addition, a cyclic iterative training method is proposed that takes entity relationship extraction as the target task and continuously adjusts the distribution of domain term vectors. Compared with common word vector generation models, the final optimized word vectors yield a significantly higher F1 score on entity relationship extraction tasks.

(2) An entity relationship extraction model based on domain knowledge enhancement is constructed for small labeled datasets in the domain. Building on the existing entity relationship extraction model R-BERT, a new loss function is constructed through contrastive learning, which injects external knowledge into the domain term entities in the text. This addresses the pre-trained language model's low sensitivity to domain term entities and its lack of prior knowledge, and improves the effectiveness of domain entity relationship extraction.

(3) A domain-oriented terminology search and recommendation system is designed and implemented. First, the system provides a visual display of the domain terminology knowledge graph constructed with the improved KGSR and R-BERT models. Second, based on the KGSR domain term vector correction algorithm, the system abandons traditional character-based recommendation and knowledge graph path-based recommendation, and instead implements semantic search and recommendation based on the semantics of term entities, which improves the interpretability of terminology recommendations.

In summary, this thesis proposes two methods that improve the learning ability of models, in order to solve the problems that arise when general-domain entity relationship extraction technology is applied to specific domains. Finally, the corresponding models are applied to a practical project, in which a domain terminology knowledge graph and a domain terminology recommendation system are built to provide decision-making support for domain informatization.
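The semantic recommendation idea in part (3) can be illustrated with a minimal sketch. It assumes each domain term already has a vector (for example, one produced by a calibration step such as KGSR) and ranks the other terms by cosine similarity to the queried term. The function names, the toy terms, and the three-dimensional vectors are hypothetical and chosen only for illustration; the abstract does not state which similarity measure the actual system uses.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(query, term_vectors, top_k=3):
    """Return up to top_k (term, similarity) pairs most similar to `query`."""
    query_vec = term_vectors[query]
    scored = [
        (term, cosine(query_vec, vec))
        for term, vec in term_vectors.items()
        if term != query  # do not recommend the query term itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy vectors; real term vectors would come from the trained model.
TERM_VECTORS = {
    "antenna": [0.9, 0.1, 0.0],
    "waveguide": [0.8, 0.2, 0.1],
    "amplifier": [0.3, 0.9, 0.1],
    "firewall": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for term, score in recommend("antenna", TERM_VECTORS, top_k=2):
        print(f"{term}: {score:.3f}")
```

Because the ranking depends only on vector geometry, two terms can be recommended together even when they share no characters and no direct path in the knowledge graph, which is the contrast with character-based and path-based recommendation drawn in the abstract.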
Keywords/Search Tags: Entity Relationship Extraction, Domain Terms, Pre-training Language Model, Word Vector Generation, Relevance