
Research On Entity Relation Extraction In Web Contents

Posted on: 2021-06-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z G Liu
GTID: 1488306092453734
Subject: Software engineering
Abstract/Summary:
Relation extraction is an important task in natural language processing and an important means of knowledge acquisition. It has great research value for analyzing and processing natural language and for understanding world knowledge. In the age of big data, information on the Internet is growing explosively, and most of it is stored and disseminated as natural language text. In recent years, the rise of social media such as Weibo, Weixin and Facebook has changed not only the way information spreads on the traditional web, but also people's habits of living and social communication. Moreover, the data on social platforms keeps growing in scale, the scope of communication keeps widening, and the frequency of communication keeps rising, which poses a huge challenge to network supervision. If large amounts of illegal information, such as reactionary, pornographic, violent and fraudulent content, are allowed to flood the network, network order will be seriously affected and social stability and prosperity endangered. In short, research on entity relation extraction from web text is valuable for acquiring knowledge from online information and building knowledge graphs, and it has far-reaching significance for supervising the dissemination of online information and maintaining order in cyberspace.

This dissertation takes Weibo data as its main research object and studies methods for extracting entity relations from social media. Weibo data are characterized by strong interaction, short sentences, non-standard text and entity pairs that span sentences, so the dissertation studies the problem at both the lexical level and the sentence level. The main research contents and innovations are as follows:

1. A single-sentence entity relation extraction method based on the shortest dependency path and a bidirectional LSTM is proposed. Entity relation extraction is treated as a classification problem. The shortest dependency path between two entities is used to describe the characteristics of a single sentence and to capture the association between the entities. Word2vec is used to embed lexical information, location information and color tags. Because each sentence component in natural language is constrained by the components on both its left and right, a bidirectional LSTM model is adopted. Building on the classical LSTM, forward forget and input gates, backward forget and input gates, and an output gate are set up to receive signals from both directions. A Softmax classifier is employed to extract relations. To counter over-fitting, a Dropout strategy randomly sets the output of LSTM nodes to zero without blocking signal transmission in these nodes. The single-sentence method is mainly applied to non-interactive scenarios. If the entity pair occurs within a single sentence, the entity relation can be considered to depend only on the characteristics of that sentence and to be independent of the interactive scenario. In this case, the single-sentence method avoids introducing excess information that would generate noise, and it effectively improves computing performance.

2. A multi-sentence entity relation extraction method based on the cross-sentence dependency path is proposed. To handle the strong interaction and the cross-sentence entity pairs found in Weibo information, dependency paths between adjacent sentences are established from the forest of dependency trees, and the characteristics of a conversation are represented by the dependency paths between its sentences. A sentence-word level LSTM network extracts lexical features and sentence features separately, perceives the coherence of each sentence and judges the reliability of the conversation. Entity relations are then extracted by a piecewise convolutional neural network. The multi-sentence method is mainly used in interactive scenarios. It can perceive the coherence and credibility of conversations and extract cross-sentence entity relations according to the characteristics of the context sentences.

3. A Weibo interpersonal relation extraction method based on a conversation completion strategy is proposed. The two parties that interact through Weibo usually have a specific interpersonal relation. Because the names or IDs of both parties are usually omitted from a Weibo conversation, the conversation completion strategy marks the sender and receiver of each Weibo message as named entities and adds them to the sequence as sentences. This dissertation regards interpersonal relation as a subclass of entity relation, and interpersonal relation extraction as a sub-task of entity relation extraction. The multi-sentence entity relation extraction method is employed to identify interpersonal relations. The conversation-based method can extract the names of both parties from interactive conversational scenarios and identify the possible interpersonal relation between them. It can also be applied to interpersonal relation extraction in non-interactive scenarios.

4. A method for building a knowledge base from the Baidu Baike encyclopedia is proposed, and distantly supervised relation extraction for Weibo is realized. Acquiring, cleaning and annotating Weibo data incurs huge labor costs. To reduce the model's dependence on training data, this dissertation takes Baidu Baike as an external knowledge base for distantly supervised entity relation extraction. Information is obtained from Baidu entries and screened, and synonymous relations are then merged to establish a relation knowledge base for distant supervision. A sentence-word level attention mechanism is proposed to model the relevance of conversational sentences. Sentence-level attention reflects the relevance of each sentence and the reliability of the conversation, while word-level attention picks out the cue words that identify the type of relation in Weibo text. The two-level attention model can perceive the reliability of a conversation and the contribution of each word to relation classification, and it can train model parameters with the help of the external knowledge base when annotated training data are absent. This reduces the dependence on training data and improves the practicability of the model.

The experiments in this dissertation are carried out mainly on a Sina Weibo dataset to extract entity relations and interpersonal relations. The experimental results show that the proposed models and methods are effective at recognizing entity and interpersonal relations in Weibo information. The accuracy, recall and F1 score are significantly higher than those of other models. The improved baseline model also achieves remarkable results: its performance is greatly improved and comes close to that of relation extraction on traditional text.
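
The shortest dependency path used in the first contribution can be illustrated with a small sketch. It assumes that a dependency parse is already available as a list of head indices (one per token, with -1 for the root); the function name `shortest_dependency_path`, the example sentence and its hand-written parse are hypothetical and are not taken from the dissertation. Because a dependency tree is connected and acyclic, a breadth-first search over its undirected edges finds the unique path between two entity tokens.

```python
from collections import deque


def shortest_dependency_path(heads, source, target):
    """Return token indices on the path from source to target.

    heads[i] is the index of token i's syntactic head, or -1 for the root.
    The tree is treated as an undirected graph and searched breadth-first.
    """
    # Build an undirected adjacency map from the head array.
    adjacency = {i: set() for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            adjacency[child].add(head)
            adjacency[head].add(child)

    # Breadth-first search, remembering each node's predecessor.
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbor in sorted(adjacency[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)

    if target not in previous:
        return []  # the two tokens are not connected

    # Walk back from the target to the source, then reverse.
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


# Hypothetical example sentence with a hand-written parse (an assumption).
tokens = ["Alice", "founded", "a", "company", "in", "Beijing"]
heads = [1, -1, 3, 1, 5, 1]

path = shortest_dependency_path(heads, 0, 5)
path_words = [tokens[i] for i in path]
```

In a full pipeline, the tokens on this path, rather than the whole sentence, would be embedded and passed to the bidirectional LSTM; that step is omitted here.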
Keywords/Search Tags: Social networks, Interaction scenario, Entity relation extraction, Interpersonal relation extraction, Distant supervision
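
Distant supervision, as described in the fourth contribution, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the triples, the synonym table, the sentences and the function names `build_knowledge_base` and `label_sentences` are all hypothetical, and entity matching is done by plain substring search. Synonymous relation names are first merged into a canonical name, and any sentence that mentions both entities of a known pair is then labeled with that pair's relation.

```python
def build_knowledge_base(triples, synonyms):
    """Map (head, tail) entity pairs to a canonical relation name.

    synonyms maps a relation name to its canonical form, so that
    synonymous relations are merged into one label.
    """
    knowledge_base = {}
    for head, relation, tail in triples:
        knowledge_base[(head, tail)] = synonyms.get(relation, relation)
    return knowledge_base


def label_sentences(sentences, knowledge_base):
    """Label every sentence that mentions both entities of a known pair."""
    labeled = []
    for sentence in sentences:
        for (head, tail), relation in knowledge_base.items():
            if head in sentence and tail in sentence:
                labeled.append((sentence, head, tail, relation))
    return labeled


# Hypothetical encyclopedia triples and synonym table (assumptions).
triples = [("Alice", "husband", "Bob"), ("Carol", "employer", "Acme")]
synonyms = {"husband": "spouse", "wife": "spouse"}
knowledge_base = build_knowledge_base(triples, synonyms)

sentences = [
    "Alice and Bob celebrated their anniversary.",
    "Carol joined Acme last year.",
    "Bob went hiking.",
]
labeled = label_sentences(sentences, knowledge_base)
```

Labels produced this way are noisy, because a sentence can mention both entities without expressing the relation. This is the kind of unreliability that the sentence-level and word-level attention in the abstract is meant to weigh down.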