
Character Relationship Classification And Graph Construction For News Events

Posted on: 2021-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: C Xu
GTID: 2518306497957149
Subject: Electronic Science and Technology
Abstract/Summary:
In recent years, classifying the relationships between characters (people) in online news texts has gradually become a research direction within information extraction. However, studies that analyze character relationships in news are rare, and many relationship judgments are confined to a single news article without combining the related articles that describe the same event, which undermines the objectivity of the classification results. Moreover, current news clustering algorithms do not take the news release time or key locations into account. In addition, current deep-learning models for entity extraction and entity relationship extraction are designed mainly for English text and do not account for the difference in word segmentation between Chinese and English. Against this background, this thesis studies character relationship classification for news based on deep learning models. The research consists of the following parts:

(1) News clustering: To analyze the relationships between characters in Chinese news texts more comprehensively, this thesis proposes a multi-feature news clustering algorithm that combines an N-gram language model with term frequency-inverse document frequency (TF-IDF). First, the news text is segmented and stop words are removed; the N-gram and TF-IDF algorithms are then used to build a representation vector for each news text, and the similarity between texts is computed with cosine similarity. Next, this text similarity is combined with the keyword matching degrees for time and place into a comprehensive score, with a weight assigned to each feature. Finally, the news texts are clustered with the single-pass clustering method.

(2) Construction of the entity recognition and entity relationship extraction model: Given the difference in word segmentation between English and Chinese, this thesis proposes a method that fuses word-granularity input into the model, so that the overall model can exploit the latent lexical information in Chinese text. Because the Long Short-Term Memory (LSTM) structure is relatively complex, training becomes very time-consuming when the training set is large, so Gated Recurrent Unit (GRU) units are used instead of LSTM to reduce the model's running time. Finally, the improved model with fused word-granularity input is compared with a model using single-granularity input to demonstrate the effectiveness of the fusion.

(3) Character relationship classification and graph construction: The improved news clustering algorithm first clusters the news, and the improved entity recognition and relationship extraction model then identifies character entities and extracts the relationships between them within each news cluster. The results are stored in a relational database and a graph database. Finally, a relationship graph platform with a separated front end and back end is built to dynamically display and manage the identified character entities and the character relationship network.
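Part (1) can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: character bigrams stand in for word segmentation plus stop-word removal, time and place keywords are assumed to be extracted beforehand, the time/place matching degree is taken as Jaccard overlap, and the weights `W_TEXT`, `W_TIME`, `W_PLACE` and the `THRESHOLD` value are hypothetical placeholders. Each article is visited once and joins the existing cluster with the highest average score if that score reaches the threshold; otherwise it starts a new cluster.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class News:
    text: str
    times: set = field(default_factory=set)   # time keywords (assumed pre-extracted)
    places: set = field(default_factory=set)  # place keywords (assumed pre-extracted)


# Hypothetical feature weights and clustering threshold, not values from the thesis.
W_TEXT, W_TIME, W_PLACE = 0.6, 0.2, 0.2
THRESHOLD = 0.3


def ngrams(text, n=2):
    """Character n-grams, used here as a stand-in for segmented words."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def tfidf_vectors(docs, n=2):
    """Sparse TF-IDF vectors (dict: n-gram -> weight) with smoothed IDF."""
    grams = [Counter(ngrams(d, n)) for d in docs]
    df = Counter(g for c in grams for g in c)  # document frequency per n-gram
    total = len(docs)
    vecs = []
    for c in grams:
        length = sum(c.values()) or 1
        vecs.append({g: (k / length) * (math.log((1 + total) / (1 + df[g])) + 1)
                     for g, k in c.items()})
    return vecs


def cosine(a, b):
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def score(i, j, news, vecs):
    """Weighted sum of text similarity and time/place keyword matching."""
    return (W_TEXT * cosine(vecs[i], vecs[j])
            + W_TIME * jaccard(news[i].times, news[j].times)
            + W_PLACE * jaccard(news[i].places, news[j].places))


def single_pass(news):
    """Single-pass clustering: one sweep, join best cluster or open a new one."""
    vecs = tfidf_vectors([x.text for x in news])
    clusters = []  # each cluster is a list of article indices
    for i in range(len(news)):
        best, best_sim = None, THRESHOLD
        for c in clusters:
            sim = sum(score(i, j, news, vecs) for j in c) / len(c)  # average link
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


if __name__ == "__main__":
    news = [
        News("台风登陆广东沿海", {"周一"}, {"广东"}),
        News("台风在广东沿海登陆", {"周一"}, {"广东"}),
        News("股市今日大幅上涨", {"今日"}, {"上海"}),
    ]
    print(single_pass(news))  # [[0, 1], [2]]
```

The two typhoon reports share bigrams, a time keyword, and a place keyword, so they land in one cluster, while the stock-market report starts its own. Averaging over cluster members is only one choice; comparing against a cluster centroid is a common alternative that is cheaper for large clusters.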
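The abstract does not say exactly how word-granularity information is fused into the character-level input of part (2). One established way to do this for Chinese NER, used here purely as an assumed stand-in, is the SoftLexicon idea: for each character, collect the lexicon words in which it appears at the Begin, Middle, or End position, or as a Single-character word (BMES), average the word vectors in each of the four sets, and concatenate the results to the character vector. The functions `lexicon_sets` and `fuse`, the toy lexicon, and the embedding dictionaries are hypothetical; in a full model the fused vectors would feed the (bidirectional) GRU encoder.

```python
def lexicon_sets(sentence, lexicon, max_len=4):
    """For each character, the lexicon words in which it sits at the
    Begin / Middle / End position, or that it forms alone (Single)."""
    sets = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in sentence]
    n = len(sentence)
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = sentence[i:j]
            if word not in lexicon:
                continue
            if j - i == 1:
                sets[i]["S"].add(word)
                continue
            sets[i]["B"].add(word)
            sets[j - 1]["E"].add(word)
            for k in range(i + 1, j - 1):
                sets[k]["M"].add(word)
    return sets


def fuse(sentence, lexicon, char_emb, word_emb, dim):
    """Concatenate each character vector with the mean word vector of its
    B, M, E and S sets (zeros when a set is empty): output size 5 * dim."""
    fused = []
    for ch, s in zip(sentence, lexicon_sets(sentence, lexicon)):
        vec = list(char_emb.get(ch, [0.0] * dim))
        for tag in "BMES":
            words = [word_emb[w] for w in s[tag] if w in word_emb]
            mean = ([sum(col) / len(words) for col in zip(*words)]
                    if words else [0.0] * dim)
            vec.extend(mean)
        fused.append(vec)
    return fused


if __name__ == "__main__":
    lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
    sets = lexicon_sets("南京市长江大桥", lexicon)
    print(sorted(sets[3]["B"]), sorted(sets[3]["E"]))  # ['长江', '长江大桥'] ['市长']
```

In the classic ambiguous phrase 南京市长江大桥, the character 长 receives both the segmentation 市长 (as an end character) and 长江 / 长江大桥 (as a begin character), so the encoder sees the competing word candidates instead of a single, possibly wrong, segmentation.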
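The motivation for replacing LSTM with GRU in part (2) can be made concrete by counting recurrent-layer parameters. Under the textbook formulation with one bias vector per gate (an assumption; PyTorch, for example, keeps separate input and hidden biases, so its counts are slightly higher), an LSTM applies four affine maps of the current input and previous hidden state (input, forget, and output gates plus the candidate cell), while a GRU applies three (update gate, reset gate, candidate state). A GRU layer therefore has about three quarters of the weights and per-step matrix multiplications of an LSTM of the same size. The sizes 100 and 128 below are arbitrary example values.

```python
def lstm_params(x, h):
    """Four affine maps of [x_t; h_{t-1}]: input, forget, output gates, candidate."""
    return 4 * (h * x + h * h + h)


def gru_params(x, h):
    """Three affine maps: update gate, reset gate, candidate hidden state."""
    return 3 * (h * x + h * h + h)


def bidirectional(count):
    """A bidirectional layer runs two independent directions."""
    return 2 * count


if __name__ == "__main__":
    x, h = 100, 128  # example input embedding size and hidden size
    print(lstm_params(x, h), gru_params(x, h))  # 117248 87936
    print(bidirectional(lstm_params(x, h)), bidirectional(gru_params(x, h)))  # 234496 175872
```

Fewer parameters mean fewer multiply-adds per time step, which is where the training-time saving comes from; whether accuracy is preserved has to be checked empirically, as the thesis does by comparing models.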
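Part (3) relies on combining all articles in a news cluster rather than judging each article alone. Below is a minimal sketch of that idea, assuming the extraction model outputs (head, relation, tail) triples per article and that disagreements within a cluster are resolved by majority vote (the abstract does not state the rule). The per-cluster results are then merged into an in-memory adjacency map that could afterwards be written to a relational or graph database. The function names, person names, and relation labels are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_relations(cluster_triples):
    """Majority vote over all (head, relation, tail) predictions in one cluster,
    giving one relation per ordered (head, tail) pair."""
    votes = defaultdict(Counter)
    for head, rel, tail in cluster_triples:
        votes[(head, tail)][rel] += 1
    return {pair: c.most_common(1)[0][0] for pair, c in votes.items()}


def build_graph(clusters):
    """Merge per-cluster relations into person -> {neighbor: relation}.
    A later cluster overwrites an earlier relation for the same pair."""
    graph = defaultdict(dict)
    for triples in clusters:
        for (head, tail), rel in aggregate_relations(triples).items():
            graph[head][tail] = rel
            graph.setdefault(tail, {})  # make sure the tail appears as a node
    return dict(graph)


if __name__ == "__main__":
    cluster = [
        ("Zhang San", "spouse", "Li Si"),     # article 1
        ("Zhang San", "spouse", "Li Si"),     # article 2
        ("Zhang San", "colleague", "Li Si"),  # article 3 disagrees
        ("Li Si", "colleague", "Wang Wu"),
    ]
    print(build_graph([cluster]))
```

Voting across the cluster lets two consistent reports outweigh one outlier, which is the objectivity argument the abstract makes; confidence-weighted voting would be a natural refinement if the model exposes scores.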
Keywords/Search Tags: News clustering, N-gram, Bidirectional GRU, Character recognition, Relationship graph