
Research On Event And Evolution Extraction Technologies For Microblogs

Posted on: 2022-01-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Mu
Full Text: PDF
GTID: 1488306314955269
Subject: Computer application technology
Abstract/Summary:
The rapid development of the Internet has caused explosive growth in Web information. How to extract the valuable information that users need from this mass of Web information, that is, Web information extraction, has become one of the key issues that urgently need to be resolved in the Internet era. With the rapid development of social networks, social media represented by microblogs has generated massive amounts of real-time data. The real-time and social characteristics of microblogs allow events to spread quickly across the microblogging platform. Microblogs have therefore become an important source of news and trending events for users, as well as a primary channel through which institutions and individuals release information immediately. Microblog event extraction is of great significance to organizational and corporate decision-making. If a company can detect an event at an early stage and accurately determine its current stage of development, it can take effective measures in advance to limit the event's impact on the company.

However, microblog event extraction also poses new challenges, which can be summarized in three aspects:

(1) Because a single microblog post contains few words and is freely written by users, the event information it carries is usually incomplete and noisy. How to integrate fragmented microblog data, eliminate the influence of noise, and build an effective representation framework for microblog events is a key issue.

(2) Microblog events often pass through different evolution stages (also called life periods), such as occurrence, development, and extinction. Accurately detecting the current evolution stage of an event is a challenging problem.

(3) Microblog users come from different backgrounds, so their language differs considerably, and different users often use different expressions for the same entity. This variation in textual expression makes microblog events difficult to extract.

The thesis addresses the technical challenges of microblog event and evolution extraction. It focuses on three key issues: microblog event extraction, microblog event evolution stage extraction, and microblog synonym recognition. It also builds a prototype microblog event extraction system and verifies the performance of the proposed algorithms on crawled microblog data. Microblogs contain a large amount of event information, but it is difficult for users to obtain it. Through research on the key technologies of microblog event extraction, the thesis provides users with a microblog event extraction tool. It also detects the evolution stage of microblog events according to their evolution characteristics, and handles the synonym phenomenon in microblogs by studying microblog synonym recognition. Together, this research provides users with information on microblog events and their evolution stages, and offers a technical reference for microblog-oriented event extraction, microblog big data mining, and related research.

The main work and contributions of the thesis can be summarized as follows:

(1) To address information fragmentation and excessive noise in microblog event extraction, the thesis proposes a method for extracting trend words from microblog text. The method considers the novelty, popularity, and influence of microblog keywords, which allows it to detect high-quality, event-related keywords effectively. On this basis, the thesis uses the co-occurrence information of trend words in microblog text to propose the concept of a microblog event information graph, constructs a two-layer microblog event information graph to represent the event information in microblog text, and proposes a microblog event extraction framework based on graph division and subgraph detection. Experiments on real microblog data sets demonstrate the effectiveness of the proposed framework.

(2) To address the extraction of microblog event evolution stages, the thesis first proposes a life cycle model of microblog events that corresponds to the life cycle of the event. On this basis, it constructs a Keyword Popularity Information Graph (KPIG) from the statistical and textual information of a microblog event to represent that event, and then proposes a method based on a graph kernel function to describe changes in event information, thereby detecting the evolution stage of the event. Compared with existing methods, the KPIG expresses the keywords and statistical information of an event through a graph model, which captures richer event information. The proposed method extracts the evolution stage from changes in the KPIG as measured by the graph kernel function, so its ideas and methods differ from existing work. Comparative experiments on real data sets show that the evolution stage extraction algorithm based on the KPIG and the graph kernel function performs well.

(3) To address the large number of synonyms and the lack of labels in microblog event extraction, the thesis proposes a microblog synonym recognition method based on self-supervised learning. The method assigns a pseudo-label to each microblog word through clustering and then trains a convolutional neural network to obtain a representation vector for each word. These two steps are iterated in turn until convergence. By analyzing the characteristics of synonyms, the co-occurrence information and morphological information of synonyms were selected as input features for model training. To construct a data set for comparison experiments, microblogs were crawled from Sina microblog using the various names of entity words as keywords. The results show that the proposed method outperforms the comparison algorithms on multiple indicators.

(4) Building on the algorithms for microblog event and evolution extraction, the thesis designs and implements EventSys, a prototype system for microblog event extraction and evolution analysis. EventSys provides a visual interface that supports microblog event extraction and event element extraction, supports the detection of microblog event evolution stages, and provides an emotional evolution analysis function for microblog events (analyzing the trend of users' emotional changes toward a specific event on the microblogging platform). EventSys serves as an experimental platform for testing and analyzing algorithms related to microblog event extraction and analysis, and can also support the verification of new algorithms in the future.
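
As a rough illustration of the idea behind contribution (1), the sketch below builds a co-occurrence graph over frequent keywords and treats each connected component as a candidate event. It is a simplified stand-in, not the thesis's algorithm: the abstract does not specify how novelty, popularity, and influence are scored, how the two-layer graph is built, or how graph division and subgraph detection work. The function names, the thresholds `min_count` and `min_weight`, and the sample posts are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def trend_words(posts, min_count=2):
    """Hypothetical stand-in for trend-word extraction:
    keep words that appear in at least min_count posts."""
    counts = Counter(word for post in posts for word in set(post))
    return {word for word, n in counts.items() if n >= min_count}


def cooccurrence_graph(posts, vocab, min_weight=2):
    """Link two trend words if they share at least min_weight posts."""
    weights = Counter()
    for post in posts:
        present = sorted(set(post) & vocab)
        for a, b in combinations(present, 2):
            weights[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), weight in weights.items():
        if weight >= min_weight:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def candidate_events(graph):
    """Return each connected component as a sorted list of keywords
    (assumed simplification of subgraph detection)."""
    seen, events = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        events.append(sorted(component))
    return events


# Toy, already-tokenized posts (invented for illustration only).
posts = [
    ["earthquake", "sichuan", "rescue"],
    ["earthquake", "sichuan", "magnitude"],
    ["rescue", "earthquake", "sichuan"],
    ["concert", "beijing", "tickets"],
    ["concert", "tickets", "soldout"],
    ["lunch", "noodles"],
]
vocab = trend_words(posts)
graph = cooccurrence_graph(posts, vocab)
events = candidate_events(graph)
print(events)
```

On these toy posts, the rarely used words are filtered out as noise and the remaining keywords fall into two groups, one per underlying event. A realistic system would replace the simple frequency filter with the novelty, popularity, and influence scoring that the thesis describes.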
Keywords/Search Tags: Microblog Event, Event Evolution, Graph Model, Self-Supervised Learning, Lifecycle