
Research On Topic Event Mining And Dynamic Evolution Analysis

Posted on: 2017-01-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F H Li
GTID: 1108330503969633
Subject: Computer application technology
Abstract/Summary:
Topic event mining and evolution analysis aim to present an event of interest in structured form. They extract, process and analyze key information such as time, place and persons. This reveals the event's associations and development trends, so that readers can understand the event clearly and quickly. Topic event mining draws on several techniques:

- time series analysis
- information retrieval
- automatic summarization
- topic detection and tracking
- event detection
- burst detection
- anomaly detection

The first step in topic event mining is data acquisition. This means collecting the data relevant to an event and processing it into semi-structured or structured form.

This thesis studies topic events at three levels: the sentence, the document and multiple documents. It focuses on understanding topic events in depth, including how to extract and analyze them across multiple documents.

Topic event extraction has three parts:

- sentence- or phrase-level information identification, e.g. time, place, person and shallow semantic parsing;
- document-level information identification, e.g. time, key action, place and person;
- multi-document information integration.

Topic event analysis covers the dynamic evolution of sub-topics, the impact of persons, and anomaly detection. The thesis addresses four key problems of topic event mining, each with a different emphasis.

(1) Information extraction and the time series characteristics of topic events. Arguments identified at the sentence level cannot reflect the actual course of a topic event. This study therefore focuses on topic events whose essential component is a meta event expressing an action. Topic event extraction is performed at the sentence, document and multi-document levels.
A time recognition model for topic events is proposed and implemented. It moves time recognition from the sentence or phrase level to the document level in order to identify the time of each topic event segment. The model normalizes temporal expressions with a dynamic strategy for choosing the reference time. Event elements are generally related to the semantic roles governed by the verb. The thesis therefore combines event extraction with shallow semantic parsing to link event elements to semantic roles. This improves the recognition of topic event segment times compared with methods based only on keywords or on a static reference-time strategy.

(2) Impact analysis of persons based on momentum and stock price analysis indicators. Event elements and burst detection are combined to study a person's impact over the whole course of an event's development. A physical model is used to define and construct the dynamics of a person's impact. The model considers the person's social attributes in addition to the arrival rate. This prevents elements that merely occur very often, like stopwords, from being overweighted. The momentum of a person's impact is characterized and analyzed with stock market analysis indicators. The thesis relies on the combined (synergistic) effect of multiple MACD (Moving Average Convergence Divergence) indicators. This avoids cases where one indicator is high even though no burst has occurred. On this basis, the thesis analyzes event elements and their effects on the event's development.

(3) A dynamic incremental strategy for analyzing the evolution of sub-topics within a topic event. Traditional topic detection and tracking automatically identifies new topics and dynamically tracks known topics in a stream of news reports. These topics may be unrelated to one another or may describe different events.
A topic event can be regarded as a dynamic data stream. Building on this view, the thesis presents a dynamic incremental model for analyzing how sub-topics evolve within a topic event. The model borrows ideas from single-pass clustering, multi-category models and dynamic incremental models. Threshold selection, similarity smoothing and a time factor are analyzed in light of the temporal and dynamic nature of sub-topics.

(4) The combined (synergistic) use of statistical and fuzzy set theory for anomaly detection. Anomaly detection is also a kind of time series analysis, one that takes into account the dynamics and temporality of a data stream. Anomalies are data points that differ greatly from the rest of a dataset. Some anomalies are noise, while others carry key information. For example, anomalies often reveal key periods or turning points in an event's development. Anomaly detection commonly faces several problems:

- an unknown data distribution;
- choosing control limits;
- multiple parameters;
- the need for training data;
- the fuzziness of the notion of 'anomaly'.

To address these problems, 'anomaly' and the anomaly score are defined using statistical control theory. Because 'anomaly' is itself a complex concept, fuzzy set theory and statistical methods are combined for anomaly detection in topic events. The method needs no annotated data and is distribution-free. Its parameters are determined by intensive fuzzification and an optimization model.
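Point (1) normalizes temporal expressions against a dynamically chosen reference time rather than a static one. The sketch below is hypothetical and is not the thesis's model; it only illustrates the contrast between the two strategies. It rests on several assumptions:

- The expressions are English stand-ins.
- Deictic words (`today`, `yesterday`, `tomorrow`) are always anchored to the publication date.
- Anaphoric offsets (`N days later/earlier`) are anchored to the current reference time.
- In dynamic mode, every resolved date becomes the new reference.
- All names (`normalize`, `DEICTIC`, `ANAPHORIC`) are illustrative.

```python
import re
from datetime import date, timedelta

# Hypothetical English stand-ins for temporal expressions.
DEICTIC = {"today": 0, "yesterday": -1, "tomorrow": 1}  # anchored to publication date
ABSOLUTE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
ANAPHORIC = re.compile(r"(\d+) days (later|earlier)")   # anchored to the reference time


def normalize(expressions, publication_date, dynamic=True):
    """Resolve temporal expressions, in document order, to dates.

    dynamic=True: anaphoric expressions use the most recently resolved date
    as their reference time. dynamic=False: they always use the publication
    date (a static strategy). Unrecognized expressions resolve to None.
    """
    reference = publication_date
    resolved = []
    for expr in expressions:
        text = expr.strip().lower()
        value = None
        absolute = ABSOLUTE.fullmatch(text)
        anaphoric = ANAPHORIC.fullmatch(text)
        if absolute:
            year, month, day = (int(g) for g in absolute.groups())
            value = date(year, month, day)
        elif text in DEICTIC:
            value = publication_date + timedelta(days=DEICTIC[text])
        elif anaphoric:
            n = int(anaphoric.group(1))
            value = reference + timedelta(days=n if anaphoric.group(2) == "later" else -n)
        resolved.append(value)
        if dynamic and value is not None:
            reference = value  # dynamic choice: later expressions anchor here
    return resolved
```

Take a publication date of 2017-01-21 and the sequence `["2016-12-30", "2 days later"]`:

- The dynamic strategy resolves the second expression to 2017-01-01.
- The static strategy resolves it to 2017-01-23, a date the narrative never refers to.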
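Point (2) borrows MACD from stock analysis and requires several indicators to agree before a burst is declared. MACD has three parts:

- the MACD line, which is a fast exponential moving average minus a slow one;
- the signal line, which is an exponential moving average of the MACD line;
- the histogram, which is their difference.

The minimal sketch below applies these to a per-period mention count for one person. The conventional stock-market spans 12/26/9 are an assumption, not the thesis's settings. The three-condition burst rule in `burst_flags` is hypothetical. The physical momentum model and the social-attribute weighting described above are not reproduced.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1),
    seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out, prev = [], values[0]
    for v in values:
        prev = alpha * v + (1 - alpha) * prev
        out.append(prev)
    return out


def macd(series, fast=12, slow=26, signal=9):
    """Return the MACD line (fast EMA - slow EMA), the signal line
    (EMA of the MACD line) and the histogram (MACD line - signal line)."""
    fast_ema, slow_ema = ema(series, fast), ema(series, slow)
    line = [f - s for f, s in zip(fast_ema, slow_ema)]
    sig = ema(line, signal)
    hist = [m - s for m, s in zip(line, sig)]
    return line, sig, hist


def burst_flags(counts, fast=12, slow=26, signal=9):
    """Hypothetical rule: flag a time step only if the MACD line is positive,
    it is above its signal line (histogram > 0), and the histogram is growing.
    Requiring all three suppresses cases where one indicator is high but
    mentions are not actually accelerating."""
    line, _, hist = macd(counts, fast, slow, signal)
    flags = []
    for t in range(len(counts)):
        growing = t > 0 and hist[t] > hist[t - 1]
        flags.append(line[t] > 0 and hist[t] > 0 and growing)
    return flags
```

Consider a quiet series followed by a jump, such as thirty zeros and then five tens. No step is flagged during the quiet period, and the first high step is flagged. Requiring the histogram to grow means a long plateau stops being flagged once its momentum fades.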
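Point (3) builds on single-pass clustering with a similarity threshold and a time factor. The sketch below is one simple way to combine them, under these assumptions:

- Documents arrive in time order as term lists.
- Similarity is cosine similarity between term-count vectors.
- The time factor is an exponential decay `exp(-decay * gap)`, where `gap` is the time since the sub-topic was last updated.
- `threshold=0.3` and `decay=0.1` are made-up values.
- Each document joins at most one sub-topic.

Similarity smoothing and the thesis's multi-category handling are not modeled. `single_pass` and its parameters are hypothetical names.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(stream, threshold=0.3, decay=0.1):
    """Incrementally cluster (timestamp, terms) pairs into sub-topics.

    Each document is compared once with every existing sub-topic. The
    similarity is damped by how long ago the sub-topic was last updated.
    If the best damped similarity reaches the threshold, the document joins
    that sub-topic; otherwise it starts a new one.
    """
    clusters = []  # each: {"centroid": Counter, "last": time, "members": [indices]}
    labels = []
    for idx, (t, terms) in enumerate(stream):
        vec = Counter(terms)
        best, best_sim = None, 0.0
        for cid, c in enumerate(clusters):
            sim = cosine(vec, c["centroid"]) * math.exp(-decay * (t - c["last"]))
            if sim > best_sim:
                best, best_sim = cid, sim
        if best is not None and best_sim >= threshold:
            c = clusters[best]
            c["centroid"].update(vec)  # incremental update: add term counts
            c["last"] = t
            c["members"].append(idx)
            labels.append(best)
        else:
            clusters.append({"centroid": Counter(vec), "last": t, "members": [idx]})
            labels.append(len(clusters) - 1)
    return labels, clusters
```

The time factor decides whether a late report on an old sub-topic reopens it or starts a new one. Take a report at day 30 that repeats the terms of a sub-topic last updated on day 1:

- With `decay=0.1` the damped similarity falls below the threshold, so the report starts a new sub-topic.
- With `decay=0.0` the report joins the old sub-topic.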
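Point (4) combines statistical control limits with fuzzy sets. Chebyshev's inequality gives one distribution-free basis for k-sigma control limits. For any distribution with finite variance, at most 1/k² of the values lie k or more standard deviations from the mean. For k = 3, that is at most about 11%.

The sketch below is a hypothetical illustration, not the thesis's method. It works as follows:

1. Each point's distance from the mean, in standard deviations, serves as a crisp anomaly score.
2. The score is mapped to a fuzzy membership grade that ramps linearly from 0 at `k_low` to 1 at `k_high`.
3. The grade is sharpened with Zadeh's contrast-intensification operator.

Contrast intensification is a stand-in for the thesis's "intensive fuzzification", which may be defined differently. The thresholds 2 and 3 are assumptions, and the parameter optimization model is not reproduced.

```python
import statistics


def intensify(mu):
    """Zadeh's contrast-intensification operator INT on a membership grade:
    pushes grades below 0.5 toward 0 and grades above 0.5 toward 1."""
    return 2 * mu * mu if mu <= 0.5 else 1 - 2 * (1 - mu) ** 2


def fuzzy_anomaly(series, k_low=2.0, k_high=3.0, rounds=1):
    """Return a fuzzy 'anomaly' membership grade in [0, 1] for each point.

    The crisp score is |x - mean| / std, i.e. distance in standard deviations.
    Scores at or below k_low map to 0, scores at or above k_high map to 1, and
    scores in between are interpolated linearly before intensification.
    """
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)
    grades = []
    for x in series:
        z = abs(x - mean) / sd if sd > 0 else 0.0
        mu = min(1.0, max(0.0, (z - k_low) / (k_high - k_low)))
        for _ in range(rounds):
            mu = intensify(mu)
        grades.append(mu)
    return grades
```

Here the mean and standard deviation come from the same series that contains the anomalies, so large anomalies inflate the spread. With n points, no single value can lie more than √(n − 1) population standard deviations from the mean. Short series therefore need a low `k_low`, or a robust estimate of spread such as the median absolute deviation.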
Keywords/Search Tags: topic event, time series characteristic, impact analysis, sub-topic, anomaly detection, dynamic evolution