
Research On Web Entity Trace Discovery For Market Intelligence

Posted on: 2017-04-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Y Huang
Full Text: PDF
GTID: 1108330485979564
Subject: Computer software and theory
Abstract/Summary:
The rapid development of Internet technology has led to exponential growth in the number of web pages, which together contain a great deal of valuable information. There is an urgent need to analyze and mine this data to extract information that supports decision making in application fields such as market intelligence analysis. Information on the Web typically describes what happened (an event) to something that objectively exists (an entity). However, the events that record the dynamic changes of entities are scattered across the Web: isolated, discrete and unordered. As a result, users see only fragments of how an entity has developed and changed. They do not see the relationships between events and entities over the course of that development, the complete development trace (the entity trace), or the development pattern (the pattern of the entity trace). This can easily lead to overgeneralization and serious errors in decision making. The priority, therefore, is to organize the relationships among entities and events found on the complex Web and to discover Web entity traces and their patterns, so as to predict trends and support decision making.

Current research on Web entity traces focuses on linking entities and events in chronological order. This approach meets users' browsing needs, but it is not adequate for the in-depth analysis and mining that market intelligence requires.
This thesis therefore studies Web entity traces for market intelligence. It aims to identify the underlying relationships between events and to discover the periodic traces of entities, as well as the similar traces of entities of the same kind, in support of trend prediction and decision making.

Discovering Web entity traces for market intelligence raises the following problems:
1) The underlying relationships between events must be found, but they cannot be obtained from explicit markers in the context. Effective feature extraction is therefore needed to identify the relationships between events.
2) The periodic traces of some entities must be discovered, but they are hidden among a large number of event relationships and are hard to find. An effective structural definition and FP-growth algorithm are therefore required.
3) The similar traces of entities of the same kind must be discovered, but this involves a very large number of events and produces many redundant patterns. Effective preprocessing and pattern-discovery methods are therefore needed.

To find the underlying relationships between events and the traces of entities effectively, and so support trend prediction and decision making, the thesis addresses these problems. Its major contributions are as follows:

(1) A method for identifying the relationships between events based on the correlated features of event elements, which solves the problem of identifying event relationships.
Event relationships, especially unmarked causal relationships, cannot be identified effectively from semantic correlation alone, because no cue word signals the causality between the events.
To solve this problem, the thesis proposes a method that identifies the relationship between events from the correlated features of their elements. The method first identifies co-occurrence relationships between events, including events in different documents, which provides the foundation for identifying cross-document event relationships. Next, it statistically analyzes which correlated features of the elements of co-occurring events contribute to causality, and builds a feature vector space for event pairs. Then, using training data sets, a machine-learning classification model classifies event relationships into two types: causal relationships and following relationships. In this way, unmarked causal relationships between cross-sentence, cross-paragraph and even cross-document events can be identified effectively, as can one-cause-multiple-effect and multiple-cause-one-effect relationships. Finally, the valuable events can be linked according to their relationships, as needed, to build an event relationship graph that provides a sound statistical foundation for market intelligence analysis. The event causality graph is a subgraph of the event relationship graph, and it gives a clear visualization of one-cause-multiple-effect and multiple-cause-one-effect relationships.
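The step of turning a pair of co-occurring events into a feature vector can be sketched as follows. This is only an illustration: the abstract does not list the actual features, so the `Event` fields and the four features below are hypothetical, and the classifier that would consume these vectors is not shown.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; the field names are assumptions."""
    doc_id: str
    day: date
    participants: frozenset
    action: str


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(first, second):
    """Build an illustrative feature vector for an ordered event pair."""
    return [
        jaccard(first.participants, second.participants),  # shared elements
        float(first.doc_id == second.doc_id),              # same document?
        float((second.day - first.day).days),              # time gap in days
        float(first.action == second.action),              # same activity?
    ]


# Two made-up events from different documents.
recall = Event("doc1", date(2016, 3, 1),
               frozenset({"AcmeCorp", "WidgetPhone"}), "recall")
drop = Event("doc2", date(2016, 3, 4),
             frozenset({"AcmeCorp"}), "share_price_drop")
vector = pair_features(recall, drop)
```

Vectors of this kind, labelled as causal or following in a training set, are what a standard classifier would be trained on.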
The results show that the method identifies event relationships with higher precision than existing identification methods and overcomes their disadvantages, such as low transferability, low precision and difficulty in identifying relationships between cross-document events.

(2) A method for discovering periodic entity traces based on frequent subgraph mining, which effectively solves problems in the discovery process such as slow pattern growth, combinatorial explosion of patterns and a large number of redundant patterns.
Based on its semantic features, a periodic trace is represented as a graph. Discovering periodic traces from the entity-event graph faces common problems, including slow pattern growth, combinatorial explosion of patterns and a large number of redundant patterns. To solve these problems, the thesis proposes a discovery method for periodic entity traces based on frequent subgraph mining. The method first clusters events according to the semantic similarity of their activities. All events in a cluster are given the same label, and each event in the event relationship graph is replaced with its label. Vertex-edge-vertex patterns are then mined from the resulting event label graph, and Star patterns are mined on that basis. Finally, the Star patterns are merged as far as possible, and the merged results are the periodic traces. Because of its structure, the Star pattern lets the merge algorithm grow patterns one Star pattern at a time and converge quickly without generating additional redundant patterns, which avoids the combinatorial explosion of patterns.
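The vertex-edge-vertex and Star pattern steps can be illustrated with a much simplified sketch. It assumes that support is a plain occurrence count of labelled edges and that a Star pattern is a centre label together with its frequent neighbour labels. Both are assumptions made for illustration; the function names are hypothetical, and the merge step is not shown.

```python
from collections import Counter, defaultdict


def frequent_edge_patterns(edges, labels, min_support):
    """Count (source label, target label) pairs and keep the frequent ones."""
    counts = Counter((labels[u], labels[v]) for u, v in edges)
    return {pattern: n for pattern, n in counts.items() if n >= min_support}


def star_patterns(edge_patterns):
    """Group frequent edge patterns by their centre (source) label."""
    stars = defaultdict(set)
    for centre, leaf in edge_patterns:
        stars[centre].add(leaf)
    return {centre: frozenset(leaves) for centre, leaves in stars.items()}


# Made-up event label graph: event id -> cluster label, plus directed edges.
labels = {
    "e1": "launch", "e2": "promo", "e3": "sales_up",
    "e4": "launch", "e5": "promo", "e6": "sales_up",
}
edges = [("e1", "e2"), ("e1", "e3"), ("e4", "e5"), ("e4", "e6"), ("e2", "e3")]

frequent = frequent_edge_patterns(edges, labels, min_support=2)
stars = star_patterns(frequent)
```

Here the `promo` to `sales_up` edge occurs only once and is dropped, so the only Star pattern left is centred on `launch`.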
The results show that the method achieves higher precision and recall, and that it solves the low efficiency of pattern-growth approaches to frequent subgraph mining, which extend a pattern by only one vertex or one edge at a time.

(3) A method for discovering the similar traces of entities of the same kind based on important events, which effectively solves problems in the discovery process such as the very large number of events to be processed and the large number of redundant patterns.
In line with practical situations, a similar trace is represented as a sequence diagram. Discovering similar traces from the relationship graph of same-kind entities and events faces problems such as the large number of events to be processed and redundant patterns. To solve these problems, the thesis proposes a method for discovering the similar traces of entities of the same kind based on important events. The method first sorts all events related to each entity in chronological order and then, by segmenting them into time windows, finds the important events and alternative topics. In this way, every entity obtains a sequence of alternative topics. Next, the alternative topics in all sequences are clustered, and topics in the same cluster receive the same label. Dynamic programming is then used to find the longest common alternative topic sequence across all sequences of alternative topics. Finally, according to the entity-event relationship graphs and the minimum support threshold, each alternative topic in the longest common alternative topic sequence is expanded, and the result is the similar trace. By screening for important events, the method greatly reduces the number of events to be processed. The dynamic programming step discovers the skeleton of the similar trace at an early stage and stops unrelated topics from being expanded, which avoids redundant patterns.
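The dynamic programming step can be sketched with the classic longest common subsequence algorithm on two labelled topic sequences. This is a minimal illustration, not the thesis's algorithm: it handles only two sequences, whereas the method described above works across the sequences of all entities, and the topic labels are made up.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of two label sequences."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back through the table to recover the subsequence itself.
    result = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]


# Made-up topic label sequences for two entities of the same kind.
entity_a = ["launch", "promo", "price_cut", "sales_up"]
entity_b = ["launch", "price_cut", "recall", "sales_up"]
common = longest_common_subsequence(entity_a, entity_b)
# common == ["launch", "price_cut", "sales_up"]
```

The common sequence acts as a skeleton: only the topics it contains would be passed on to the expansion step, which is how unrelated topics are kept out.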
The results show that the method achieves higher precision and recall, and that it solves the low efficiency caused by the large number of redundant patterns that arise when frequent subgraphs are expanded without guidance.
Keywords/Search Tags: Event relation, Event relation graph, Entity trace, Periodic trace, Similar trace