Font Size: a A A

Mining Significant Trend Sequences In Dynamic Attributed Graphs

Posted on:2020-02-09Degree:MasterType:Thesis
Country:ChinaCandidate:C ChengFull Text:PDF
GTID:2428330611998839Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the rapid development of society and numerous applications of data collection devices,abundant data is being stored in databases.Knowledge Discovery from Databases is a process that can help understand the past and take decisions about the future.For this purpose,many data mining techniques have been designed to uncover interesting patterns in various types of data.Recently,a lot of attention has been given to finding patterns in graph data since it has many applications such as to analyze social networks,biological and chemical data.Though many algorithms were proposed to identify interesting patterns in graphs,most of them consider simple types of graphs such as static graphs or graphs having a single attribute per vertice.Recently,studies have considered discovering frequent patterns in dynamic attributed graphs.These graphs can represent not only relationships between entities but also how they evolve over time,and describe entities with multiple attributes.Algorithms for mining frequent patterns in dynamic attributed graphs select patterns based on their occurrence frequency.But a major drawback of this approach is that many frequent patterns may contain entities that are weakly correlated.Thus,numerous frequent but spurious patterns may be shown to the user.For applications related to causal analysis,it is desirable to filter out these spurious patterns to keep only the strongly correlated patterns.A user may do this filtering by relying on background domain knowledge.However,filtering patterns by postprocessing can be time consuming and suitable domain knowledge is not always available.To allow discovering strongly correlated patterns in dynamic attributed graphs,we propose a novel significance measure named Sequence Virtual Growth Rate.We develop this measure by drawing inspiration from the growth rate measure used in emerging pattern mining and the concept of spatio-temporal correlation in spatial data mining.Based on this measure,we define a novel type of graph patterns,called Significant Trend Sequence.Finding these patterns requires to set two frequency thresholds to reduce noise,and a significance threshold to ensure that each pattern is strongly correlated.The main difference between this study and related work on spatio-temporal data mining is that in the former,each component of a sequence can contain multiple events or attribute variations.For this reason,the search space grows very quickly when the number of eventsis increased.In this dissertation,we give a detailed analysis of the search space,and how the three parameters influence its size,and their effect on search space pruning.Then,we design two pruning strategies.One is an upper bound filtering technique,and another is an optimization for depth-first or breadth-first search.To efficiently mine significant trend sequences,we develop two algorithms named TSeq Minerd f s-b f sand TSeq Minerd f s-d f s.They rely on the two pruning techniques to reduce the search space.Moreover,we give an analysis of the complexity of these algorithms.Finally,to evaluate the performance of the algorithms,we conduct a quantitative study.Results show that the pruning techniques are effective and that the algorithms are efficient.Moreover,an analysis of patterns found confirm that insightful patterns are discovered in real data.
Keywords/Search Tags:graph mining, dynamic attributed graph, significance measure, sequence
PDF Full Text Request
Related items