Font Size: a A A

Research And Application Of Data Mining Algorithm Based On Graph Pattern

Posted on:2022-03-01Degree:DoctorType:Dissertation
Country:ChinaCandidate:D Q TangFull Text:PDF
GTID:1488306728996529Subject:Operational Research and Cybernetics
Abstract/Summary:PDF Full Text Request
Data mining is to extract knowledge fulfilling the needs of users and of scientific computing significance from large data.The data structure that users are interested in has various forms in different application fields,mainly including text,vector,table,graph and other data models.Currently,in the field of data mining,we generally use graph data structure to represent structured and semi-structured data because it has strong semantic expression ability.The major study of graph pattern data mining is frequent subgraph mining,a process in which users specify threshold to find subgraph patterns whose occurrence times exceed the threshold.In applications,we mainly solve the problem of frequent subgraph mining based on different user expectations and needs.The complexity of graph data structure,bringing many difficulties to mining graph data algorithm.Firstly,the problem of subgraph isomorphism,which is a NP complete problem.It is generally believed that there is no polynomial time algorithm unless P = NP.Secondly,any graph data mining method may generate duplicate candidate subgraphs,and it needs to repeatedly scan the database to calculate the support of candidate subgraphs,which takes a lot of time and space.In this paper,we optimize and improve the existing algorithms effectively in the coding,generating candidate sets and computing support of different graph datasets.This paper mainly includes the following content:(1)In this paper,we propose an algorithm for mining frequent subtrees from label data sets.Based on the semi-structured label dataset of forest,this paper first proposes the method of canonicalized representation of tree and forest,and uses extension to generate the candidate subtree sets.Then,it gives the mining algorithm of rooted ordered label tree,canonicalizes the free label tree,and gives the method of transforming free label tree into rooted ordered label tree.Finally,the experimental verification is carried out on semi-structured data sets.(2)We propose the maximal frequent subtree mining problem based on graph structured dataset.Firstly,give a tree canonicalization coding algorithm for graph data set is proposed.And,on the basis of frequent subtrees,all maximal candidate subtrees are generated through join and extension.The concept of embedded set is introduced to solve the problem of computing subtree isomorphism.On this basis,we propose a new maximal frequent subtree mining algorithm,and prove the correctness of the algorithm and the time complexity of the algorithm is analyzed.Finally,experiments based on real datasets are conducted,which the effectiveness is verified.(3)The problem of frequent subgraph mining is studied,and the reusing attribute is applied to frequent subgraph mining algorithm.Firstly,the canonicalization coding method of graph is gived,and the operations of join and extension are put forward to generate candidate subgraphs.We prove the reusing attribute can reduce the number of canonicalization tests of candidate subgraphs,thus greatly reducing the space size of redundant non-canonicalized candidate sets.Then,when calculating the new candidate subgraph,the reusing attribute of the test canonicalization adjacency matrix is used for direct judging,with no need to repeat the canonicalization test.On this basis,a frequent subgraph mining algorithm is proposed.Finally,we compared it with other algorithms based on real datasets.The results show that the proposed algorithm can reduce the number of canonicalization tests and redundant subsets,hence it has better performance.(4)Application of graph pattern data mining.The paper proposes a criminal gang mining algorithm based on graph pattern.Mining the core gangs and members in the criminal information data set of graph pattern can greatly improve the efficiency of crime network threat prediction and identification.Firstly,based on the massive criminal information network dataset,the data structure of candidate core gang set is constructed,and the operations of join and extension are proposed to obtain the candidate gang set.Then,the frequency of the gang set is calculated from different graph data by using the list of gangs,thus avoiding the isomorphism of subgraphs in traditional pattern mining.Finally,the simulation dataset and the real dataset contrast experiment show that the algorithm proposed in this paper is better than other mining algorithms.
Keywords/Search Tags:data mining, frequent subtree, maximal frequent subtree, frequent subgraph, mining gangs
PDF Full Text Request
Related items