Font Size: a A A

Research On Calculation Method Of Tree Structure Similarity Based On Minimal Hash Algorithm

Posted on:2022-07-15Degree:MasterType:Thesis
Country:ChinaCandidate:Z LiFull Text:PDF
GTID:2518306554971129Subject:Computer technology
Abstract/Summary:PDF Full Text Request
The advent of Web2.0 era not only brings massive data to the Internet and computer science,but also brings new data types.Tree structure is one of these newly emerging data types,which plays an important role in data composition in computer science,linguistics,biology,graphics and other fields.In these fields,users prefer to measure the similarity between different tree structure data to accomplish some specific retrieval,matching and other tasks.However,there are still many challenges in tree structure similarity calculation.Firstly,the large amount of data leads to the serious time consuming of calculation.Second,the algorithm operation occupies a large space.third,the calculation result error is large.Although many algorithms have been proposed to solve the problems of the above tree structure,such as tree editing distance algorithm,tree kernels algorithm and algorithm using data characteristics,etc.,and these algorithms also play a certain role in solving the problem,there are still some other problems.For the unordered tree structure,the tree editing distance algorithm is NP-hard problem,but other algorithms have the problems of large error,time-consuming calculation,and large space occupation.For the ordered tree structure,the time complexity of the tree editing distance algorithm is proved to be only cubic time,and other algorithms are also difficult to capture the structural information,leading to low computational accuracy.In view of the above problems,this paper mainly does the following two parts of research work:(1)For unordered tree structure data,a structure-preserving subpath signature algorithm based on minimum hash algorithm is proposed.The innovation points of this study are as follows: 1)A structure-preserving subpath signature algorithm is proposed,which can transform the tree structure data into a structure-preserving subpath signature,and then use it to calculate the similarity.At the same time,its structure preserving property is proved by theory.2)Using the minimum hash algorithm,it can better solve the problem of computation time caused by large amount of data and reduce the space occupied by the operation.(2)For ordered tree data,a multi-dimensional sequence algorithm based on subpath signature algorithm is proposed.In this study,the innovation points include: 1)combining the multi-dimensional sequence framework with the subpath signature algorithm to realize the fast calculation of sequence similarity.2)The spatial information of the tree structure is used by the dynamic time warping algorithm to make the calculated similarity result more accurate.
Keywords/Search Tags:Tree structure, Subpath signature, Minimum hashing algorithm, Dynamic time warping algorithm
PDF Full Text Request
Related items