
Research On XML Pattern Matching Based On Data Stream Environment

Posted on: 2017-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: C Chen
Full Text: PDF
GTID: 2348330503995783
Subject: Software engineering
Abstract/Summary:
With the arrival of the "Internet+" era, more and more data are being published, exchanged, and integrated online. The Web has become the main information source of human society and a portal for media and business. As an Internet standard language that works across products, interfaces, and platforms, XML has shown strong application prospects. The efficiency of querying and retrieving XML documents determines the processing efficiency and user experience of applications whose core data are XML documents. Existing algorithms support common twig pattern queries and some more complex forms, but with the rapid growth in the volume and complexity of XML data, there is still room for improvement in memory consumption and time efficiency. Therefore, this thesis studies the related matching algorithms for XML documents in a data stream setting. First, a new data-stream-based twig pattern matching algorithm named TwigInStream is proposed. The algorithm obtains the local region encoding of each element while parsing the XML document, and the matching result is obtained by operating on sorted lists, without any other data structures. Theoretical analysis and experimental results show that the proposed algorithm has better time efficiency and a certain advantage in handling parent-child (P-C) relationships during twig pattern matching. Second, because existing algorithms suffer from low processing efficiency, large memory consumption, and many useless intermediate results when processing wildcard matching, we put forward a new matching algorithm named WTwigList that supports wildcard queries. It uses local Extended Dewey encoding and reduces the number of single paths involved in the final matching through a leaf node filtering process.
It also introduces a new data structure that represents the hierarchical information of wildcard nodes, which reduces the number of wildcards and limits the scope of candidate elements during matching. The matching result is obtained through sorted list matching operations. Finally, a large number of experiments were conducted on real and synthetic data sets, comparing the algorithm with existing classical algorithms in terms of the number of paths, time efficiency, and memory consumption. The experimental results show that the WTwigList algorithm has better performance.
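The abstract does not spell out the encoding or the list operations, so the following is only a minimal sketch of the general idea of labeling elements while a document is parsed as a stream. It assumes the classic (start, end, level) region encoding rather than the thesis's "local" variant, and the names `RegionEncoder`, `encode`, `is_ancestor`, `is_parent`, and `match_pair` are hypothetical. It is not TwigInStream itself: the pair matching here is a plain nested loop, not a merge over sorted lists.

```python
import xml.sax
from collections import defaultdict


class RegionEncoder(xml.sax.ContentHandler):
    """Assigns a (start, end, level) label to each element during one SAX pass."""

    def __init__(self):
        super().__init__()
        self.counter = 0                 # position counter over open/close tag events
        self.open_stack = []             # (tag, start) for elements not yet closed
        self.lists = defaultdict(list)   # tag name -> list of region labels

    def startElement(self, name, attrs):
        self.counter += 1
        self.open_stack.append((name, self.counter))

    def endElement(self, name):
        self.counter += 1
        tag, start = self.open_stack.pop()
        level = len(self.open_stack) + 1  # the root element has level 1
        self.lists[tag].append((start, self.counter, level))


def encode(xml_text):
    """Parse the document once and return per-tag label lists sorted by start."""
    handler = RegionEncoder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    for labels in handler.lists.values():
        labels.sort()
    return handler.lists


def is_ancestor(a, d):
    # a contains d when a's region strictly encloses d's region
    return a[0] < d[0] and d[1] < a[1]


def is_parent(p, c):
    # parent-child is containment plus a level difference of exactly one
    return is_ancestor(p, c) and c[2] == p[2] + 1


def match_pair(lists, upper, lower, parent_child=False):
    """Naive matching of one query edge (illustrative only, quadratic time)."""
    test = is_parent if parent_child else is_ancestor
    return [(u, l) for u in lists.get(upper, []) for l in lists.get(lower, []) if test(u, l)]


lists = encode("<a><b><c/></b><c/></a>")
print(match_pair(lists, "a", "c"))                     # ancestor-descendant edge a//c
print(match_pair(lists, "a", "c", parent_child=True))  # parent-child edge a/c
```

With labels of this kind, the parent-child test costs one extra integer comparison on top of the containment test, which is one reason region-style encodings are commonly used for structural joins.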
Keywords/Search Tags:XML, labeling scheme, twig pattern, data flow model, wildcard matching, sorted list