Font Size: a A A

The Filtering Of XML Data Based On Automata

Posted on:2011-08-07Degree:DoctorType:Dissertation
Country:ChinaCandidate:J ShenFull Text:PDF
GTID:1118330368983007Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With the development of financial portfolio managing and web usage log in Internet, more and more data types, particularly semi-structured data, are needed for information representantion. XML is a special representation of semi-structured data, which is a new standard for Web data representation and interaction. More and more Web data store and interactive in XML format.For any database, query filter is an important part, so the research of filtering in XML data has very important significance. In this thesis, based on the automata theory and method, the filtering algorithms of XML in the local and network are deeply studied.Firstly, paper studies the cache miss problem in the process of XML filtering, and proposed a two-stage assessment method to quantitative analyze cache miss in the process of XML filtering. Base on the DFA, the process of XML filtering is analyzed, which aims at evaluating the number of cache miss, especially the compulsory miss and capacity miss. In a filter cycle by constructing a tag tree to statistic the number of compulsory miss in the process of filtering query. In the first i-filtering cycle, consider reuse the filtering results in the prior i-1 cycle. Through the intersection, union and difference operation, estimate the first i-cycle of mandatory filtering query cache invalidation and capacity failure number.Throug the intersection, difference and union operation in work sets, estimate the number of compulsory miss and capacity miss in the first i-cycle of filtering query. Finally, the experiment proves that the estimation mathod has compared high accuracy.Secondly, base on the LazyDFA filtering system, the concept of frequent access area is introducted, and propose a DFA-FA filtering mechanism. By reducing the cache failure in the process of accessing, the performance of filtering is improved. Thesis firstly define the concept of frequent access area, and set a limitation for the size of frequent access area, then consider the best access frequency threshold for the choice of nodes in the frequent access area. The experiment proves that the optimized filtering mechanism has better performance.Thirdly, for the XML data stream in P2P network, a distributed NFA filtering mechanism is proposed, which is used to address the interaction XPath filter between the peer nodes. NFA is added in the Chord ring, and then a distributed NFA is constructed incrementally based on the local YFilter system. Two methods are proposed to execute the distributed NFA, one is based on iteration, and the other is base on recursive. The experiment proves that the recursive method has better performance. Final experiments are done for different load, when changing the number of querying and the size of network, the results show that the distributed NFA also has good performance, and has good load balance for store load and filtering load.Lastly, an integrated XPush filtering mechanism is proposed base on the XPush automata, to address the dynamic update problem in the query process of XM1 data stream. Thesis defines the data representation, integrated state table and integrated state transition table in integrated XPush automaton, and also define an integrated key for integrated state table and key for sub-XPush automaton. Integrated XPush automata is dynamically updated, the process of update in two ways. One is separation process, which separate a sub-XPush automaton from the integrated XPush automaton, equal to deletion a query request. The other is adding a new querying process, which handle integrated XPush automaton with new adding sub-XPush automaton.Through experiments comparing with the original XPush automata, the integrated XPush automaton is influenced less than the XPush automaton when the query requests changing.This thesis is mainly carried on the XML data filtering based on the automaton. First using the cache access technology optimize the local XML filtering. With the study of distributed hash table in P2P network, the XML data filtering in P2P nodes and optimization filtering of dynamic condition are analyzed. By comparison with the experimental of the original method, improved methods are more usefulness and effectiveness.The research on filtering mechanism in P2P network and dynamical filtering provides a good theoretical basis and ideas for further research.
Keywords/Search Tags:Automaton, XML data stream, Cache miss, Frequent access node, Distributed hash table, Dynamically
PDF Full Text Request
Related items