Font Size: a A A

Research And Implementation Of XML Document Publish/Subscribe System Based On Nondeterministic Finite Automaton

Posted on:2011-12-31Degree:MasterType:Thesis
Country:ChinaCandidate:H K ZhangFull Text:PDF
GTID:2248330395957914Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the rapid development of the Web’s technology and applications, XML has become an important component of information and data exchange on the World Wide Web. XML is used in a wide number of applications such as, e-commerce, electronic data exchange, scientific data representation, data modeling and analysis, and search engine. Publish/subscribe system is asynchronous and multi-point communicated, and information’s publisher and subscriber are completely decoupled in space, time and control-flow. Publish/subscribe system is able to satisfy the needs of large-scale and highly dynamic distributed computing environment based on Internet. So far, there is a large amount of XML document data accumulated on the Web and the XML document data is updating very fast. Facing to the users’different needs, publish/subscribe system can send XML documents to the users who are interested to the XML documents immediately.The prefix-shared idea of Y-Filter can improve the efficiency of query processing, but the quantity of the states in the stack would have a number of exponential growths with the depth of the stack in the process of structure matching of nondeterministic finite automaton. For this reason, this thesis makes a further study on the optimization problem of the quantity of the states in the stack in the process of path structure matching. According to the characteristics of XML documents and queries, this thesis designs a stack optimization algorithm based on DFS, and uses backtrack and jump operation to reduce the unnecessary element matching operations. To analyze the performance of the stack optimization algorithm based on DFS, this thesis designs the corresponding experiments, and analyzes the system’s performance for the different parameters. Experimental results show that the stack optimization algorithm based on DFS can reduce the quantity of the states in the stack in the process of structure matching effectively, and the response time is acceptable.In addition, the structure matching and predicate matching of queries are not relevant in time, so this thesis uses the technology of assembly lines in the XML document publish/subscribe system, and according to the differences of structure between the linear query and the twig query, this thesis uses assembly lines containing different segments to process them. The use of assembly lines improves the processing efficiency of the XML document publish/subscribe system. This thesis designs relevant experiments for linear queries and twig queries, and the experiment results show that the matching algorithms of linear queries and twig queries based on assembly lines technology can save time by25%and50%respectively.
Keywords/Search Tags:Nondeterministic Finite Automaton, assembly lines technology, structurematching, predicate matching
PDF Full Text Request
Related items