Font Size: a A A

The Research On XML Flitering Model Baesd On Hash Table And Stream Index

Posted on:2013-12-11Degree:MasterType:Thesis
Country:ChinaCandidate:W J XiaFull Text:PDF
GTID:2248330374983082Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Since XML (eXtensible Markup Language) was born in1998, it has become the standard of Internet data exchange format. A large number of related applications, such as message notification systems, personal personalized information and so on, need to filter the information. How to filter XML data in high efficiency has become one of the hot research issues in recent years.In recent years, in XML filtering fields, a series of achievements have been made. Among them, many theoretical models and tools have been very mature. For example, XPath, automaton theory have formed a variety of mechanisms. After XFilter has been proposed, a series of automaton theories, such as YFilter and lazy DFA, have been widely applied to XML filtering. Also, a variety of automaton models have been studied out, which are used as the mainly used technique in XML filtering now.In XML filtering, the most important thing is to consider how to reduce the cost, how to improve the efficiency, in order to maximize the benefits. To achieve this goal, reducing the process of the invalid elements and reducing the parsing of the amount of the document’s elments, become an important way. This is also the main content of this thesis. Reducing the process of invalid elements, including two aspects, the first aspect is how to quickly judge that certain elements are invalid elements, the second aspect is how to deal with these invalid elements. In this thesis, we introduce hash table to judge which are the invalid elements, and we introduce stream index to skip the invalid elements, in this way, we improve the efficiency of the filtering.Hash table stores the location information of the elements, which can judge the hierarchical relationship of any two elements in the document quickly, in order to determine whether the elements are invalid elements. In the filtering, the process of ancestry-posterity relationship "//" often requires time-consuming, because if any elements can be matched to the relationship, then no matter what depth of the elements in the document are, can meet the filtering conditions. And the process of judging the elements, need to put the elements into the stack, that will cost a lot of time and space. In this thesis, in the filtering, by introducing the hash table, when meeting the ancestry-posterity relationship "//", we do not need to put the elements of the document into the stack fist, but to look up the hash table to see whether the filtering elements meet the filtering conditions, if met, put them into stack to continue the filtering, if not, skip them by the stream index.Stream index marks the start position and end position of the element, when meeting an invalid element, it can skip it and its descendants by the finish tag, in this way it avoids the process of invalid elements, which will improve the analytical efficiency. The experiments show that when the depth of the input document is big and there are a lot of ancestry-posterity relationships in the document, the methods proposed in this thesis are superior to traditional XML filtering methods in efficiencv.
Keywords/Search Tags:XML, filtering, hash table, stream index
PDF Full Text Request
Related items