Query Processing On XML Data With Dirty Tags

Posted on:2012-10-20

Degree:Master

Type:Thesis

Country:China

Candidate:G H Jiang

Full Text:PDF

GTID:2218330362450417

Subject:Computer Science and Technology

Abstract/Summary:

As organizations exchange more and more data, XML, the de facto standard for storing and exchanging data on the Web, has become increasingly important. However, dirty data in XML, such as incorrect, inconsistent, and imprecise data, makes effective query processing difficult. Both XML research and the wider adoption of XML therefore call for ways to query XML that contains dirty data directly. Because twig queries are of particular interest in academic research, this thesis focuses on dirty tags and studies how to process and optimize twig queries on dirty XML.

Making use of the content and structure of XML, this thesis proposes a query processing algorithm for XML with dirty tags. Because the XML must be preprocessed to support efficient querying, the whole process is divided into offline document processing and online query processing. The algorithm first uses content, parent-child relations, and other structural information in documents and queries to obtain the similar spellings, relaxations, and synonyms of every tag. Using these similar tags, the thesis then defines similar queries and their similarity distances, and gives three operators that support efficient execution of the query processing algorithm. Building on these and trading space for time, it proposes an efficient algorithm that computes all results similar to a user query and returns them in order of their similarity. An experimental evaluation measures the effectiveness and efficiency of this approach.

To address the inefficient parts of the above method, the thesis presents two optimizations and their implementations: document processing optimization and query processing optimization. Document processing optimization exploits the fact that the document processor traverses every path in the XML to generate all cascade information between tags in advance. Query processing optimization then uses this information to check every cascade in a query and filter out unusable tags as early as possible. Extensive experiments evaluate the efficiency of these two optimizations, and their results are analyzed in depth.
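The abstract does not give the thesis's operators or distance definitions, so the following is only a minimal sketch of the general idea under stated assumptions. It uses plain Levenshtein distance as a stand-in for "similar spellings" (relaxations and synonyms are ignored), and it treats the precomputed cascade information as a set of parent-child tag pairs. All function names, the sample document, and the `max_dist` threshold are hypothetical.

```python
import xml.etree.ElementTree as ET


def edit_distance(a, b):
    """Levenshtein distance, computed with one rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def collect_tag_pairs(root):
    """Offline step (assumed): one traversal records all tags and
    every (parent tag, child tag) pair that occurs in the document."""
    tags, pairs = set(), set()
    stack = [root]
    while stack:
        node = stack.pop()
        tags.add(node.tag)
        for child in node:
            pairs.add((node.tag, child.tag))
            stack.append(child)
    return tags, pairs


def similar_tags(query_tag, doc_tags, max_dist=1):
    """Map each document tag within max_dist edits to its distance."""
    found = {}
    for tag in doc_tags:
        dist = edit_distance(query_tag, tag)
        if dist <= max_dist:
            found[tag] = dist
    return found


def match_edge(parent_q, child_q, doc_tags, pairs, max_dist=1):
    """Online step (assumed): rank document tag pairs that could answer
    one parent/child edge of a twig query, smallest total distance first."""
    parents = similar_tags(parent_q, doc_tags, max_dist)
    children = similar_tags(child_q, doc_tags, max_dist)
    hits = []
    for p, dist_p in parents.items():
        for c, dist_c in children.items():
            # Early filter: skip pairs that never occur in the document.
            if (p, c) in pairs:
                hits.append((dist_p + dist_c, p, c))
    return sorted(hits)


# Hypothetical document with two dirty tags: "titl" and "bok".
DOC = "<library><book><titl>XML</titl></book><bok><title>Twig</title></bok></library>"
doc_tags, doc_pairs = collect_tag_pairs(ET.fromstring(DOC))
results = match_edge("book", "title", doc_tags, doc_pairs)
print(results)
```

In this sample the exact pair ("book", "title") never occurs in the document, so only the two dirty variants survive the filter, each at total distance 1. A full twig query would repeat this check for every edge and combine the surviving candidates, which this sketch does not attempt.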

Keywords/Search Tags:

dirty data, XML, twig query

Related items

1	Research On Query Estimation Techniques On Dirty Database Management System
2	Research On Key Technology For Query Optimization On Dirty Database
3	Research On Query Processing Technology For XML Data Based On HoListic Twig Pattern
4	Research On Twig Pattern Query Based On XML Data
5	THE AERODYNAMIC PROPERTIES OF SPRUCE TWIG ELEMENTS
6	Research On Twig Pattern Query In XML Database
7	Research On XML Twig Query Optimization
8	Research On Query Processing Technology For XML Data Based On Holistic Twig Pattern
9	Research On Labeling And Twig Pattern Query Of XML Data
10	Research On The Method Of Fuzzy XML Complex Twig Query With Predicates