
Query Processing On XML Data With Dirty Tags

Posted on: 2012-10-20
Degree: Master
Type: Thesis
Country: China
Candidate: G H Jiang
GTID: 2218330362450417
Subject: Computer Science and Technology
Abstract/Summary:
With the growth of data interchange between organizations, XML, the de facto standard for storing and exchanging data on the Web, is becoming more and more significant. However, dirty data in XML, such as incorrect, inconsistent and imprecise data, makes effective query processing on XML difficult. Both XML research and the wider adoption of XML create a strong demand for querying XML that contains dirty data directly. Because twig queries are an important topic in academic research, this paper focuses on dirty tags and studies the processing of twig queries on dirty XML and its optimization.
Making use of both the content and the structure of XML, this paper proposes a query processing algorithm for XML with dirty tags. Because XML must be preprocessed to support efficient query processing, the whole process is divided into offline document processing and online query processing. The algorithm first uses the contents, parent-child relationships and other structural information in documents and queries to obtain the similar spellings, relaxations and synonyms of every tag. With the help of these similar tags, this paper then defines similar queries and their similarity distances, and gives three operators that support efficient execution of the query processing algorithm. Building on these operators and adopting a space-for-time strategy, it proposes an efficient algorithm that computes all results in the XML document that are similar to a user query and returns them in order of similarity. An experimental evaluation measures the effectiveness and efficiency of this approach.
To overcome the inefficient parts of the above method, this paper identifies two optimization points and implements them as document processing optimization and query processing optimization.
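The abstract does not spell out how similar tags or similarity distances are computed. As a rough illustration only, the following Python sketch assumes that similar spellings are tags within a small Levenshtein edit distance, that synonyms come from a supplied dictionary, that a query is a simple parent-child path rather than a full twig, and that a similar query's distance is the sum of hypothetical per-tag costs (0 for an exact tag, 1 for a synonym, the edit distance for a misspelling). Relaxations and the three operators defined in the thesis are not modelled, and all function names are hypothetical.

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def similar_tags(tag, doc_tags, synonyms, max_edits=1):
    """Return {candidate_tag: cost} for one query tag.

    Hypothetical cost model: exact match 0, synonym 1,
    misspelling = edit distance (only if <= max_edits).
    """
    cands = {}
    for t in doc_tags:
        if t == tag:
            cands[t] = 0
        elif t in synonyms.get(tag, ()):
            cands[t] = 1
        else:
            d = edit_distance(tag, t)
            if d <= max_edits:
                cands[t] = d
    return cands


def similar_queries(query_path, doc_tags, synonyms, max_edits=1):
    """Enumerate similar path queries as (distance, tags), closest first."""
    per_tag = [similar_tags(t, doc_tags, synonyms, max_edits) for t in query_path]
    results = []
    for combo in product(*(c.items() for c in per_tag)):
        tags = tuple(t for t, _ in combo)
        dist = sum(cost for _, cost in combo)  # assumed additive distance
        results.append((dist, tags))
    results.sort()
    return results


def matches(path, doc_paths):
    """True if `path` occurs as a parent-child chain inside some document path."""
    n = len(path)
    return any(p[i:i + n] == path
               for p in doc_paths
               for i in range(len(p) - n + 1))
```

For instance, with synonyms = {"author": {"writer"}} and a document whose tags include "book", "author", "autor" and "writer", the query path ("book", "author") expands to ("book", "author"), ("book", "autor") and ("book", "writer"), in that order of distance. Keeping only the similar queries for which matches returns True gives a similarity-ordered answer list. Computing the candidate sets once and reusing them across queries is one possible reading of the space-for-time strategy, not necessarily the thesis's own.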
Document processing optimization exploits the fact that the document processor traverses every path in the XML document, and uses this traversal to generate all cascade information between tags in advance; query processing optimization then uses this information to check every cascade in a query and filter out unusable tags as early as possible. This paper reports extensive experiments evaluating the efficiency of the two optimizations and analyzes the results in depth.
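A minimal sketch of the two optimizations, assuming that a "cascade" is a (parent tag, child tag) pair that occurs somewhere in the document and that queries are simple parent-child paths. The hypothetical build_cascades collects every such pair in one offline traversal using Python's xml.etree.ElementTree, and the hypothetical prune_candidates discards, before any matching, candidate tags that cannot form a cascade with a surviving candidate of the neighbouring query node. The cascade definition and all names are assumptions, not taken from the thesis.

```python
import xml.etree.ElementTree as ET


def build_cascades(xml_text: str) -> set[tuple[str, str]]:
    """Offline step: record every (parent tag, child tag) pair in the document."""
    root = ET.fromstring(xml_text)
    cascades = set()
    stack = [root]
    while stack:  # iterative DFS visits every element once
        node = stack.pop()
        for child in node:
            cascades.add((node.tag, child.tag))
            stack.append(child)
    return cascades


def prune_candidates(candidates, cascades):
    """Online step for a path query.

    candidates[i] is the set of similar tags for the i-th query node
    (parent before child). Tags that cannot form a cascade with some
    surviving tag of a neighbouring node are dropped.
    """
    cands = [set(c) for c in candidates]
    # Forward pass: each child tag needs a parent tag it cascades from.
    for i in range(1, len(cands)):
        cands[i] = {c for c in cands[i]
                    if any((p, c) in cascades for p in cands[i - 1])}
    # Backward pass: each parent tag needs a child tag it cascades into.
    for i in range(len(cands) - 2, -1, -1):
        cands[i] = {p for p in cands[i]
                    if any((p, c) in cascades for c in cands[i + 1])}
    return cands
```

On <library><book><title/><autor/></book><magazine><writer/></magazine></library>, the candidate sets [{"book"}, {"author", "autor", "writer"}] shrink to [{"book"}, {"autor"}], because "book" never has an "author" or "writer" child. For a path query, one forward pass followed by one backward pass is enough: a tag removed in the backward pass had no child in the next set, so it was not supporting any surviving child tag. Twig queries and ancestor-descendant edges would need a more general check.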
Keywords/Search Tags: dirty data, XML, twig query