Efficient TOP-K Keyword Search On XML Streams

Posted on: 2011-10-17
Degree: Master
Type: Thesis
Country: China
Candidate: L L Li
GTID: 2178330338979940
Subject: Computer Science and Technology
Abstract/Summary:
XML has become the de facto standard for various kinds of data on the Internet because of its flexibility and extensibility. In some applications, XML data arrives in streaming form, that is, as continuously arriving XML fragments. We use the term XML stream to refer to XML fragments in streaming form. Applications of XML streams include publish/subscribe of information on the Internet, email detection, and so on. Because XML streams may come from varied and dynamic data sources, searching them is complicated for most users, so querying XML streams with keywords, without schema information, is in great demand.

In this paper we focus on keyword search on XML streams: given a set of keywords Q, retrieve the set of all XML fragments in the XML streams such that each fragment contains all the keywords in Q. To process TOP-K keyword queries on XML streams efficiently, we propose three novel kinds of TOP-K keyword queries on XML streams and present efficient algorithms to process them. The contributions of this paper can be summarized as follows:

(1) A query processing method based on ranking is presented to process TOP-K keyword queries on XML streams. In detail, we propose a novel ranking strategy, RSR, to efficiently evaluate the relevance of results to queries; we present a stack-based algorithm, TKS, to process TOP-K keyword search over XML streams efficiently, and analyze both its time complexity and its space complexity; we present a filtering method to improve efficiency and save storage space; and experimental results show that this query processing method is efficient and scalable.

(2) A query processing method based on skyline is presented to process TOP-K keyword queries on XML streams. In detail, skyline is applied to keyword queries on XML streams for the case where the same keyword query may have various intentions, which is a new aspect of result selection for keyword search on XML data; loose skyline TOP-K keyword queries (LSK for short), a novel kind of query, are presented for effective TOP-K keyword search on XML streams; an efficient algorithm is presented for processing LSK queries on XML streams; multiple loose skyline TOP-K keyword queries (MLSK for short), a novel kind of multi-query based on LSK queries, are presented; an efficient algorithm, MULSK, is presented for processing MLSK queries on XML streams; and experimental results show that this query processing method is efficient and scalable.

(3) A query processing method is presented to process TOP-K keyword queries on XML streams in a distributed environment. In detail, we propose a novel filtering method on each router to efficiently reduce the throughput of the overlay network; we prove that the problem of minimizing the throughput of the overlay network by constructing its topological structure is NP-complete; we present an approximation algorithm for constructing the topological structure of the overlay network; and experimental results show that this query processing method is efficient and scalable and verify the effectiveness of the algorithm.
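To make the problem statement concrete, the following is a minimal Python sketch of stack-based TOP-K keyword matching over a streamed XML document. It is not the thesis's TKS algorithm or RSR ranking strategy, whose details are not given in this abstract. The function name `top_k_matches`, the `Frame` class, the choice to report only the smallest elements that cover all keywords, and the score (the reciprocal of the subtree's element count) are all assumptions made for illustration. Text that follows a child's closing tag (mixed content) is ignored.

```python
import heapq
import io
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Frame:
    """State for one open element on the parse stack (hypothetical helper)."""
    matched: set = field(default_factory=set)  # query keywords seen in this subtree
    size: int = 1                              # number of elements in this subtree
    child_hit: bool = False                    # a descendant already covered the query


def top_k_matches(xml_stream, keywords, k):
    """Return up to k (tag, score) pairs, best first, in one pass over the stream."""
    query = {w.lower() for w in keywords}
    stack = []  # one Frame per currently open element
    heap = []   # min-heap of (score, -arrival order, tag), at most k entries
    seq = 0
    for event, elem in ET.iterparse(xml_stream, events=("start", "end")):
        if event == "start":
            stack.append(Frame())
            continue
        # "end" event: the element's own text is complete at this point.
        frame = stack.pop()
        words = set(re.findall(r"\w+", (elem.text or "").lower()))
        frame.matched |= query & words
        hit = frame.child_hit
        if frame.matched == query and not frame.child_hit:
            # Assumed ranking: smaller matching fragments score higher.
            score = 1.0 / frame.size
            heapq.heappush(heap, (score, -seq, elem.tag))
            seq += 1
            if len(heap) > k:
                heapq.heappop(heap)  # drop the current lowest-scoring result
            hit = True
        if stack:
            # Merge this subtree's summary into the parent's frame.
            parent = stack[-1]
            parent.matched |= frame.matched
            parent.size += frame.size
            parent.child_hit |= hit
        elem.clear()  # release the subtree so memory stays bounded
    return [(tag, score) for score, _, tag in sorted(heap, reverse=True)]


if __name__ == "__main__":
    data = io.BytesIO(
        b"<feed><item><title>xml stream</title><body>keyword search</body></item>"
        b"<item><title>xml keyword</title></item></feed>"
    )
    print(top_k_matches(data, ["xml", "keyword"], 2))
```

The stack holds one frame per open element, so memory grows with document depth rather than stream length, and the size-k heap keeps only the current best results as fragments arrive.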
Keywords/Search Tags:XML streams, keyword search, TOP-K query, skyline query