
Research And Implementation Of Keyword Queries Over XML Streams

Posted on: 2013-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: J Feng
GTID: 2248330371472082
Subject: Computer software and theory
Abstract/Summary:
XML data increasingly arrives in the form of streams, in applications such as stock trading, e-mail monitoring, and the publication of network information, so keyword queries over XML streams have become an active topic in XML query research. Unlike structured query languages such as XPath and XQuery, keyword queries do not require users to master a complex query language or to know the structure of the XML data; users obtain the information they are interested in simply by submitting query keywords. However, the characteristics of XML streams (large volume, uncontrolled arrival order, and data that can be visited only once) bring new challenges to keyword queries. On this basis, this thesis studies keyword queries over XML streams in detail.

After reviewing and analyzing the related technology, the thesis focuses on the techniques for keyword queries over XML streams. First, in view of the advantages and disadvantages of existing algorithms, it defines a complete result set, ASLCA (All Smallest Lowest Common Ancestor), together with MCS (Max Contain Sequence), and proposes XAMM (XML All-slca Max-contain-sequence Minus), an algorithm for keyword queries over XML streams. The algorithm addresses the memory wasted by Dewey codes and avoids the unfriendliness of structured query operations in XPath and XQuery. Then, taking into account both the user's intention and the accuracy of the queried data set, a prototype system for keyword queries over XML streams is designed. The design comprises five modules: user operation, keyword classification, keyword semantic extension, rough filtering of the data set, and query execution.
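As background for the ASLCA definition, the classic SLCA (Smallest Lowest Common Ancestor) semantics over Dewey codes can be sketched as below. This is a brute-force illustration over an in-memory set of matches, not the thesis's XAMM algorithm and not a streaming implementation; the function names and the example Dewey codes are assumptions made for illustration only.

```python
from itertools import product


def common_prefix(codes):
    """Return the longest common prefix of several Dewey codes (tuples).

    In Dewey encoding, the common prefix of a set of nodes is the Dewey
    code of their lowest common ancestor.
    """
    prefix = []
    for parts in zip(*codes):
        if all(p == parts[0] for p in parts):
            prefix.append(parts[0])
        else:
            break
    return tuple(prefix)


def is_proper_ancestor(a, b):
    """True if node a is a proper ancestor of node b (a is a proper prefix of b)."""
    return len(a) < len(b) and b[:len(a)] == a


def slca(keyword_nodes):
    """Brute-force SLCA (hypothetical helper, for illustration).

    keyword_nodes maps each keyword to the Dewey codes of the nodes that
    contain it. The result is every LCA of one-match-per-keyword
    combinations that has no other such LCA beneath it.
    """
    if not keyword_nodes or any(not nodes for nodes in keyword_nodes.values()):
        return []
    lcas = {common_prefix(combo) for combo in product(*keyword_nodes.values())}
    return sorted(
        c for c in lcas
        if not any(is_proper_ancestor(c, other) for other in lcas)
    )


# Assumed example: two books under a root; only the first mentions "feng".
matches = {
    "xml": [(1, 1, 1), (1, 2, 1)],
    "feng": [(1, 1, 2)],
}
result = slca(matches)  # the first book, (1, 1), is the only SLCA
```

The sketch enumerates every combination of matches, which is exponential in the number of keywords and needs all matches in memory; it only shows what the result set means, which is why a stream setting calls for a dedicated algorithm.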
Before a query is executed, the expressions of the keywords that users submit are given a prescribed grammar so that the users' query intention can be captured accurately. The keywords are divided, according to their roles in the query, into condition keywords, which are used for querying, and result keywords, which are used for presenting results, and they are semantically extended with the semantic dictionary WordNet. Digital signatures of the XML document set, built with Bloom filters, are matched against the semantically extended keywords to filter out unrelated documents. This rough filtering reduces the data set before the query and prepares it for the accurate query. During execution, the XAMM algorithm is run and the results that satisfy the user's query are returned. Because the existing classic WordNet-based semantic similarity calculation methods all ignore the hierarchical relationships between notions, the NASSC method (based on Notion Asymmetric Semantic Similarity Calculation) is proposed. NASSC is used to calculate the semantic similarity between each candidate result and the synonym set of a keyword; results with high similarity are returned to the user, which completes the query.

Finally, comparative tests show that XAMM is superior to the existing algorithms: its results are more complete than the SRCT and SLCA result sets, and it has good query performance. Comparative experiments between NASSC and the classic semantic similarity calculation methods show that NASSC computes similarity with higher accuracy. A demonstration of the system functions shows that the prototype system can carry out keyword queries over XML streams and is user-friendly.
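The rough filtering step described in the abstract, in which Bloom filter signatures of documents are matched against semantically extended keywords, might look roughly like the following. This is a minimal sketch under stated assumptions: the `BloomFilter` class, the `rough_filter` function, the filter size, the hash construction, and the rule "keep a document if every keyword has at least one expanded term that may occur in it" are all hypothetical and are not taken from the thesis.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter used as a document signature (illustrative)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int serves as the bit array

    def _positions(self, term):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            data = f"{i}:{term.lower()}".encode("utf-8")
            digest = hashlib.sha256(data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits |= 1 << pos

    def might_contain(self, term):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(term))


def rough_filter(signatures, expanded_keywords):
    """Return ids of documents that may match every keyword.

    signatures: dict mapping document id to its BloomFilter.
    expanded_keywords: list of sets, one per keyword, each holding the
    keyword and its synonyms (assumed to come from a WordNet lookup).
    """
    return [
        doc_id
        for doc_id, sig in signatures.items()
        if all(any(sig.might_contain(t) for t in terms)
               for terms in expanded_keywords)
    ]


# Assumed example data: one document with terms, one with none.
doc_a = BloomFilter()
for term in ["book", "author", "xml"]:
    doc_a.add(term)
doc_empty = BloomFilter()

query = [{"book", "volume"}, {"author", "writer"}]
kept = rough_filter({"a": doc_a, "empty": doc_empty}, query)
```

A Bloom filter never produces false negatives, so a relevant document is never discarded at this stage; it can produce false positives, which is why the surviving documents still go through the accurate query afterwards.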
Keywords/Search Tags: XML Streams, Keyword Queries, the Completeness Result Set, Semantic Similarity