
A Multiple-Queries Processing Technique On Ziv-Lempel Compressed Texts

Posted on: 2011-08-04 | Degree: Master | Type: Thesis
Country: China | Candidate: D Qu
GTID: 2248330395958453 | Subject: Computer software and theory
Abstract/Summary:
With the development of computer and information technology, the size of data sets is growing exponentially, which gives rise to many new problems. At the same time, memory is limited relative to these data sets, so in many cases a whole data set cannot be loaded into memory at once. Compression, and search performed directly on compressed data, have therefore become an active research topic. Queries fall into two kinds: single queries and multiple queries. Single-query search on compressed data has received significant attention, but multiple-query processing on compressed data remains largely unexplored, even though it plays an important role in many domains, such as spell checking, fingerprint recognition, information retrieval, and computational biology. It is therefore important to develop a technique for answering multiple queries on Ziv-Lempel compressed data.

This paper studies the processing of multiple queries on Ziv-Lempel compressed data, pioneering this area and laying a foundation for multiple-query processing techniques. Because queries in large-scale system applications are often similar to one another, we propose a technique for answering multiple similar queries on compressed text. After analyzing multiple similar queries, we give a new definition of a common substring: a substring that meets a given length and a given number of occurrences. We also propose a method that retrieves common substrings and effectively filters out redundant ones. Using the common substrings, we locate the possible occurrence positions of the similar queries, build a candidate set, and then verify each candidate position. Because one common substring can cover several queries, this greatly reduces the cost of query processing and improves the efficiency of answering queries.

This paper evaluates the performance of multiple-query processing on two real data sets.
The experimental results show that our algorithms for retrieving common substrings and answering multiple queries are very efficient.
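The filter-and-verify idea described above can be illustrated with a minimal sketch. It rests on several assumptions not taken from the thesis: it searches plain (decompressed) text rather than Ziv-Lempel compressed text, it uses exact matching, it reads "given length and occurrences" as "length at least `min_len`, contained in at least `min_occ` distinct queries", and it treats a common substring as redundant when a longer retained substring contains it and covers exactly the same queries. The names `common_substrings`, `filter_redundant`, `find_all`, and `answer_queries` are hypothetical, and the brute-force substring enumeration is chosen for clarity, not speed.

```python
from collections import defaultdict

# Hypothetical sketch of filter-and-verify on PLAIN text; the compressed-domain
# (Ziv-Lempel) search studied in the thesis is deliberately omitted here.


def common_substrings(queries, min_len, min_occ):
    """Return {substring: set of query indices} for substrings of length
    >= min_len contained in at least min_occ distinct queries (an assumed
    reading of the definition). Brute force: O(m^2) substrings per query."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    cover = defaultdict(set)
    for qi, q in enumerate(queries):
        for i in range(len(q)):
            for j in range(i + min_len, len(q) + 1):
                cover[q[i:j]].add(qi)
    return {s: ids for s, ids in cover.items() if len(ids) >= min_occ}


def filter_redundant(common):
    """Assumed redundancy rule: drop s when a longer kept substring contains s
    and covers exactly the same queries, since s then adds no coverage."""
    kept = {}
    for s in sorted(common, key=len, reverse=True):
        if not any(s in t and kept[t] == common[s] for t in kept):
            kept[s] = common[s]
    return kept


def find_all(text, pattern):
    """All (possibly overlapping) start positions of pattern in text."""
    positions, start = [], text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def answer_queries(text, queries, min_len=3, min_occ=2):
    """Exact-match start positions for each query, one sorted list per query."""
    common = filter_redundant(common_substrings(queries, min_len, min_occ))
    # Greedy order (an assumption): substrings covering more queries first.
    order = sorted(common, key=lambda s: (-len(common[s]), -len(s), s))
    results = [None] * len(queries)
    for s in order:
        pending = [qi for qi in sorted(common[s]) if results[qi] is None]
        if not pending:
            continue
        occ = find_all(text, s)  # one search for s serves every pending query
        for qi in pending:
            q = queries[qi]
            offset = q.find(s)  # s starts at this offset inside q
            # Candidate set: where q would have to start for s to line up.
            candidates = {p - offset for p in occ if p >= offset}
            # Verification: keep only candidates where q actually occurs.
            results[qi] = sorted(c for c in candidates if text.startswith(q, c))
    # Queries covered by no common substring fall back to a direct search.
    for qi, q in enumerate(queries):
        if results[qi] is None:
            results[qi] = find_all(text, q)
    return results


if __name__ == "__main__":
    text = "the cat sat on the mat with a cat hat"
    queries = ["cat sat", "cat hat", "the mat", "zebra"]
    print(answer_queries(text, queries))  # [[4], [30], [15], []]
```

In this example the only common substring kept is `"cat "`, shared by the first two queries, so a single search for it produces the candidates for both; `"the mat"` and `"zebra"` share no qualifying substring and are searched directly. No true match is missed: every occurrence of a query contains an occurrence of its common substring at a fixed offset, so it always appears in the candidate set before verification.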
Keywords/Search Tags: compressed text, multiple queries, query relatedness, common substring, performance