
Information Retrieval Technology Based on Data Compression

Posted on: 2003-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: P Zhao
GTID: 2208360065963951
Subject: Computer application technology
Abstract/Summary:
Information retrieval based on data compression is a new research field, and research on it is still in its early stages. This thesis presents a compression method for pure-English text databases, the corresponding decompression method, and several algorithms for searching the compressed text database. These algorithms build on semi-static modeling and word-based, byte-oriented encoding. The compression method reduces a typical English text database to about 35% of its original size, a better ratio than many popular compression tools such as WinZip. Furthermore, because retrieval can be executed directly on the compressed database, search speed increases significantly. In addition, the thesis proposes a new index structure that supports efficient search over large full-text databases, and designs several search algorithms based on that structure. Experimental results show that these algorithms are very efficient.
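The abstract names semi-static modeling, word-based byte-oriented encoding and search executed directly on compressed data, but does not specify the exact code the thesis uses. The Python sketch below shows how these ideas can fit together under stated assumptions: it uses End-Tagged Dense Code, a published word-based byte-oriented scheme that may differ from the thesis's method, and a simplified tokenizer that treats each run of non-word characters as a symbol of its own. A first pass counts token frequencies (the semi-static model), a second pass replaces each token with a codeword whose last byte has its high bit set, and a search encodes the query with the same model and scans the compressed bytes for it. All function names are hypothetical.

```python
import re
from collections import Counter

# Assumed tokenizer: alternating runs of word and non-word characters, so
# joining the tokens reproduces the original text exactly.
TOKEN_RE = re.compile(r"\w+|\W+")

def tokenize(text):
    return TOKEN_RE.findall(text)

def etdc_codeword(rank):
    """End-Tagged Dense Code for a 0-based frequency rank: the last byte is
    in 128..255 (high bit set), all earlier bytes are in 0..127."""
    out = [128 + rank % 128]
    rank //= 128
    while rank > 0:
        rank -= 1
        out.append(rank % 128)
        rank //= 128
    return bytes(reversed(out))

def build_model(text):
    """Semi-static pass 1: count token frequencies and give the most
    frequent tokens the shortest codewords."""
    freq = Counter(tokenize(text))
    ranked = sorted(freq, key=lambda t: (-freq[t], t))
    return {tok: etdc_codeword(i) for i, tok in enumerate(ranked)}

def compress(text, model):
    """Pass 2: replace every token with its codeword."""
    return b"".join(model[t] for t in tokenize(text))

def decompress(data, model):
    inverse = {code: tok for tok, code in model.items()}
    tokens, start = [], 0
    for i, byte in enumerate(data):
        if byte >= 128:  # a tagged byte ends the current codeword
            tokens.append(inverse[data[start:i + 1]])
            start = i + 1
    return "".join(tokens)

def search_compressed(data, model, phrase):
    """Return byte offsets in `data` where `phrase` occurs, scanning the
    compressed bytes without decompressing them."""
    try:
        pattern = b"".join(model[t] for t in tokenize(phrase))
    except KeyError:
        return []  # some token of the phrase never occurs in the text
    if not pattern:
        return []
    hits, pos = [], data.find(pattern)
    while pos != -1:
        # Reject matches that begin inside a longer codeword: a real match
        # starts at offset 0 or right after a tagged (end) byte.
        if pos == 0 or data[pos - 1] >= 128:
            hits.append(pos)
        pos = data.find(pattern, pos + 1)
    return hits

if __name__ == "__main__":
    text = "the compressed text and the index of the text"
    model = build_model(text)
    data = compress(text, model)
    assert decompress(data, model) == text             # lossless round trip
    print(len(text.encode()), "->", len(data))         # 45 -> 17 (model not counted)
    print(search_compressed(data, model, "the text"))  # [14]
```

Because the model is semi-static, it must be stored alongside the compressed data for decompression, so the size printed above excludes that overhead. The end tag makes codeword boundaries visible, which is what allows the search to run on the compressed bytes: a byte-level match needs only a one-byte check on the preceding byte to rule out false matches that start inside a longer codeword.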
Keywords/Search Tags: Information retrieval, Text database, Text compression, Inverted index