Study Of Indexing Techniques For Encrypted Full-Text Retrieval System

Posted on: 2010-09-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Wu
Full Text: PDF
GTID: 1118360305992245
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the information industry, information resources of all kinds are growing rapidly, and the drive toward informatization produces massive volumes of literature and digital documents. These digital resources need efficient management and utilization; in particular, locating specific results within huge amounts of data is becoming more and more important.

Full-text retrieval provides an efficient way to search for the required information. However, more and more information security problems arise during retrieval, especially in high-security environments such as e-commerce and e-government, where searching and using archived information resources must meet high security standards. The requirements are as follows.
1) Digital documents must be stored in encrypted form; thus, new indexing and searching techniques for an encrypted full-text retrieval system are required.
2) The inverted index must be secure enough not to leak information about the encrypted source documents; that is, the inverted index should be encrypted as well.
3) A full-text retrieval system that supports multiple users should be able to grant different users different rights or roles. Even when the same keywords are issued, different users may get different results.
4) Index maintenance and updating become more complicated because the index is encrypted.

To mitigate the security problems of the inverted index, this dissertation proposes a new index encryption method that physically splits the index files into many blocks. The block, rather than the index term entry, is the smallest encryption unit. This method obscures the logical structure of the inverted index and makes statistics-based attacks and chosen-plaintext attacks impossible. It dramatically improves the security level of the inverted index.
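
The idea of treating a fixed-size block, rather than a term entry, as the encryption unit can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the dissertation: the block size, the function names, the serialized index layout, and above all the toy SHA-256 keystream, which only stands in for a real block cipher and is not secure.

```python
import hashlib

BLOCK_SIZE = 64  # bytes; hypothetical value chosen to keep the example small


def _keystream(key: bytes, block_no: int, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). A stand-in for a real cipher; NOT secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        seed = key + block_no.to_bytes(8, "big") + counter.to_bytes(4, "big")
        out += hashlib.sha256(seed).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_index(plain: bytes, key: bytes) -> list:
    """Cut the serialized index at fixed offsets, ignoring term-entry
    boundaries, and encrypt every block independently."""
    blocks = []
    for block_no, start in enumerate(range(0, len(plain), BLOCK_SIZE)):
        chunk = plain[start:start + BLOCK_SIZE]
        blocks.append(_xor(chunk, _keystream(key, block_no, len(chunk))))
    return blocks


def read_range(blocks: list, key: bytes, offset: int, length: int):
    """Return plaintext bytes [offset, offset + length) and the number of
    blocks decrypted. Assumes length > 0 and the range lies inside the index."""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    plain = b"".join(
        _xor(blocks[n], _keystream(key, n, len(blocks[n])))
        for n in range(first, last + 1)
    )
    skip = offset - first * BLOCK_SIZE
    return plain[skip:skip + length], last - first + 1
```

Because block boundaries fall at fixed offsets, a stored block may hold part of one entry or several entries, so the ciphertext layout says nothing about where term entries begin or end. A reader that knows the byte range it needs only has to decrypt the blocks overlapping that range.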
According to the look-up path of the query term, an on-demand decryption strategy is adopted, which greatly reduces the amount of data that needs to be decrypted. It also makes encryption and decryption transparent to index construction, updating, and searching. Furthermore, it ensures the security of the index without losing any query capability.

To enforce access control on users' search requests, the dissertation proposes an access control security model based on roles and document security levels. This model provides flexible access control without affecting query efficiency. When users' rights or roles change, search results reflect the change in real time. Furthermore, because roles are introduced, changes in user rights have little effect on the index itself, which reduces the burden of index management and maintenance.

To improve the efficiency of index maintenance, the dissertation presents a new in-place index update method. With this method, the disk storage allocated to the inverted index is no longer irregular in size, which avoids disk fragmentation. Free disk blocks do not need to be sorted by address offset, and managing free disk space takes nearly constant time. Furthermore, the method avoids random disk accesses and improves index maintenance efficiency by updating posting lists in their physical order.

Finally, based on the above research results, an encrypted full-text retrieval prototype system named Mimir is designed and implemented. The core module of Mimir is built on the Apache Lucene library. Performance benchmarks and resource consumption analysis are conducted on an encrypted inverted index built from the enwiki content source. The experimental results demonstrate an excellent balance between security and efficiency in the information retrieval system.
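
The claim that free space can be managed in nearly constant time without sorting by address follows naturally when every allocation has the same size. The sketch below is one possible reading under that assumption, not the dissertation's actual design: the class names `BlockAllocator` and `PostingStore`, the per-block posting capacity, and the chaining of blocks per term are all hypothetical.

```python
class BlockAllocator:
    """Fixed-size block allocator backed by an unordered LIFO free list."""

    def __init__(self, block_size: int):
        self.block_size = block_size
        self.next_block = 0   # first block number never handed out (end of file)
        self.free = []        # released block numbers, in no particular order

    def allocate(self) -> int:
        if self.free:
            return self.free.pop()    # O(1): any free block fits, no search needed
        block = self.next_block
        self.next_block += 1
        return block

    def release(self, block: int) -> None:
        self.free.append(block)       # amortized O(1): no sorting, no coalescing

    def offset(self, block: int) -> int:
        return block * self.block_size


class PostingStore:
    """Tracks which blocks hold each term's posting list (bookkeeping only)."""

    def __init__(self, allocator: BlockAllocator, capacity: int):
        self.allocator = allocator
        self.capacity = capacity      # postings per block (hypothetical)
        self.chains = {}              # term -> list of block numbers
        self.counts = {}              # term -> number of postings stored

    def add(self, term: str, n: int = 1) -> None:
        for _ in range(n):
            count = self.counts.get(term, 0)
            if count % self.capacity == 0:   # no block yet, or last block is full
                self.chains.setdefault(term, []).append(self.allocator.allocate())
            self.counts[term] = count + 1

    def delete_term(self, term: str) -> None:
        for block in self.chains.pop(term, []):
            self.allocator.release(block)
        self.counts.pop(term, None)
```

Since all blocks are interchangeable, a freed block can be reused by any posting list, so the file does not accumulate unusable holes of odd sizes, and appending to a list only touches its last block or one newly allocated block.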
Keywords/Search Tags: Full-text Retrieval, Inverted Index, Encrypted Index, Secure Index, Access Control, In-place Update