
Design And Implementation Of Web-based Resource Digitized Full Text Retrieval System

Posted on: 2011-02-15 Degree: Master Type: Thesis
Country: China Candidate: J Ma Full Text: PDF
GTID: 2198330332988221 Subject: Educational technology
Abstract/Summary:
The rapid development of information technology and the advent of the World Wide Web have produced a tremendous increase in the amount of available information. Documents in DOC, PDF and TXT formats, in particular, are multiplying rapidly. As networked electronic information becomes ever more prevalent, this thesis asks how the format and content of such information can be organized sensibly so that it offers internet users the greatest possible convenience. The goal of the project is to design and develop a Chinese full-text retrieval system for chorography (local gazetteers). The system uses a browser/server (B/S) architecture and is accessed via the web, providing full-text search over local gazetteers. An improved inverted-index algorithm, named the cluster index realign algorithm, is designed and used to compress and store the large index file, reducing index storage space and improving retrieval efficiency.

The thesis studies the principles of full-text retrieval in detail, including how a full-text index works, what it contains, and how indexing and searching are carried out. It analyzes Lucene's architecture, indexing process, search process, data flow and organizational structure, together with the functions of its individual packages and its retrieval functions.

The thesis further analyzes the index storage structure and designs an algorithm that effectively improves the index compression ratio: the cluster index realign algorithm. Through clustering, the algorithm places highly similar documents next to each other in document-ID order. As a result, small d-gaps (differences between consecutive document IDs in a posting list) occur more often, which makes index compression more effective. According to the tests, the cluster index realign algorithm reduces the storage space of the index and thus achieves the intended index compression.
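The compression idea behind the cluster index realign algorithm can be illustrated with a small sketch. It is not the thesis's implementation, and several assumptions are involved. Whitespace tokenization stands in for Chinese word segmentation. A greedy nearest-neighbour ordering by Jaccard similarity stands in for the thesis's clustering step. Index size is measured as the total bit length of Elias gamma codes for the d-gaps in every posting list. All function names are hypothetical.

```python
"""Sketch: how reordering similar documents shrinks d-gaps in an inverted index.

Hypothetical illustration only; not the thesis's cluster index realign algorithm.
"""


def tokenize(text):
    # Assumption: whitespace tokenization; Chinese text would need a word segmenter.
    return set(text.lower().split())


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs (list positions) containing it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            # doc_id only grows, so every posting list stays sorted
            index.setdefault(term, []).append(doc_id)
    return index


def d_gaps(postings):
    """First ID + 1 (so every value is >= 1), then differences between consecutive IDs."""
    return [postings[0] + 1] + [b - a for a, b in zip(postings, postings[1:])]


def gamma_bits(n):
    """Length in bits of the Elias gamma code for a positive integer n."""
    return 2 * (n.bit_length() - 1) + 1


def index_bits(docs):
    """Total size of all gamma-coded d-gap lists, in bits."""
    index = build_inverted_index(docs)
    return sum(gamma_bits(g) for postings in index.values() for g in d_gaps(postings))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_reorder(docs):
    """Hypothetical stand-in for clustering: starting from document 0, repeatedly
    append the remaining document most similar to the last one placed, so that
    similar documents receive neighbouring IDs (ties go to the lowest ID)."""
    if not docs:
        return []
    term_sets = [tokenize(d) for d in docs]
    order = [0]
    remaining = set(range(1, len(docs)))
    while remaining:
        last = term_sets[order[-1]]
        nxt = max(remaining, key=lambda i: (jaccard(last, term_sets[i]), -i))
        order.append(nxt)
        remaining.remove(nxt)
    return [docs[i] for i in order]


# Two topics (hydrology vs. dynastic history) interleaved in the original order.
docs = [
    "river flood dam",
    "temple emperor dynasty",
    "river dam irrigation",
    "dynasty temple ruins",
    "flood river irrigation",
    "emperor dynasty ruins",
]

print(index_bits(docs), index_bits(greedy_reorder(docs)))  # 54 40
```

Reordering groups the hydrology documents and the dynastic-history documents, so most gaps become 1. In this toy example, the gamma-coded postings shrink from 54 to 40 bits. Lucene's postings formats likewise store document IDs as deltas, which is why document order affects index size there too.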
Keywords/Search Tags: Full-Text Retrieval, Lucene, Cluster Index Realign Algorithm