
Design and Implementation of Full-Text Retrieval System Based on Xapian

Posted on: 2014-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
GTID: 2248330398471971
Subject: Computer Science and Technology
Abstract/Summary:
The advent of the information age has brought a tremendous increase in the volume of multimedia data resources of all kinds. Digital libraries therefore need to provide search services that help readers find the information they want efficiently. This project is a sub-item of the "Interactive Media Services System of the Digital Home", a cooperative project with the National Library. Its research goal is to design and implement a full-text retrieval system that provides a unified search service for all kinds of multimedia digital resources.

The full-text retrieval system builds its index database by extracting the content and features of the multimedia data resources from their metadata files. The system consists of five sub-modules: a digital resource analysis module, an indexing module, a search module, a Chinese word segmentation module, and an access control module. The Chinese word segmentation module is the core of the system; it provides the segmentation function for both the indexing module and the search module. This module has three advantages. First, it is designed as an independent third-party Python toolkit that offers general-purpose Chinese word segmentation to developers. Second, it provides uniform, standard, high-level interfaces to upper-layer application developers and integrators while hiding the heterogeneity of the various underlying Chinese word segmentation systems. Third, it provides functions for comparing the different Chinese word segmentation systems.

First, the paper studies the storage formats and metadata representation of multimedia data resources. It also studies the main Chinese word segmentation algorithms, including the mechanical (dictionary-based) method, the understanding-based method, and the statistics-based method.
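The abstract describes the segmentation toolkit as a uniform high-level interface over heterogeneous back-end segmenters, with a function for comparing them, and names mechanical segmentation among the algorithms studied. The Python sketch below illustrates that design under stated assumptions: the names `Segmenter`, `cut`, `SegmenterToolkit`, `register`, and `compare` are hypothetical and are not the toolkit's actual API, and the two back ends are toy forward and backward maximum matching segmenters (the classic mechanical methods) rather than the real segmentation systems the thesis wraps.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class Segmenter(ABC):
    """Hypothetical uniform interface that every back-end segmenter implements."""

    name = "base"

    @abstractmethod
    def cut(self, text: str) -> List[str]:
        """Split a Chinese string into a list of words."""


class DictionarySegmenter(Segmenter):
    """Shared state for mechanical (dictionary-based) segmenters."""

    def __init__(self, dictionary: Iterable[str]) -> None:
        self.words = set(dictionary)
        # The longest dictionary entry bounds the matching window.
        self.max_len = max((len(w) for w in self.words), default=1)


class ForwardMaxMatch(DictionarySegmenter):
    """Forward maximum matching: scan left to right, take the longest match."""

    name = "fmm"

    def cut(self, text: str) -> List[str]:
        tokens: List[str] = []
        i = 0
        while i < len(text):
            # Shrink the window until a dictionary hit; one character always matches.
            for size in range(min(self.max_len, len(text) - i), 0, -1):
                piece = text[i:i + size]
                if size == 1 or piece in self.words:
                    tokens.append(piece)
                    i += size
                    break
        return tokens


class BackwardMaxMatch(DictionarySegmenter):
    """Backward maximum matching: scan right to left, take the longest match."""

    name = "bmm"

    def cut(self, text: str) -> List[str]:
        tokens: List[str] = []
        j = len(text)
        while j > 0:
            for size in range(min(self.max_len, j), 0, -1):
                piece = text[j - size:j]
                if size == 1 or piece in self.words:
                    tokens.append(piece)
                    j -= size
                    break
        return tokens[::-1]  # words were collected from the end of the string


class SegmenterToolkit:
    """Hypothetical facade that hides which concrete back end does the work."""

    def __init__(self) -> None:
        self._backends: Dict[str, Segmenter] = {}

    def register(self, segmenter: Segmenter) -> None:
        self._backends[segmenter.name] = segmenter

    def cut(self, text: str, backend: str) -> List[str]:
        """Segment with one named back end through the same call signature."""
        return self._backends[backend].cut(text)

    def compare(self, text: str) -> Dict[str, List[str]]:
        """Run every registered back end on the same text for side-by-side comparison."""
        return {name: seg.cut(text) for name, seg in self._backends.items()}
```

On the classic ambiguous string 研究生命起源 ("studying the origin of life") with the dictionary {研究, 研究生, 生命, 起源}, forward matching yields 研究生 / 命 / 起源 while backward matching yields 研究 / 生命 / 起源. Surfacing such disagreements is the kind of comparison the toolkit's comparison function is meant to support, and callers never need to know which back end produced a result.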
The paper examines Chinese word segmentation systems in several respects, such as programming language, algorithm, dictionary, functionality, and interface. It then analyzes the organizational structure, implementation principles, and most commonly used interfaces of Xapian. Building on this research, the paper completes the design and implementation of the full-text retrieval system in four aspects: user roles, technical architecture, functional modules, and database design. Finally, the system was tested to evaluate its performance. The test results show that the system achieves high recall, high precision, and satisfactory retrieval speed, and that each performance index meets the requirements. The system can provide users with a unified search service over a large volume of digital multimedia resources.
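The indexing and search modules, and the recall and precision measures used in the evaluation, can be illustrated with a small in-memory inverted index. This is a hedged sketch, not Xapian's API: the `InvertedIndex` class and `precision_recall` helper are hypothetical, documents are assumed to be already segmented into space-separated words, and retrieval is a plain boolean AND with no ranking. Precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that are retrieved.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple


class InvertedIndex:
    """Toy in-memory inverted index (hypothetical stand-in for a Xapian database)."""

    def __init__(self, tokenize: Callable[[str], List[str]]) -> None:
        self.tokenize = tokenize
        # term -> set of ids of documents containing that term
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        """Index one document, e.g. text extracted from a metadata file."""
        for term in self.tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> Set[int]:
        """Return ids of documents containing every query term (boolean AND)."""
        terms = self.tokenize(query)
        if not terms:
            return set()
        # Copy the first posting list so the index itself is never mutated.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


def precision_recall(retrieved: Set[int], relevant: Set[int]) -> Tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant (0.0 for empty sets)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Documents are assumed to be pre-segmented into space-separated words.
    index = InvertedIndex(str.split)
    index.add(1, "数字 图书馆 全文 检索")
    index.add(2, "多媒体 资源 元数据")
    index.add(3, "全文 检索 系统 Xapian")
    found = index.search("全文 检索")
    print(sorted(found), precision_recall(found, relevant={3}))
```

Here the query 全文 检索 matches documents 1 and 3; if only document 3 is relevant, precision is 0.5 and recall is 1.0. The system described in the thesis stores its index in Xapian instead, which ranks matches (by default with BM25 weighting) rather than returning an unordered set.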
Keywords/Search Tags: full-text retrieval, Xapian, digital content, Chinese word segmentation