Design And Implementation Of Heterogeneous Retrieval System Based On Digital Resources

Posted on: 2015-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: Q Chen
Full Text: PDF
GTID: 2298330467963966
Subject: Computer Science and Technology
Abstract/Summary:
With the continuous development of the Internet, the information age has arrived, and the volume of digital resources in video, audio, text and other forms keeps growing. Two problems need to be solved: how to find what the user needs accurately and quickly across a variety of media resources, and how to sort the results by the relevance between the keywords and the multimedia resources.
The subject comes from a subproject of "Digital Home Services key support technology development and demonstration", supported by university laboratories and the National Library. The goal is to establish an Internet-based service platform for the different types of digital resources held by public libraries and to provide users with unified retrieval across those resources. The heterogeneous retrieval system parses digital resources in specific formats, indexes their attributes, and stores the indexes in the index database. The retrieval system consists of four modules: the index module, the Chinese word segmentation module, the retrieval module and the sorting module. The index module is a prerequisite for the efficient operation of the retrieval system; once the resources are indexed, the system can call the Chinese word segmentation module and perform retrieval. The Chinese word segmentation module is the core of the retrieval system. It hides the details of the segmentation process and exposes calling interfaces to external applications: developers only need to place the files generated by the module in their project and call the appropriate interface to process Chinese text.
The paper first introduces the research background and significance from two perspectives in detail, and then surveys and analyzes the state of development of Chinese word segmentation at home and abroad.
It summarizes the open problems in Chinese word segmentation and those that currently most need to be addressed. It then analyzes and compares four common Chinese word segmentation algorithms, and surveys and contrasts open source Chinese word segmentation projects to select the one on which this project is based. Building on this research, the paper designs and implements the final system in five aspects: the retrieval system architecture, Chinese word segmentation, indexing, retrieval of digital resources and sorting of results. It tests the retrieval system in terms of recall, precision and response time, and realizes a unified search function for the different types of digital resources.
The Chinese word segmentation module based on a dynamic lexicon presented in this paper spares developers duplicated development work and learning costs, and it can be integrated seamlessly with the system while reducing coupling in the code. It improves developer efficiency to some extent.
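As an illustration only, the four modules described in the abstract (index, Chinese word segmentation, retrieval, sorting) and the recall/precision test can be sketched in a few lines of Python. The abstract does not name the segmentation algorithm, the open source project, or the ranking formula used in the thesis, so everything below is an assumption: forward maximum matching stands in for the segmenter, a sum of raw term counts stands in for relevance, and the names `DynamicLexiconSegmenter`, `ResourceIndex` and `precision_recall`, the sample lexicon, and the resource IDs are hypothetical.

```python
from collections import Counter, defaultdict


class DynamicLexiconSegmenter:
    """Forward maximum matching over a lexicon that can grow at runtime.

    Assumption: the thesis does not state its algorithm; this is a stand-in.
    """

    def __init__(self, words):
        self.lexicon = set(words)
        self.max_len = max((len(w) for w in self.lexicon), default=1)

    def add_word(self, word):
        # "Dynamic" here only means new entries take effect immediately.
        self.lexicon.add(word)
        self.max_len = max(self.max_len, len(word))

    def segment(self, text):
        tokens, i = [], 0
        while i < len(text):
            # Try the longest candidate first; fall back to one character.
            for size in range(min(self.max_len, len(text) - i), 0, -1):
                piece = text[i:i + size]
                if size == 1 or piece in self.lexicon:
                    tokens.append(piece)
                    i += size
                    break
        return tokens


class ResourceIndex:
    """Inverted index over the attributes of heterogeneous resources."""

    def __init__(self, segmenter):
        self.segmenter = segmenter
        self.postings = defaultdict(Counter)  # term -> {resource_id: count}

    def add(self, resource_id, attributes):
        # Index every attribute value (title, description, ...) of a resource.
        for value in attributes.values():
            for term in self.segmenter.segment(value):
                self.postings[term][resource_id] += 1

    def search(self, query):
        # Score = summed counts of the query terms (hypothetical relevance).
        scores = Counter()
        for term in self.segmenter.segment(query):
            for resource_id, count in self.postings.get(term, {}).items():
                scores[resource_id] += count
        # Sort by descending score, then by ID for a stable order.
        return sorted(scores, key=lambda rid: (-scores[rid], rid))


def precision_recall(retrieved, relevant):
    """Precision and recall of a result list against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


segmenter = DynamicLexiconSegmenter(["数字", "资源", "检索", "图书馆"])
index = ResourceIndex(segmenter)
index.add("video-1", {"title": "数字资源检索"})
index.add("audio-1", {"title": "图书馆资源"})
print(index.search("资源检索"))  # prints ['video-1', 'audio-1']
```

Because the index and the query share one segmenter, a term added with `add_word` is split the same way on both sides, which is one plausible reading of why the abstract treats the segmentation module as the core of the system.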
Keywords/Search Tags: retrieval, heterogeneous, digital resource, Chinese word segmentation