
The Design And Implementation Of LIS Blog Search Engine Based On Nutch

Posted on: 2012-06-01
Degree: Master
Type: Thesis
Country: China
Candidate: B K Chen
GTID: 2218330338458191
Subject: Library science
Abstract/Summary:
With the advent of Web 2.0 concepts and technologies, Internet users worldwide enjoy a rich variety of interactive information services, of which the blog is a typical representative. Against this background, Library and Information Science (LIS) students, researchers, and others have written blogs to exchange information. However, LIS blogs are scattered across the Internet and their content quality is uneven, which makes them inconvenient for LIS users to find and use. Although Google Blog Search, Baidu Blog Search, and other related topical search engines have tried to solve these problems, they still cannot meet the needs of LIS users. To satisfy those needs, this paper constructs an LIS Blog Search Engine. It first analyzes search engine technology and LIS blogs, then introduces the open-source search engine Nutch and establishes a design scheme for an LIS Blog Search Engine based on Nutch, then develops the search engine according to that scheme, and finally evaluates its effectiveness through several experiments. The main contents of the chapters are as follows:

Chapter 1. Introduction. This chapter describes the background, significance, research status, research methods, and innovations of the paper.

Chapter 2. Search Engine Technology and LIS Blogs Analysis. The chapter first analyzes the operating principles of general search engines and of topical search engines, pointing out that the main differences between them lie in the information collection module and the analysis module: a topical search engine improves the web crawler module and builds a theme thesaurus for filtering information in the analysis module. Second, to give an overall understanding of LIS blogs, it analyzes blog site structures, blog page contents, and the link structures among different blog sites.

Chapter 3. Introduction to Nutch and the Configuration and Operation of Nutch.
This chapter first introduces the basic information and architecture of Nutch to give a preliminary understanding of it. It then configures the operating environment and describes the procedures for running Nutch, giving a deeper understanding of Nutch's operating principles and structure.

Chapter 4. Design of LIS Blog Search Engine Based on Nutch. Applying software engineering theory, the chapter first analyzes the goals, the problems to be solved, and the feasibility of the LIS Blog Search Engine, then summarizes the users' needs by drawing a Use Case Diagram and a Sequence Diagram, and finally establishes the overall design and the detailed design scheme of the LIS Blog Search Engine.

Chapter 5. Implementation of the Core Modules in LIS Blog Search Engine. Following the detailed design scheme, the chapter improves three core modules of the LIS Blog Search Engine. First, in the resource discovery module, it applies information retrieval theory and practice from the LIS field to obtain topical resources; next, in the crawler module, it establishes the crawling strategy using the professional software Concept Draw Web Wave Trial (v.5.5.1); finally, in the user retrieval module, it introduces several improvements based on users' needs.

Chapter 6. Experimental Assessment and Conclusion. The chapter first establishes a list of parameters, carries out six rounds of experiments based on those parameters, and analyzes the experimental results. Finally, the author summarizes the features and shortcomings of the LIS Blog Search Engine and outlines possible future improvements.
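Chapter 2 attributes the difference between a topical and a general search engine partly to the analysis module, where collected pages are filtered against a theme thesaurus. The abstract does not give the filtering formula, so the following is only a minimal sketch, assuming a weighted term list and a simple length-normalized score with a fixed cutoff. The thesaurus entries, their weights, the functions `relevance_score`, `is_on_topic`, and `filter_pages`, and the 0.05 threshold are all hypothetical illustrations, not the thesis's actual design.

```python
import re

# Hypothetical LIS theme thesaurus: term -> weight.
# Terms and weights are illustrative assumptions, not taken from the thesis.
LIS_THESAURUS = {
    "library": 1.0,
    "librarian": 1.0,
    "cataloging": 2.0,
    "metadata": 1.5,
    "information retrieval": 2.0,
    "digital library": 2.0,
    "bibliometrics": 2.0,
}


def relevance_score(text, thesaurus=LIS_THESAURUS):
    """Weighted count of thesaurus-term occurrences, divided by the page's word count.

    Overlapping terms are counted independently, so "digital library"
    also contributes one hit for "library".
    """
    lowered = text.lower()
    words = re.findall(r"[a-z]+", lowered)
    if not words:
        return 0.0
    total = 0.0
    for term, weight in thesaurus.items():
        # Word boundaries keep "library" from matching inside longer tokens.
        hits = len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        total += hits * weight
    return total / len(words)


def is_on_topic(text, threshold=0.05):
    """Hypothetical cutoff: treat a page as LIS-relevant if its score reaches threshold."""
    return relevance_score(text) >= threshold


def filter_pages(pages, threshold=0.05):
    """Keep the URLs of (url, text) pairs that pass the topical filter."""
    return [url for url, text in pages if relevance_score(text) >= threshold]


if __name__ == "__main__":
    sample = [
        ("http://example.org/lis-post", "Notes on metadata and cataloging in the digital library."),
        ("http://example.org/recipe", "Today I baked bread and went for a walk."),
    ]
    print(filter_pages(sample))
```

Normalizing by word count keeps long, loosely related posts from outscoring short, focused ones; a production filter would more likely use stemming and a tuned threshold, which the experiments in Chapter 6 could in principle supply.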
Keywords/Search Tags: Lucene, Nutch, Search Engine, Topical Search Engine, LIS Blog, Blog