
Design and Implementation of a Nutch-Based Vertical Search Engine for Energy Saving and Emission Reduction

Posted on: 2016-09-08 | Degree: Master | Type: Thesis
Country: China | Candidate: F Zhang | Full Text: PDF
GTID: 2358330488998084 | Subject: Computer technology
Abstract/Summary:
To promote sustainable economic development and ease the pressure on the environment, China has begun to promote energy conservation and emission reduction. The aim is to address the resource and environmental problems that contemporary China's development faces and, ultimately, to pursue economic construction in an environmentally friendly way. With the development of the Internet, online information about energy saving and emission reduction has grown dramatically, and users in this field want to find relevant information quickly and efficiently. General-purpose search engines, however, usually return a large amount of irrelevant information, which makes it harder for users to find what is valuable. To solve this problem, this thesis designs and implements a vertical search engine for energy saving and emission reduction based on Nutch, an open-source framework. The main contents are as follows: (1) Implementation of a vertical search engine for the field of energy saving and emission reduction. To make access to information in this field more convenient and effective for users, the search engine is built as a secondary development of Nutch through its plug-in mechanism, combined with information extraction, classification, and topic relevance determination. Page template technology is used for web page filtering: pages that do not comply with the rules in a template are removed and part of each remaining page's content is extracted, with the goal of improving the accuracy of information extraction. To obtain on-topic pages, the vector space model is used to determine whether a page belongs to the subject area of energy saving and emission reduction.
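The topic relevance step can be illustrated with a minimal sketch of the vector space model: a page and the topic are both represented as term-frequency vectors, and the page is kept when their cosine similarity reaches a threshold. The topic terms, their weights, the threshold value, and the function names below are assumptions made for illustration; the abstract does not state which terms, weighting scheme, or threshold the thesis uses, and this sketch is not Nutch plug-in code.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Return lowercased word counts as a sparse term-frequency vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as unrelated
    return dot / (norm_a * norm_b)


# Hypothetical topic vector and threshold (not taken from the thesis).
TOPIC_VECTOR = Counter(
    {"energy": 3, "saving": 2, "emission": 3, "reduction": 2, "carbon": 1}
)
THRESHOLD = 0.3


def is_on_topic(page_text, topic=TOPIC_VECTOR, threshold=THRESHOLD):
    """Keep a page when it is similar enough to the topic vector."""
    return cosine_similarity(term_vector(page_text), topic) >= threshold
```

In practice the term weights would more likely be TF-IDF values computed over a domain corpus, and the threshold would be tuned on labeled pages; raw counts are used here only to keep the sketch short.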
This thesis also combines keyword judgment with the naive Bayes classification method to divide web pages into three classes: policy-related information, standards and specifications, and technical literature. Web pages are classified by industry in the same way. (2) Design and implementation of the vertical search engine's system management platform. To make it easier for people who are not familiar with the search engine to extend and manage it, this thesis designs and implements a system management platform based on the B/S (browser/server) mode. The platform mainly provides functions for managing users' search keywords, indexing the standards, specifications, and technical literature in the local library, managing the web crawler's initial seeds, managing web page classification, and configuring and using web page templates.
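One possible reading of the combined keyword and naive Bayes classification described above is sketched below: a page that contains decisive keywords of exactly one class is assigned to that class directly, and every other page falls back to a multinomial naive Bayes classifier with Laplace smoothing. How the thesis actually combines the two steps is not stated in the abstract, so the combination rule, the keyword lists, the class labels, and the tiny training set are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_docs = Counter()              # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()

    def fit(self, samples):
        for text, label in samples:
            self.class_docs[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical decisive keywords per class (not taken from the thesis).
KEYWORD_RULES = {
    "policy": {"regulation", "notice"},
    "standard": {"standard", "specification"},
    "technical": {"experiment", "algorithm"},
}


def classify(text, model):
    """Use the keyword rule when it is unambiguous, else naive Bayes."""
    tokens = set(tokenize(text))
    hits = [label for label, words in KEYWORD_RULES.items() if tokens & words]
    if len(hits) == 1:
        return hits[0]
    return model.predict(text)


# Toy training set for illustration only.
TRAINING = [
    ("government notice on emission reduction targets and regulation", "policy"),
    ("national plan and regulation for energy conservation", "policy"),
    ("standard specification for boiler energy efficiency limits", "standard"),
    ("measurement specification and standard for emission monitoring", "standard"),
    ("experiment on waste heat recovery algorithm performance", "technical"),
    ("analysis of a desulfurization process and experiment results", "technical"),
]

model = NaiveBayes()
model.fit(TRAINING)
```

Classifying by industry "in the same way" would then amount to training a second model with industry labels and a second keyword table, leaving `classify` unchanged.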
Keywords/Search Tags: energy saving and emission reduction, Nutch, vertical search engine, page template, vector space model, naive Bayes classification method