
The Research And Application Of A Vertical Search Engine In Campus Network

Posted on: 2013-12-11
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Jiang
GTID: 2248330362472172
Subject: Communication and Information System
Abstract/Summary:
With the rapid development of the Internet, searching massive amounts of data efficiently has become an important problem. Although there are many outstanding general search engines such as Google and Baidu, they cannot fully and accurately collect information on a LAN or guarantee the efficiency of that information, which makes them poorly suited to industry-specific retrieval. Campus networks in colleges and universities are more mature than ever, and the public information within them, such as enrollment information and public notices for undergraduate and graduate students, is growing rapidly. Users of a general-purpose search engine, however, cannot get effective campus network information. Therefore, to improve the efficiency of industry-specific information retrieval, we designed and implemented a vertical search engine system adapted to campus networks in colleges and universities.
In this thesis, a vertical search engine for the Xi’an University campus network was researched and designed. First, the working principle and main components of a general search engine are introduced, and the implementation principle of a vertical search engine is analyzed. The thesis then designs and implements the search engine's core modules: the web page capture module, the preprocessing module, and the index and query module. The web page capture module downloads web pages and filters out URLs that have already been visited. In the preprocessing module, two web page cleaning schemes were compared and the better one was adopted. Chinese word segmentation was also implemented: because Lucene's Chinese word segmentation performs poorly, the thesis studies Chinese word segmentation technology and improves on the defects of the maximum matching method in order to raise query accuracy. In the index and query module, an inverted index is built and web pages are ranked with the PageRank algorithm, which the thesis finds to be better than Lucene's built-in sorting algorithm.
The experimental results show that the system achieves higher precision than Baidu search results and can better meet the needs of users who want to find campus network information.
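As a rough illustration of two techniques named in the abstract, the following minimal Python sketch combines plain forward maximum matching segmentation with a small inverted index. It is not the thesis's implementation: the thesis uses Lucene and an improved maximum matching method whose details the abstract does not give, so this shows only the unimproved baseline. The function names, the `max_len` parameter, the toy dictionary, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict


def forward_max_match(text, dictionary, max_len=4):
    """Baseline forward maximum matching (hypothetical helper).

    At each position, take the longest dictionary word of up to
    max_len characters; fall back to a single character if none matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def build_inverted_index(docs, dictionary):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in forward_max_match(text, dictionary):
            index[token].add(doc_id)
    return index


def search(index, query, dictionary):
    """Return ids of documents containing every token of the query (AND)."""
    tokens = forward_max_match(query, dictionary)
    if not tokens:
        return set()
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)


# Toy data, assumed for illustration only.
DICTIONARY = {"研究生", "本科生", "招生", "信息", "研究"}
DOCS = {1: "研究生招生信息", 2: "本科生招生信息"}
INDEX = build_inverted_index(DOCS, DICTIONARY)
```

With this toy data, `search(INDEX, "研究生招生", DICTIONARY)` matches only document 1, while `search(INDEX, "招生信息", DICTIONARY)` matches both documents. Because the matching is greedy, it can pick the wrong boundary on ambiguous strings, which is the kind of defect an improved maximum matching method would aim to reduce.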
Keywords/Search Tags: Vertical Search Engine, Campus Network, Web Crawler, Lucene