| With the rapid growth of information on the Web, it has become increasingly difficult for people to retrieve useful information from the enormous volume of web content. Much of the information on a web site cannot be found, and it becomes "islands of information". A vertical search engine is a search engine restricted to a specific domain or site. It collects the content of a web site through robots called crawlers and stores the information in a database after analyzing the original web pages. When a user enters keywords, the searcher looks them up in the index and returns the relevant web pages. On the basis of this research, the paper implements four parts of a search engine. First, it studies the central controller and core components of Heritrix, an open-source web crawler project, analyzes the framework of our site and the specific layout of its pages, extends and customizes Heritrix, and successfully downloads the desired pages. Second, it processes the downloaded pages with HTML Parser, a web page parsing library; after analyzing each type of page, it removes the useless information and extracts the key elements that the web site wants people to know. Third, it uses the open-source Lucene package to build indexes for the various types of content; for Chinese word segmentation, the paper designs and implements segmentation with our own lexicon based on the JE segmentation package. Finally, it completes the web user interface of the search engine based on Spring and DWR. |
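To make the lexicon-based segmentation step more concrete, the following is a minimal sketch of forward maximum matching, a common dictionary-driven technique for Chinese word segmentation. The abstract does not state which algorithm the JE package or the paper's own lexicon extension uses, so the choice of forward maximum matching is an assumption made here for illustration, and the class name `LexiconSegmenter` and its methods are hypothetical rather than part of JE or Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration: forward maximum matching over a word lexicon.
// This is NOT the JE package API; it only sketches the general idea.
final class LexiconSegmenter {
    private final Set<String> lexicon;
    private final int maxWordLength;

    LexiconSegmenter(Set<String> lexicon) {
        this.lexicon = lexicon;
        // The longest lexicon entry bounds how far ahead we need to look.
        int longest = 1;
        for (String word : lexicon) {
            longest = Math.max(longest, word.length());
        }
        this.maxWordLength = longest;
    }

    // Splits text into tokens, preferring the longest lexicon word at each position.
    // Works on Java chars, so it assumes text within the Basic Multilingual Plane.
    List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + maxWordLength);
            // Shrink the candidate until it is a lexicon word or a single character.
            while (end > pos + 1 && !lexicon.contains(text.substring(pos, end))) {
                end--;
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }
}
```

With a lexicon containing 垂直, 搜索, 引擎, and 搜索引擎, the input 垂直搜索引擎 is split into 垂直 and 搜索引擎, because the longest matching entry wins at each position. In a Lucene-based system, logic of this kind would typically sit inside a custom analyzer so that indexing and querying share the same tokenization.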