
Research And Implementation Of A Distributed Vertical Search Engine In The Financial Sector

Posted on: 2015-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: H H Song
GTID: 2268330428464004
Subject: Computer application technology
Abstract/Summary:
In recent years, with the development of e-commerce, social networking, the mobile Internet and smart technology, the information on the Internet has grown explosively, and the results returned by general search engines have become more complex and chaotic. Users now require a search engine not only to return relevant web pages but also to surface in-depth knowledge of a given field. However, the world's major general search engine companies cannot cover so many fields. Domain-oriented, regional and specialized vertical search services provided by small and medium-sized agencies will therefore have great value in the future.

Limited by their economic capacity and technical strength in search, small and medium-sized financial agencies still provide information retrieval services that lag behind, serving only structured information stored in databases. How to improve the ability of small and medium-sized agencies to provide high-quality vertical search services using existing technology frameworks is therefore a serious problem.

This thesis proposes a technical solution for building a vertical search engine for small and medium-sized agencies using the open-source Hadoop distributed storage and computing platform and the Nutch plug-in mechanism. We introduce the principles and advantages of the Hadoop platform and study the Nutch plug-in mechanism in detail. We also analyze current topic-focused crawling algorithms and the characteristics of common Chinese word segmentation components, and introduce algorithms for extracting feature words from web pages. We implement a financial sector crawler based on the Nutch plug-in mechanism and an offline module that extracts words related to given keywords. Using three ordinary PCs, we build a small but complete search engine that provides information retrieval services for the financial sector. Experimental results show that the solution is feasible and has some practical value.
Keywords/Search Tags: Financial Sector Vertical Search Engine, Hadoop Platform, Nutch Framework, Focus Plug-in, Feature Word Extraction
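
The abstract mentions web page feature word extraction without naming a specific algorithm. As a rough illustration only, the sketch below ranks the words of one page by TF-IDF, a common choice for this task; it is an assumption and not necessarily the method used in the thesis. The function name `top_feature_words` and the sample documents are hypothetical, and the pages are assumed to be already segmented into words (for Chinese text, a word segmentation component would run first).

```python
import math
from collections import Counter


def top_feature_words(docs, doc_index, k=3):
    """Return the k highest-scoring TF-IDF terms for docs[doc_index].

    docs is a list of token lists (pages already segmented into words).
    Hypothetical helper for illustration; not taken from the thesis.
    """
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    # Term frequency within the target document.
    tf = Counter(docs[doc_index])
    total = len(docs[doc_index])

    # TF-IDF score: relative term frequency times log inverse document frequency.
    scores = {
        term: (count / total) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }

    # Sort by score (descending), breaking ties alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical, pre-segmented sample pages.
docs = [
    ["bank", "loan", "interest", "rate", "bank"],
    ["stock", "market", "interest", "index"],
    ["football", "match", "score", "market"],
]

print(top_feature_words(docs, 0, k=2))
```

Words that occur often in one page but rarely across the collection score highest, which is why a word shared by several pages ("interest" here) ranks below words specific to the first page. In a Hadoop setting, the document-frequency count would typically be computed as a separate distributed job over the crawled pages.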