
Design and Implementation of a Vertical Search in the Life Service Industry

Posted on: 2011-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: P Yi
GTID: 2178360305462509
Subject: Communication and Information System
Abstract:
As information on the Web keeps growing, it shows the following characteristics: its volume is huge and widely distributed, and the resources are diverse and incomplete. Vertical search engines, which provide search services for the information of one specific industry, have become one of the hottest topics in Internet technology. The basic unit of data in a vertical search engine is structured information, which lets it offer richer, more specialized search services to users in that industry. This thesis studies search technology for the life services industry, focusing on the design of the web crawler, data processing on large clusters, and the ranking algorithm for search results.

First, the thesis introduces the fundamental principles of a search engine, which consists of three main parts: the web crawler, web information extraction, and the query service. The basic framework of Lucene is also introduced. Second, a novel topic-focused crawler for life-service web pages, using HTMLParser for information extraction, is studied in depth and implemented. It successfully crawled 6 million records, and experiments show a precision of 93.552% and a recall of 96.720%. Third, the thesis explains how information is extracted from HTML pages and how the index files are built. Fourth, to handle the storage of massive data and parallel search requests, a parallel distributed computing system based on the MapReduce programming model is adopted; experiments show that this approach improves search efficiency by 66.7%. Finally, the ranking of search results is studied, and a recommendation-based ranking algorithm driven by the user's interests is designed.
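The crawler evaluation reports precision and recall but not how they are computed. Assuming the standard definitions (precision = on-topic pages fetched / all pages fetched; recall = on-topic pages fetched / all on-topic pages), a minimal sketch of the calculation follows. The page IDs are hypothetical toy data, not the thesis's evaluation set.

```python
def precision_recall(fetched, relevant):
    """Return (precision, recall) for one crawl.

    fetched:  set of page IDs the crawler collected
    relevant: set of page IDs judged on-topic (ground truth)
    """
    hits = len(fetched & relevant)  # on-topic pages the crawler actually fetched
    precision = hits / len(fetched) if fetched else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data: 4 pages fetched, 3 of them on-topic, 5 on-topic pages exist.
fetched = {"p1", "p2", "p3", "p4"}
relevant = {"p1", "p2", "p3", "p5", "p6"}
p, r = precision_recall(fetched, relevant)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.750 recall=0.600
```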
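The abstract does not say how the life-theme crawler decides which pages to follow. One common way to build a topic-focused crawler is a best-first search: score each fetched page for topic relevance, discard off-topic pages, and expand links from the most relevant pages first. The sketch below assumes a simple keyword-overlap score and an in-memory link graph in place of real HTTP fetching; `TOPIC_TERMS`, `relevance`, and `focused_crawl` are hypothetical names, not the thesis's design.

```python
import heapq

# Hypothetical topic vocabulary for the life-services domain; the thesis's
# actual relevance model is not described in the abstract.
TOPIC_TERMS = {"restaurant", "hotel", "housekeeping", "rental", "menu", "address"}


def relevance(text):
    """Fraction of the topic terms that appear in the page text (0.0 to 1.0)."""
    words = set(text.lower().split())
    return len(words & TOPIC_TERMS) / len(TOPIC_TERMS)


def focused_crawl(seed, pages, links, threshold=0.2, limit=100):
    """Best-first crawl over an in-memory web graph.

    pages: url -> page text (stands in for fetching and parsing)
    links: url -> outgoing urls
    Pages scoring below `threshold` are discarded and their links are not followed.
    """
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so scores are negated
    seen = {seed}
    kept = []
    while frontier and len(kept) < limit:
        _, url = heapq.heappop(frontier)
        score = relevance(pages.get(url, ""))
        if score < threshold:
            continue  # off-topic page: prune this branch of the crawl
        kept.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                # Children inherit the parent's score as their priority (a simple heuristic).
                heapq.heappush(frontier, (-score, nxt))
    return kept


pages = {
    "a": "restaurant menu address",
    "b": "hotel rental address",
    "c": "football scores",
    "d": "housekeeping rental",
}
links = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
print(focused_crawl("a", pages, links))  # ['a', 'b', 'd']
```

Pruning at the off-topic page `c` is what keeps a focused crawler's precision high; the threshold trades precision against recall.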
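The thesis extracts structured records from HTML with HTMLParser, a Java library. As a language-neutral illustration of the same idea, the sketch below uses Python's standard-library `html.parser.HTMLParser` (a different library) to pull fields out of a listing page. The class-to-field mapping and the sample markup are hypothetical; real sites would need per-site extraction rules.

```python
from html.parser import HTMLParser

# Hypothetical mapping from CSS class to record field.
FIELD_CLASSES = {"shop-name": "name", "shop-addr": "address", "shop-tel": "phone"}


class ListingExtractor(HTMLParser):
    """Collects the text inside elements whose class maps to a record field.

    Limitation: capture stops at the first end tag, so field text nested
    inside child tags (e.g. <b>) is only partly captured.
    """

    def __init__(self):
        super().__init__()
        self.record = {}
        self._field = None  # field currently being captured, if any

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in FIELD_CLASSES:
                self._field = FIELD_CLASSES[cls]

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field and data.strip():
            prev = self.record.get(self._field, "")
            self.record[self._field] = (prev + " " + data.strip()).strip()


def extract(html):
    parser = ListingExtractor()
    parser.feed(html)
    parser.close()
    return parser.record


html = ('<div><h1 class="shop-name">Golden Noodle House</h1>'
        '<p class="shop-addr">12 Main St</p>'
        '<span class="shop-tel">555-0100</span><p>ad banner</p></div>')
print(extract(html))
```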
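Lucene, whose framework the thesis introduces, answers queries from an inverted index that maps each term to the documents containing it. The sketch below shows that data structure in plain Python with a Boolean AND query; it is not Lucene's API. The tokenizer is a simplifying assumption: it splits on non-word characters, so Chinese-language listings would additionally need a word segmenter.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-word characters (a simplistic analyzer)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def build_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, query):
    """Boolean AND query: IDs of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {
    1: "Sichuan restaurant near the station",
    2: "Budget hotel near the station",
    3: "Sichuan hotpot restaurant, open late",
}
index = build_index(docs)
print(index["station"])                         # [1, 2]
print(search_all(index, "sichuan restaurant"))  # [1, 3]
```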
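The abstract does not describe the MapReduce jobs themselves. Building an inverted index is the classic MapReduce example, so the sketch below assumes that job: mappers emit `(term, doc_id)` pairs for their input split, the shuffle groups pairs by term, and reducers produce sorted postings lists. Threads stand in for cluster nodes here; a real framework (for example Hadoop) would run the phases on separate machines over distributed input splits. All function names are hypothetical.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(split):
    """Mapper: emit one (term, doc_id) pair per distinct term in each document of a split."""
    pairs = []
    for doc_id, text in split:
        for term in set(re.findall(r"\w+", text.lower())):
            pairs.append((term, doc_id))
    return pairs


def shuffle(mapped):
    """Group all mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for pairs in mapped:
        for term, doc_id in pairs:
            groups[term].append(doc_id)
    return groups


def reduce_phase(term, doc_ids):
    """Reducer: turn all doc IDs seen for one term into a sorted postings list."""
    return term, sorted(set(doc_ids))


def mapreduce_index(splits, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, splits))  # map splits in parallel
        groups = shuffle(mapped)
        reduced = pool.map(lambda kv: reduce_phase(*kv), groups.items())
        return dict(reduced)


splits = [
    [(1, "Noodle shop on Main St"), (2, "Hair salon on Main St")],
    [(3, "Noodle bar near the park")],
]
index = mapreduce_index(splits)
print(index["noodle"])  # [1, 3]
print(index["main"])    # [1, 2]
```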
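The recommendation ranking based on user interests is not specified in the abstract. One simple scheme, assumed here for illustration only, builds an interest profile from the categories a user has clicked and blends it linearly with each result's text relevance. The `weight` parameter, the category labels, and the function names are hypothetical.

```python
from collections import Counter


def interest_profile(clicked_categories):
    """Normalised click frequency per category, e.g. {"food": 0.75, "hotel": 0.25}."""
    counts = Counter(clicked_categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


def rerank(results, profile, weight=0.5):
    """Blend text relevance with user interest and sort best-first.

    results: list of (doc_id, category, relevance) with relevance in [0, 1]
    weight:  hypothetical share of the final score given to user interest
    """
    def score(item):
        _, category, relevance = item
        return (1 - weight) * relevance + weight * profile.get(category, 0.0)

    return sorted(results, key=score, reverse=True)


profile = interest_profile(["food", "food", "food", "hotel"])
results = [("h1", "hotel", 0.9), ("f1", "food", 0.7), ("s1", "salon", 0.8)]
print([doc for doc, _, _ in rerank(results, profile)])              # ['f1', 'h1', 's1']
print([doc for doc, _, _ in rerank(results, profile, weight=0.0)])  # ['h1', 's1', 'f1']
```

With `weight=0.0` the ranking falls back to pure text relevance; raising it lets a user's past behaviour promote results from categories they prefer.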
Keywords: Vertical Search, Web Crawler, Information Extraction, MapReduce, Distributed System, Sort Algorithm