The Research And Design Of Distributed Vertical Search Engine

Posted on: 2017-06-10
Degree: Master
Type: Thesis
Country: China
Candidate: C Cui
Full Text: PDF
GTID: 2348330485487905
Subject: Electronic and communication engineering
Abstract/Summary:
Faced with massive amounts of data on the Internet, obtaining the information we want has become a serious problem. Well-known search engines such as Baidu and Google use whole-web crawlers to collect information and then return results according to the user's query, which meets the needs of most users. However, in-depth queries against vertical websites require knowledge of each site's specific structure in order to extract its data, and in this respect whole-web crawlers often "do not work well". It is therefore necessary to obtain information from vertical websites efficiently according to users' demands.

To solve these problems, this thesis designs and implements a distributed vertical search engine that meets users' demands for in-depth vertical information. In addition, the thesis tests indexing and search performance in detail. Finally, the thesis studies a Chinese word segmentation algorithm and improves its performance, which is very important to search performance.

Function tests show that the system can trigger, at scheduled times, a Hadoop-based distributed multi-threaded crawler to collect data, process URL data through Redis, an in-memory database, and then store detailed product data in the HBase database. After the data is stored in HBase, SolrCloud indexes the specified properties of that data. When a user submits a query, SolrCloud performs a distributed search and returns the results. In addition, the system provides product comparison and product source tracking. Crawler processes on all nodes use ZooKeeper to monitor their states in order to improve the safety of the system.

Performance tests show the influence of different shard settings in SolrCloud, and the number of shards used in the project is determined based on these results. In addition, performance tests of the Chinese word segmentation algorithm show that the improved algorithm achieves a certain increase in precision and recall.
Keywords/Search Tags: mass data, crawler, indexes, Hadoop, NoSQL, HBase