The Research On Key Technologies Of Search Engine Under Cloud Environment

Posted on:2017-03-21

Degree:Master

Type:Thesis

Country:China

Candidate:J K Yu

Full Text:PDF

GTID:2348330488997068

Subject:Information networks

Abstract/Summary:

With the explosive growth of the Internet, the search engine has become the main entry point for people browsing the Web. Faced with the massive volume of data on the Internet, however, traditional search engines cannot keep up in crawling speed or data storage. This paper focuses on three key technologies of a search engine system in a cloud environment. The research work is as follows.

First, to reduce the number of duplicate pages fetched from the Internet, this paper introduces a location-related text duplicate detection algorithm called SWLR (Shingling with Location Relation), which is developed from the Shingling algorithm and the LCS algorithm. The experimental results show that SWLR outperforms Shingling in both recall and precision. To speed up detection, this paper also introduces a bit-based filter that screens out dissimilar texts. The final experimental results show that this fast SWLR is faster than Shingling and achieves nearly the same recall and precision as SWLR.

Second, to accelerate queries that use multiple keywords in the index system, this paper proposes a linked-list full-text index model based on the inverted index model. By adding to each Term node a link that points to the neighboring word, the linked-list full-text index can compare neighboring words in O(1) time. The experimental results show that the linked-list full-text index performs well in index construction, search cost, and memory consumption.

Finally, this paper designs a new crawl model based on the Hadoop environment. By running the analysis system and the fetch system in parallel, the crawl model can make full use of I/O and CPU resources. The experimental results show that the proposed crawl model can easily be extended and load-balanced.
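As a point of reference for the duplicate-detection work, the following is a minimal sketch of the plain Shingling baseline that SWLR is compared against, not of SWLR itself, whose details the abstract does not give. It assumes word-level shingles of width `w` and Jaccard resemblance between shingle sets; the function names `shingles` and `resemblance` and the default `w = 3` are illustrative choices, not taken from the thesis.

```python
def shingles(text: str, w: int = 3) -> set:
    """Return the set of contiguous w-word shingles of a text."""
    words = text.lower().split()
    if len(words) < w:
        # Short texts yield a single shingle (or none if the text is empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a: str, b: str, w: int = 3) -> float:
    """Jaccard resemblance of two texts over their w-shingle sets."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # Two empty texts are treated as identical.
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    page_a = "search engine under cloud environment"
    page_b = "search engine under hadoop environment"
    print(resemblance(page_a, page_b))
```

Because the shingles are held in a set, this measure records which word sequences occur but not where they occur in the text. A crawler would typically treat two pages as duplicates when the resemblance exceeds a chosen threshold.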

Keywords/Search Tags:

search engine, web crawler, full text indexing, text duplicate detection

Related items

1	Research Of Intranet Information Supervision System Based On Net Crawler And Full-text Search Engine
2	Full-Text Search Technology Research And Application In "2008 Olympic Games" Multi-Language System
3	Vertical Search Engine Based Public Opinion Alert And Analysis Platform
4	Full Text Search Engine Realizes Data Information Collection
5	Research And Realization Of Full-Text Search Technology
6	Research And Application Of Intranet Search Engine Technology Based On Lucene
7	The Design And Implementation Of Vertical Search Engine Framework
8	Research And Implementation Of Vertical Search Engine
9	Research Of English PDF Full-text Search Engine Based On Lucene And Web
10	The Design And Realization Of The Full-text Search Engine Used In The E-mail