The Internet Public Document Search System Based On Vertical Search Technology

Posted on:2017-11-06

Degree:Master

Type:Thesis

Country:China

Candidate:J Dong

GTID:2428330542976838

Subject:Computer technology

Abstract/Summary:

With the development of cloud storage, virtual offices, and other Internet technologies, the way people obtain information has shifted from the traditional social-media era to the Information Age, in which Google, Baidu, and other general search engines are widely used to retrieve information from the Internet. General search engines rely mainly on traditional web crawler technology, which collects Internet data broadly but fails to meet users' needs accurately in areas such as professional retrieval, indexing of massive data sets, information update speed, and personalized services. Topic-focused web crawlers and vertical search services, which target specific users, cover specific areas, and meet specific needs, have therefore emerged and are becoming a vital part of the information search field.

Focusing on vertical document retrieval services, which are now frequently used and growing rapidly, this paper studies targeted search and personalized application for the various types of documents on the Internet, and implements a public Internet document collection system based on vertical search technology. First, vertical search and information extraction technologies are used to collect and store the document data and web information that users care about on designated professional websites. Second, in combination with acquisition-type meta-search technology, existing general search engines are used to collect and store the various types of public electronic documents available on the Internet. Third, incremental indexing technology is used to support secondary search over the collected document data and to display the results.

The innovations of this paper are as follows. First, the author analyzes intelligent recognition and processing algorithms for web page URL links and a text extraction algorithm based on DOM tree text density, in order to optimize the document data collection program. Second, a unique self-indexing module is built on the Lucene full-text search engine and combined with the mature Baidu Hard Disk Search technology to index Word, Excel, PDF, PPT, and other public documents on the Internet, and to support keyword search and document extraction.
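
The abstract does not give the details of the DOM tree text-density extraction algorithm, so the following is only a minimal sketch of the general idea, not the author's implementation. It assumes that the main content of a page sits in the container element with the most non-link text per tag. The names `TreeBuilder`, `measure`, `extract_main_text`, and the `CONTAINERS` set are hypothetical and chosen for illustration; only the Python standard library is used.

```python
from html.parser import HTMLParser

# Assumptions for this sketch: which tags never have children, which tags
# hold non-visible text, and which tags may be chosen as the content block.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr"}
SKIP_TAGS = {"script", "style"}
CONTAINERS = {"div", "article", "section", "td", "main", "body"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []      # mix of Node and str, in document order
        self.chars = 0          # visible characters in the subtree
        self.link_chars = 0     # visible characters inside <a> elements
        self.tags = 1           # number of elements in the subtree


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM tree from possibly untidy HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if self.current.tag not in SKIP_TAGS:
            self.current.children.append(data)


def measure(node, inside_link=False):
    """Fill in chars, link_chars, and tags for every node, bottom-up."""
    inside_link = inside_link or node.tag == "a"
    for child in node.children:
        if isinstance(child, str):
            n = len(child.strip())
            node.chars += n
            if inside_link:
                node.link_chars += n
        else:
            measure(child, inside_link)
            node.chars += child.chars
            node.link_chars += child.link_chars
            node.tags += child.tags


def walk(node):
    yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from walk(child)


def text_of(node):
    """Subtree text in document order, with whitespace normalized."""
    parts = [c if isinstance(c, str) else text_of(c) for c in node.children]
    return " ".join(" ".join(parts).split())


def extract_main_text(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    measure(builder.root)
    candidates = [n for n in walk(builder.root) if n.tag in CONTAINERS]
    if not candidates:
        return text_of(builder.root)
    # Text density: non-link characters per element in the subtree.
    best = max(candidates, key=lambda n: (n.chars - n.link_chars) / n.tags)
    return text_of(best)
```

Calling `extract_main_text(html)` on a fetched page returns the text of the densest container. Navigation bars and footers tend to score low because most of their text is link text, which is excluded from the density. A production crawler would likely need smoothing across sibling nodes and a minimum-length threshold, which this sketch omits.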

Keywords/Search Tags:

meta-search, vertical search, topic-focused web crawler, information extraction, document collection

Related items

1	Research And Realization On Focused Crawler Key Technologies Of Vertical Search Engine
2	Technology Research, Based On Focused Crawling Of Web Information Collection
3	Research Of Main Technologies Of Vertical Search Engine
4	Research On An Algorithm Of Focused Crawler In Vertical Search Engine
5	The Research On Focused Crawling Algorithm In Vertical Search Engine
6	Customizable Focused Crawler
7	Research On A Method Of Focused Crawler For Vertical Search System
8	Research And Implementation Of Large-Scale Vertical Search Method
9	Research And Application Of Vertical Search Engine Key Technologies Based On The Lucene
10	Research And Implementation On Focused Crawler With New Strategy For The Vertical Search Engine