Design And Implementation Of Petroleum Enterprise Massive Webpage Retrieval System

Posted on: 2014-12-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Zhu
GTID: 2268330401967085
Subject: Software engineering
Abstract/Summary:
As oilfield enterprises continue to integrate and expand their exploration operations, demand for electronic documents and web pages keeps growing, driven by scientific analysis of production and operations data, unified organization and management, and the move to paperless offices. As a result, the number of documents grows exponentially each year, and the volume of stored documents has become very large. Existing search engines cannot provide a dedicated index for these electronic documents. A custom enterprise-class document retrieval system is therefore needed to find and retrieve document information quickly and to overcome the inefficiency and inaccuracy of conventional information retrieval.

According to the latest Internet surveys, the Internet now hosts content from hundreds of millions of websites. Google, the world's largest search engine, has collected over 8 billion pages. The web extraction system, also known as a web crawler, is one of a search engine's main modules, and the speed and quality of its crawling set the standard for the engine's search efficiency. Crawlers meet enterprise data collection needs and reduce unnecessary repeated gathering of information and duplicated data.

Addressing the shortcomings of current enterprise-scale web crawlers and the specific business needs of oilfield enterprises, this paper proposes a new information retrieval method that introduces a multi-field approach into the architecture of the retrieval system. In addition, to suit the hardware conditions of the enterprise LAN, the system uses Lucene-based lexical analysis and web page data analysis to efficiently extract the plain-text content of each page.
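The abstract names three mechanisms without detailing them: extracting the plain text of a page, avoiding duplicate collection, and a multi-field index structure. The following is a minimal Python sketch of how these pieces could fit together, under stated assumptions. It does not use Lucene. The standard-library `html.parser` stands in for page analysis, a SHA-1 fingerprint of the extracted text stands in for duplicate detection, and a dictionary keyed by (field, term) stands in for a multi-field inverted index. All names (`TextExtractor`, `extract_fields`, `tokenize`, `MultiFieldIndex`) are hypothetical and are not taken from the thesis.

```python
import hashlib
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical extractor: <title> text goes to the title field; other text
    outside <script>/<style> goes to the body field."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.body_parts = []
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth or not text:
            return  # drop script/style content and pure whitespace
        (self.title_parts if self._in_title else self.body_parts).append(text)


def extract_fields(html):
    """Return the plain text of a page split into 'title' and 'body' fields."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {"title": " ".join(parser.title_parts), "body": " ".join(parser.body_parts)}


def tokenize(text):
    # Lowercased runs of word characters; a crude stand-in for a Lucene analyzer.
    return re.findall(r"\w+", text.lower())


class MultiFieldIndex:
    """Hypothetical inverted index with one posting list per (field, term) pair."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, term) -> set of doc ids
        self.seen = set()                 # fingerprints of content already indexed
        self.docs = []                    # doc id -> URL

    def add_page(self, url, html):
        fields = extract_fields(html)
        content = fields["title"] + "\n" + fields["body"]
        digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
        if digest in self.seen:
            return False  # identical extracted text: skip the duplicate
        self.seen.add(digest)
        doc_id = len(self.docs)
        self.docs.append(url)
        for field, text in fields.items():
            for term in tokenize(text):
                self.postings[(field, term)].add(doc_id)
        return True

    def search(self, term, fields=("title", "body")):
        """Return sorted URLs of documents containing term in any of the given fields."""
        hits = set()
        for field in fields:
            hits |= self.postings.get((field, term.lower()), set())
        return sorted(self.docs[i] for i in hits)


if __name__ == "__main__":
    index = MultiFieldIndex()
    page = ("<html><head><title>Well Log Report</title></head>"
            "<body><p>Drilling depth data</p><script>var x = 1;</script></body></html>")
    print(index.add_page("http://intranet.example/a", page))  # True
    print(index.add_page("http://intranet.example/b", page))  # False: same extracted text
    print(index.search("drilling"))                           # ['http://intranet.example/a']
    print(index.search("report", fields=("body",)))           # []
```

Exact hashing only catches pages whose extracted text is identical; near-duplicate pages would need a technique such as shingling or SimHash. The `\w+` tokenizer also treats a run of Chinese characters as a single token rather than segmenting it into words, which is one reason a real deployment would rely on a proper Lucene analyzer instead.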
Finally, system integrity verification and performance analysis were carried out, and the system was tested. The test results show that the system meets the enterprise's needs for massive web crawling and offers certain advantages in reliability, availability, stability, speed, and security...
Keywords/Search Tags: Internet search engine, web crawler, Lucene, information extraction