
Research On The Implementation Of Enterprise Intelligent Search Engine Based On Lucene

Posted on: 2016-10-11
Degree: Master
Type: Thesis
Country: China
Candidate: X F Wang
GTID: 2308330467482152
Subject: Computer Science and Technology
Abstract/Summary:
General-purpose Internet search engines return a large volume of information, but for enterprise users their results are too broad, and they cannot effectively search the massive collections of reports and databases held in an enterprise's internal systems. As enterprise informatization advances, special-purpose search engines have become an urgent need for improving work efficiency, and enterprise search has become a focus of current science and technology.

This thesis therefore takes "Intelligent Search Engine in Enterprise Based on Lucene" as its subject. It proposes an architecture for an intelligent search engine oriented specifically to enterprises and, following the specific needs of the Zhejiang tobacco industry, describes the intelligent enterprise search engine project built for it. The key points of the research are as follows:

(1) A summary of two kinds of demand that modern enterprises place on a search engine, derived from the goals of the Zhejiang tobacco industry intelligent search engine system. The first is topic search, which provides users with information related to a specific industry, such as industry developments. The second is finding related reports or database content in the internal system from the ordinary query keywords that users enter.

(2) The design of an incremental web crawler focused on a specific topic: a crawler architecture with tobacco as its theme, built on the traditional crawler structure; an improved crawling strategy that raises the crawling rate and accuracy; an incremental model, based on the characteristics of tobacco industry websites, that improves the timeliness of the crawler; and cloud storage of the crawled results after processing such as denoising and duplicate removal.

(3) The establishment of an intelligent retrieval model for the tobacco industry based on Lucene. It comprises: a document relevance ranking algorithm that combines an improved PageRank algorithm with the vector space algorithm in Lucene; the concept of "dimension keywords", derived from the design features of the data warehouse in the Zhejiang tobacco industry; the construction of a tobacco domain ontology; a strategy for ontology-based semantic expansion of keywords; and the design of an architecture for querying the relational databases of the tobacco industry.

(4) The introduction of an architecture for the enterprise search engine system. It includes several levels: data extraction, data acquisition with the tobacco-focused crawler, data processing, data storage, information retrieval, system management, and page display.

(5) The design and implementation of a search engine system for the Zhejiang tobacco enterprise. The system provides web search on the tobacco theme with much higher precision than general search engines, and it supports searching internal reports and relational databases in the Zhejiang tobacco industry with simple keywords. It meets the enterprise search needs of the Zhejiang tobacco industry and can also be extended to other enterprises.
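
Item (3) describes a relevance ranking that combines a PageRank-style link score with the vector space score in Lucene. The abstract gives neither the improved PageRank variant nor the combination formula, so the following is only a minimal Python sketch under stated assumptions: standard power-iteration PageRank, a raw term-frequency cosine similarity standing in for Lucene's scoring, and a hypothetical linear weight `alpha`. All function names and the sample data are illustrative and are not taken from the thesis.

```python
import math
from collections import Counter


def pagerank(links, damping=0.85, iterations=50):
    """Standard power-iteration PageRank over {page: [outlinked pages]}.

    Assumption: this is the textbook algorithm, not the thesis's improved variant.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            # Pages with no known outlinks spread their rank over all pages.
            targets = [t for t in outlinks if t in rank] or pages
            share = damping * rank[page] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


def cosine_tf(query, text):
    """Cosine similarity of raw term-frequency vectors (a stand-in for Lucene's score)."""
    q = Counter(query.lower().split())
    d = Counter(text.lower().split())
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def combined_rank(query, docs, links, alpha=0.7):
    """Order document ids by alpha * text score + (1 - alpha) * normalized PageRank.

    `alpha` is a hypothetical tuning weight; the thesis may combine scores differently.
    """
    pr = pagerank(links)
    max_pr = max(pr.values())
    scored = [
        (alpha * cosine_tf(query, text) + (1 - alpha) * pr.get(doc_id, 0.0) / max_pr, doc_id)
        for doc_id, text in docs.items()
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Hypothetical sample data: three pages and the links between them.
sample_docs = {
    "a": "tobacco industry report",
    "b": "tobacco leaf prices",
    "c": "weather news",
}
sample_links = {"a": ["b"], "b": ["a"], "c": ["a"]}
```

Calling `combined_rank("tobacco report", sample_docs, sample_links)` orders the pages by the blended score. Normalizing PageRank by its maximum keeps both terms on a comparable 0 to 1 scale, so `alpha` directly controls how much text relevance outweighs link authority.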
Keywords/Search Tags: Enterprise, Search Engine, Focused crawler, Lucene, Rank, Keywords, Dimensions, Ontology