Research and Realization of an Enterprise Search Engine Based on CLucene and Larbin

Posted on: 2011-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: H Luo
GTID: 2208360308966163
Subject: Computer software and theory
Abstract/Summary:
With the rapid growth of enterprises and the steady progress of enterprise informatization, e-business systems and portal sites have come into wide use, and the volume of information held inside the enterprise has grown rapidly as a result. Faced with this much information, traditional information retrieval methods cannot give people quick and accurate access to what they need. At present, people search for information on the Internet mainly through general-purpose search engines. These engines are powerful and meet the needs of most users, but they fall short for enterprise-specific content.

An enterprise search engine faces several technical difficulties: multi-source, heterogeneous data types; comprehensive coverage of the searched content; search accuracy; and personalized search. Enterprise search engines have emerged specifically to address these problems. We design and implement a prototype enterprise search engine, which may be of some help to further research on enterprise search. The main tasks were:

1. Discussing the significance and architecture of enterprise search engines.
2. Introducing the basic concepts and principles of search engines, and studying the core technologies of an enterprise search engine, including Chinese word segmentation and web crawling algorithms.
3. Developing the spider of the enterprise search engine on top of the open-source spider Larbin. The main work here covers character-set transcoding, logging in to websites, URL filtering, and page denoising.
4. Analyzing CLucene in depth, and developing an indexer and a searcher based on CLucene.
5. Designing and implementing a more effective Chinese word segmentation method. Specialized fields place particular demands on both the speed and the accuracy of segmentation. We design a Chinese word segmentation algorithm that gives priority to special names: it uses a dictionary mechanism that combines a synonym dictionary, a general dictionary, and a special dictionary, first cuts sentences at special names, and finally obtains the disambiguated segmentation result using a trigram model.
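
The special-name-first idea in task 5 can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the toy dictionaries, and the use of forward maximum matching for the remaining text are all assumptions made for illustration, and the trigram disambiguation step is not reproduced. The synonym dictionary is assumed here to map a token to a canonical form.

```python
def split_by_special_names(sentence, special_dict):
    """Pass 1 (assumed): cut out special-dictionary terms first, longest match wins.

    Returns a list of (text, is_special) pieces in sentence order.
    """
    max_len = max((len(w) for w in special_dict), default=0)
    pieces, buf, i = [], "", 0
    while i < len(sentence):
        match = None
        # Try the longest candidate first so longer special names take priority.
        for n in range(min(max_len, len(sentence) - i), 0, -1):
            if sentence[i:i + n] in special_dict:
                match = sentence[i:i + n]
                break
        if match:
            if buf:
                pieces.append((buf, False))
                buf = ""
            pieces.append((match, True))
            i += len(match)
        else:
            buf += sentence[i]
            i += 1
    if buf:
        pieces.append((buf, False))
    return pieces


def forward_max_match(text, general_dict):
    """Pass 2 (assumed stand-in): forward maximum matching on the general dictionary.

    Unknown characters fall back to single-character tokens.
    """
    max_len = max((len(w) for w in general_dict), default=1)
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if n == 1 or text[i:i + n] in general_dict:
                tokens.append(text[i:i + n])
                i += n
                break
    return tokens


def segment(sentence, special_dict, general_dict, synonyms=None):
    """Segment a sentence: special names first, general dictionary second."""
    synonyms = synonyms or {}
    tokens = []
    for text, is_special in split_by_special_names(sentence, special_dict):
        if is_special:
            tokens.append(text)
        else:
            tokens.extend(forward_max_match(text, general_dict))
    # Hypothetical use of the synonym dictionary: normalize to a canonical form.
    return [synonyms.get(t, t) for t in tokens]


if __name__ == "__main__":
    special = {"企业搜索引擎"}                      # toy special dictionary
    general = {"企业", "搜索", "引擎", "支持", "中文分词"}  # toy general dictionary
    print(segment("企业搜索引擎支持中文分词", special, general))
    # prints ['企业搜索引擎', '支持', '中文分词']
```

Because the special dictionary is consulted first, the domain term 企业搜索引擎 survives as one token instead of being split into 企业 / 搜索 / 引擎 by the general dictionary. A statistical step such as the trigram model mentioned above would then choose among competing segmentations of the remaining text, which plain maximum matching cannot do.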
Keywords/Search Tags: Enterprise Search Engine, Larbin, CLucene, Chinese Word Segmentation