A Cloud-platform Academic Search Engine Based On Lucene

Posted on: 2016-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: X C Zhang
Full Text: PDF
GTID: 2308330503450641
Subject: Computer technology
Abstract/Summary:
With the rapid development of the global Internet, the amount of information available online is growing swiftly. General-purpose search engines such as Google and Baidu have become the main entry points for accessing information on the web. Such large-scale search engines, however, cannot fully meet the need to retrieve specialized, in-depth information in a particular field, which is why vertical search engines have emerged.

This thesis proposes the design and implementation of a search engine based on Apache Lucene, aimed at the massive body of scholarly literature on the Internet. The scheme focuses on information retrieval for scholarly literature and merges search results from various sources. Because the indexes are stored on a cloud storage platform, storage scalability is improved, which preserves retrieval capability on large-scale data.

The thesis consists of the following work:
1) Designs and implements a multi-threaded focused web crawler with high scalability and performance for collecting paper data.
2) Studies full-text retrieval theory and the operating principles of Apache Lucene, then designs and implements the retrieval system through secondary development on Lucene.
3) Studies distributed database clusters and cache clusters, then designs and implements a database cluster based on the consistent hashing algorithm and an LRU cache cluster based on Redis.
4) Studies the theory and architecture of the MooseFS cloud storage platform and builds the storage platform for Lucene indexes on MooseFS.
5) Proposes a Lucene-based cloud-platform scholarly search engine solution that combines the components above.

The proposed scheme implements a multi-site merged search service, which effectively reduces the time cost of retrieving scholarly literature from the massive amount of information on the Internet.
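The abstract names consistent hashing as the basis of the database cluster but does not describe the implementation. As an illustration only, the following is a minimal Python sketch of a consistent hash ring with virtual nodes. The class name ConsistentHashRing, the node names db-0 to db-3, the key format paper:N, the replica count of 100, and the use of MD5 are all assumptions made for this example and are not taken from the thesis.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to nodes on a hash ring, using virtual nodes for balance."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas   # virtual nodes per physical node (assumed value)
        self._ring = {}            # ring position -> node name
        self._positions = []       # sorted list of ring positions
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        # MD5 is used only to spread values evenly around the ring, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Place `replicas` virtual copies of the node on the ring.
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            self._ring[pos] = node
            bisect.insort(self._positions, pos)

    def remove_node(self, node):
        # Remove every virtual copy of the node from the ring.
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            del self._ring[pos]
            self._positions.remove(pos)

    def get_node(self, key):
        if not self._positions:
            raise LookupError("ring is empty")
        # Pick the first virtual node clockwise from the key, wrapping around.
        idx = bisect.bisect_right(self._positions, self._hash(key)) % len(self._positions)
        return self._ring[self._positions[idx]]


# Hypothetical usage: route paper records to three database nodes, then add a fourth.
ring = ConsistentHashRing(["db-0", "db-1", "db-2"])
keys = [f"paper:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}

ring.add_node("db-3")
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

In this sketch, adding db-3 relocates only the keys that now fall on the new node; every other key keeps its original node. That is the property that makes consistent hashing attractive for a cluster that must grow without reshuffling all stored records.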
Keywords/Search Tags:Vertical Search, Information Retrieval, Web Crawler, Cloud Platform