Research On Parallel Indexing And Cache Of Searching With Massive Data Based On Solr

Posted on: 2017-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: J Z Mei
Full Text: PDF
GTID: 2308330488985691
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of Internet information technology, enterprises are becoming increasingly informatized, and the forms their information takes are increasingly diverse. Enabling users to find key information quickly and accurately in massive enterprise data is therefore more and more important, and search engine technology is the key to solving this problem. Since the emergence of Solr, an open source enterprise search server, more and more enterprises have used it to build search services over massive data. The traditional approach to search relies on a text database. Although text databases and search engines are both based on full-text retrieval, a text database cannot match a search engine when handling massive data, because a search engine uses server clusters and distributed computing to handle large data volumes. In this context, how to make better use of Solr to build a fast and efficient search engine has become an important research subject.

To meet this demand, this thesis studies massive data search services based on the Solr search server. It analyzes the characteristics of a search engine server cluster and the availability of the cluster system using a queuing model from the theory of stochastic processes. It then studies in depth the two key processes of a search engine: building the index and searching the index.

Indexing converts text into index files that the search engine can search. In the indexing process, some operations are found to run serially, with poor efficiency. Based on this analysis, the thesis proposes a parallel indexing approach that parallelizes these operations, accelerates index construction, and improves resource utilization on each node. For scenarios in which an index is built over massive data, it also applies a performance-based load balancing strategy.

For the search process, the thesis first analyzes the Solr search process and then proposes a hierarchical cache model. The model stores hot data, that is, data that is searched frequently, in the cache through a hierarchical process. Data that similarity computation finds to be similar to the hot data is loaded into the cache at the same time. In addition, the thesis proposes a maintenance strategy for the hierarchical cache model that makes full use of the cache space. Finally, it improves the traditional cache preheating algorithm in order to raise the cache hit rate of the system.

Both indexing and searching over massive data are studied on a search engine server cluster. The experimental results show that the proposed parallel indexing and hierarchical cache model effectively speed up index construction and request response.
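
The abstract does not give the details of the hierarchical cache model, so the following is only a minimal sketch of the general idea under stated assumptions: a small hot tier in front of a larger LRU warm tier, promotion after a fixed number of requests, and prefetching of similar queries when a query becomes hot. The class name `HierarchicalCache`, the callbacks `fetch` and `similar`, and the parameters `hot_size`, `warm_size`, and `promote_after` are all hypothetical and are not taken from the thesis or from Solr.

```python
from collections import OrderedDict


class HierarchicalCache:
    """Illustrative two-tier query cache (not the thesis's actual model)."""

    def __init__(self, fetch, similar, hot_size=2, warm_size=4, promote_after=2):
        self.fetch = fetch              # hypothetical: query -> results (e.g. a search request)
        self.similar = similar          # hypothetical: query -> list of similar queries
        self.hot = OrderedDict()        # small tier for frequently requested queries
        self.warm = OrderedDict()       # larger LRU tier for everything else
        self.requests = {}              # request counts for queries outside the hot tier
        self.hot_size = hot_size
        self.warm_size = warm_size
        self.promote_after = promote_after

    @staticmethod
    def _put(tier, size, key, value):
        """Insert as most recently used; return the evicted (key, value) or None."""
        tier[key] = value
        tier.move_to_end(key)
        if len(tier) > size:
            return tier.popitem(last=False)
        return None

    def get(self, query):
        if query in self.hot:
            self.hot.move_to_end(query)
            return self.hot[query]
        if query in self.warm:
            result = self.warm[query]
            self.warm.move_to_end(query)
        else:
            result = self.fetch(query)  # cache miss: go to the backend
            self._put(self.warm, self.warm_size, query, result)
        self.requests[query] = self.requests.get(query, 0) + 1
        if self.requests[query] >= self.promote_after:
            self._promote(query, result)
        return result

    def _promote(self, query, result):
        """Move a query to the hot tier and prefetch its similar queries."""
        self.warm.pop(query, None)
        demoted = self._put(self.hot, self.hot_size, query, result)
        if demoted is not None:
            # The least recently used hot entry falls back to the warm tier.
            self.requests[demoted[0]] = 0
            self._put(self.warm, self.warm_size, *demoted)
        for other in self.similar(query):
            if other not in self.hot and other not in self.warm:
                self._put(self.warm, self.warm_size, other, self.fetch(other))
```

In this sketch the similarity function is supplied by the caller; the keyword "Word similarity" suggests the thesis computes it from the query terms, but the exact measure is not stated in the abstract. The request counter is also left unbounded for brevity, which a real maintenance strategy would need to address.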
Keywords/Search Tags:Search engine, Solr, Parallel index, Hierarchical caching, Word similarity