
The Design And Implementation Of Distributed Video Vertical Search Engine Based On Elasticsearch

Posted on: 2015-07-16
Degree: Master
Type: Thesis
Country: China
Candidate: G W Zhang
GTID: 2308330479489901
Subject: Computer Science and Technology
Abstract/Summary:
Distributed search has become an important research direction in recent years. With the rapid development of the Internet, the volume of data on the Web is growing very quickly, yet much of it is of no use to a given user. How to retrieve useful information from this mass of data has therefore become one of the major research topics for search engines. In the past, search engine research focused on centralized, replication-based large-scale cluster systems. Such a system must be deployed on high-performance servers and scales poorly. As centralized systems have become increasingly inefficient, distributed technologies have developed rapidly.

General-purpose search engines also have drawbacks when faced with massive web data, such as incomplete query results and coarse-grained search results. A vertical search engine can address these problems and provide valuable services for a specific domain. Compared with general-purpose engines, vertical search is more effective and goes deeper within its domain.

Building on a thorough study of distributed technology and vertical search, and combining the advantages of both, this dissertation implements a small distributed video vertical search engine. After a detailed analysis of the system requirements, the engine is divided into a non-real-time offline process and a real-time online process, each built on a different technology. The offline process uses the batch-processing capabilities of Hadoop to collect and store massive video data. Because Hadoop is weak at real-time processing, the online part is implemented with Elasticsearch, whose real-time performance matches the system requirements well. Within this framework, the dissertation makes improvements in two directions driven by user requirements for search engines. First, it designs a secondary and tertiary storage mechanism as its caching strategy; experimental results show that the caching system can greatly improve user experience. Second, after studying trends in existing search engines, it proposes a secondary sorting scheme based on a user interest model; experiments show that this method achieves very good results.
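
The abstract does not describe how the secondary sorting scheme is computed. As a rough illustration only, the sketch below re-orders first-pass search hits by blending each hit's relevance score with a per-category interest weight for the user. The `Hit` type, the `rerank` function, the linear blend, and the `weight` parameter are all hypothetical and are not taken from the dissertation.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A first-pass search result (hypothetical structure)."""
    video_id: str
    relevance: float  # score from the first-pass retrieval, assumed in [0, 1]
    category: str     # video category used to look up user interest


def rerank(hits, interest, weight=0.3):
    """Second-pass sort: blend relevance with the user's interest in the category.

    `interest` maps category -> interest level in [0, 1]; unknown categories
    count as 0. `weight` controls how much interest influences the order.
    """
    def combined(hit):
        return (1 - weight) * hit.relevance + weight * interest.get(hit.category, 0.0)

    return sorted(hits, key=combined, reverse=True)


# Example: a user who mostly watches sports videos.
hits = [
    Hit("v1", 0.90, "news"),
    Hit("v2", 0.85, "sports"),
    Hit("v3", 0.40, "sports"),
]
interest = {"sports": 1.0, "news": 0.1}

reranked = rerank(hits, interest)
print([h.video_id for h in reranked])  # ['v2', 'v1', 'v3']
```

With `weight=0.0` the function returns the original relevance order, so the interest model only adjusts the ranking rather than replacing it; a real system would need to learn the interest weights from user behavior.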
Keywords/Search Tags: distributed retrieval, vertical search, Hadoop, Elasticsearch, personalized sort