
Research And Implementation Of High Performance Computing Service Search Engine Based On Cloud Platform

Posted on: 2023-10-16  Degree: Master  Type: Thesis
Country: China  Candidate: Y L Tan  Full Text: PDF
GTID: 2568307097485454  Subject: Computer technology
Abstract/Summary:
With the vigorous development of the science and technology information industry and the rapid progress of high-performance computing (HPC) technology under state support, fundamental research on the models and architectures of HPC services has produced a series of results, and a large number of HPC services have emerged. In recent years in particular, China's major supercomputing platforms have deployed many self-developed and open-source HPC services for users. However, faced with these massive and scattered HPC service resources, users cannot quickly and precisely locate the services they need, which hinders the promotion and application of HPC services. For the HPC service domain, a vertical search engine that can quickly and accurately retrieve HPC service information across fields is therefore essential. To this end, this thesis researches and implements a cloud-based search engine for the HPC service domain.

This thesis studies cloud platforms, web crawlers, distributed data storage, full-text retrieval, knowledge-base question answering, and other related technologies, and makes full use of the resources provided by the National Supercomputing Center in Changsha to build a Kubernetes-based cloud platform for flexible deployment and operation of the system. A distributed web crawler for the HPC service domain is designed and implemented on the Scrapy-Redis framework to collect HPC service data from various fields and provide data support for the search engine; in addition, URL de-duplication in the crawler framework is improved with a Bloom filter, which effectively raises de-duplication efficiency. The non-relational database HBase is used for efficient distributed storage of the massive HPC service data. Based on the full-text search engine Elasticsearch (ES), indexes tailored to the characteristics of HPC service data are designed and created, and diverse full-text search services are built on top of them; test results show that this approach effectively improves the retrieval accuracy of HPC service information. Finally, the front-end interface of the search engine is implemented with visual front-end components.

The search engine system researched and implemented in this thesis collects, processes, and retrieves HPC service data from various fields on the Internet, providing a professional search engine for HPC service users and filling the gap in service-discovery products for the HPC service domain. Users from different research fields can use it to retrieve professional HPC service information, which makes retrieval more convenient and also helps promote the supercomputing center's self-developed HPC services, so the work is both valuable and meaningful.
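To make the data-collection step concrete, the sketch below shows what a Scrapy-Redis spider for HPC service pages could look like. The spider name, Redis key, and CSS selectors are illustrative assumptions, not details taken from the thesis.

```python
# Minimal sketch of a Scrapy-Redis spider for collecting HPC service pages.
# The spider name, redis_key, and CSS selectors are assumptions for illustration.
from scrapy_redis.spiders import RedisSpider


class HpcServiceSpider(RedisSpider):
    """Distributed spider: start URLs are popped from a shared Redis queue."""
    name = "hpc_service"
    redis_key = "hpc_service:start_urls"  # queue shared by all crawler nodes

    def parse(self, response):
        # Extract basic fields of an HPC service page (selectors are placeholders).
        yield {
            "url": response.url,
            "title": response.css("title::text").get(),
            "description": response.css('meta[name="description"]::attr(content)').get(),
        }
        # Follow links so the crawl can be continued by any node in the cluster.
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)
```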
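The abstract notes that URL de-duplication in the crawler framework is improved with a Bloom filter. Below is a minimal sketch of a Redis-backed Bloom filter that could stand in for the default fingerprint set shared by the crawler nodes; the key name, bit-array size, and number of hash probes are assumptions.

```python
# Hedged sketch of a Redis-backed Bloom filter for URL de-duplication.
import hashlib

import redis


class RedisBloomFilter:
    def __init__(self, client, key="hpc_service:bloom", bit_size=1 << 25, hash_count=7):
        self.client = client          # shared Redis connection
        self.key = key                # Redis string used as a bit array
        self.bit_size = bit_size      # number of bits in the array
        self.hash_count = hash_count  # number of hash probes per URL

    def _offsets(self, url):
        # Derive hash_count bit positions from the URL via double hashing.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.bit_size for i in range(self.hash_count)]

    def seen(self, url):
        """Return True if the URL was probably seen before; otherwise mark it."""
        offsets = self._offsets(url)
        if all(self.client.getbit(self.key, off) for off in offsets):
            return True
        for off in offsets:
            self.client.setbit(self.key, off, 1)
        return False


# Usage: bf = RedisBloomFilter(redis.Redis()); bf.seen("https://example.org/service")
```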
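For the distributed-storage step, the following sketch writes a crawled record into HBase through the happybase client, assuming a table named hpc_service with a single column family info; the actual schema used in the thesis is not specified in the abstract.

```python
# Sketch of persisting crawled HPC service records into HBase with happybase.
# Table name and column family are assumptions, not the thesis's actual schema.
import happybase


def store_service(record, host="hbase-thrift", table_name="hpc_service"):
    """Persist one crawled record; the row key is the service URL."""
    connection = happybase.Connection(host)  # HBase Thrift gateway
    try:
        table = connection.table(table_name)
        table.put(
            record["url"].encode("utf-8"),
            {
                b"info:title": (record.get("title") or "").encode("utf-8"),
                b"info:description": (record.get("description") or "").encode("utf-8"),
            },
        )
    finally:
        connection.close()
```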
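Finally, the index design and full-text search described above might look roughly like the sketch below, written against the elasticsearch-py 8.x client. The index name, field mapping, and sample document are assumptions for illustration only, not the thesis's actual index design.

```python
# Minimal sketch of index creation and full-text search with elasticsearch-py 8.x.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Create a full-text index whose mapping mirrors the illustrative crawler output.
es.indices.create(
    index="hpc_service",
    mappings={
        "properties": {
            "title": {"type": "text"},
            "description": {"type": "text"},
            "url": {"type": "keyword"},
        }
    },
)

# Index one sample document, refresh, and run a multi-field full-text query.
es.index(index="hpc_service", document={
    "title": "GROMACS molecular dynamics service",
    "description": "Pre-installed GROMACS build on the supercomputing platform",
    "url": "https://example.org/services/gromacs",
})
es.indices.refresh(index="hpc_service")
hits = es.search(
    index="hpc_service",
    query={"multi_match": {"query": "molecular dynamics",
                           "fields": ["title", "description"]}},
)
print(hits["hits"]["total"])
```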
Keywords/Search Tags:Vertical search engine, High-performance computing services, Cloud platform, Web crawler, Elasticsearch