
Research On Conical Retrieval Based On Distributed Massive Sky Survey Catalog

Posted on: 2022-09-11 | Degree: Master | Type: Thesis
Country: China | Candidate: C Yang
GTID: 2510306524451744 | Subject: Electronics and Communications Engineering
Abstract/Summary:
With the progress of numerous sky survey programs in China and abroad, research and development of the optical capsule, a major scientific project within China's manned space engineering program, has entered an important finishing stage. The project is expected to collect up to PB-level observation data, which after scientific processing yields a typical astronomical data product: the astronomical star catalog (referred to as the "star catalog"). Astronomers usually study a specific region of the sky, so retrieving the catalog efficiently and accurately is an important basis for their subsequent research. Traditional star catalog retrieval relies heavily on the indexing technology of traditional databases. As star catalog data sets keep growing, however, traditional databases gradually reach storage bottlenecks and their indexes keep growing, which leads to more data redundancy. In addition, the computing resources of a single machine cannot process such a large amount of data, so retrieval over large star catalogs is inefficient. Cone search ("conical retrieval") over star catalogs generated from massive astronomical observation data therefore needs both higher retrieval efficiency and a more effective retrieval method. This thesis addresses the difficulties of cone search over massive astronomical catalog data, where traditional relational databases cannot deliver efficient retrieval. The main work of this thesis is as follows.

Firstly, from the perspective of astronomers, the characteristics and retrieval requirements of astronomical catalogs are analyzed. It is found that as the amount of catalog data increases, building indexes and materialized views of the catalogs in a traditional database becomes inefficient and takes a lot of resources and time. At the same time, a single machine has difficulty retrieving star catalog data in excess of a billion records. This thesis integrates the new DIF plug-in with the MySQL database to map the original two-dimensional index (RA, DEC) to a one-dimensional space, which saves resources and significantly reduces the time needed for indexing. On this basis, to alleviate the insufficient performance and the storage pressure of the stand-alone environment, database middleware is used to split the large star catalog table across databases and tables, combining the advantages of relational databases and distributed technology.

Secondly, when database middleware is used to scale relational databases horizontally, it is found that the row-storage architecture of relational databases is not conducive to cross-database join operations, and that the cost of data communication between nodes grows as the number of nodes increases. This thesis compares the architectures of columnar storage and row storage, analyzes the principles of parallel optimization at the hardware and software levels, and shows theoretically that columnar storage is more efficient than row storage for astronomical catalog retrieval, saves more computing and storage resources, and is more scalable.

Thirdly, to verify the advantage of columnar storage over the traditional database solution for star catalog search, this thesis builds a distributed cluster of the database middleware Mycat with MySQL and a cluster of ClickHouse + ZooKeeper, and compares the performance and the scalability of the two schemes for cone search over star catalog data. The tests verify that the columnar storage database has obvious advantages not only in data storage but also in retrieval efficiency, and that this advantage continues to grow as the cluster expands.

In summary, the thesis proposes a distributed row-storage database solution for traditional star catalog retrieval and optimizes the pseudo-spherical index through the DIF plug-in, which to a certain extent solves the problems of low cone search efficiency and data redundancy under the traditional model. On this basis, a distributed column-storage solution is proposed; compared with the distributed row-storage solution, it improves cone search efficiency and scales better. In short, the two cone search optimization schemes proposed in this thesis can meet the current needs of cone search over star catalogs with billions of records and can achieve near real-time search. It is also found that the columnar storage architecture has greater potential for future development, which can serve as a reference for cone search over other astronomical catalogs.
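
For readers unfamiliar with the operation the abstract optimizes, the following is a minimal sketch of a cone search: it returns every catalog entry whose (RA, DEC) position lies within a given angular radius of a center point. It is a plain linear scan in Python with a declination-band prefilter, not the DIF, Mycat, or ClickHouse implementation from the thesis. The function names, the row layout `(source_id, ra_deg, dec_deg)`, and the sample catalog are all assumptions made for illustration.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions.

    Uses the haversine formula, which stays numerically stable for the
    small angles typical of cone searches. All inputs are in degrees.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def cone_search(rows, ra0, dec0, radius_deg):
    """Return the ids of rows within radius_deg of the center (ra0, dec0).

    rows is an iterable of (source_id, ra_deg, dec_deg) tuples; this layout
    is hypothetical and chosen only for the example.
    """
    dec_lo, dec_hi = dec0 - radius_deg, dec0 + radius_deg
    hits = []
    for source_id, ra, dec in rows:
        # Cheap prefilter: anything outside the declination band cannot match.
        if dec < dec_lo or dec > dec_hi:
            continue
        if angular_separation_deg(ra0, dec0, ra, dec) <= radius_deg:
            hits.append(source_id)
    return hits


# Hypothetical sample catalog, including two sources on either side of RA = 0.
catalog = [
    ("a", 10.0, 20.0),
    ("b", 10.05, 20.02),
    ("c", 11.0, 20.0),
    ("d", 359.99, 0.0),
    ("e", 0.01, 0.0),
]

near_first = cone_search(catalog, 10.0, 20.0, 0.1)
near_origin = cone_search(catalog, 0.0, 0.0, 0.05)
```

The second query shows why the separation is computed on the sphere rather than by comparing RA values directly: sources at RA 359.99 and 0.01 are both within 0.05 degrees of RA 0. A scan like this touches every row, which is the cost that the indexing and distribution schemes described in the abstract aim to avoid at the billion-record scale.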
Keywords/Search Tags:Astronomical catalog, cone search, distributed, column-based storage, row-based storage