Research On Metadata Management For BESⅢ Distributed Computing

Posted on: 2014-02-17  Degree: Master  Type: Thesis
Country: China  Candidate: L Lin  Full Text: PDF
GTID: 2248330398462918  Subject: Computer application technology
Abstract/Summary:
The Beijing Spectrometer Ⅲ (BESⅢ) high energy physics experiment produces experimental data at the petabyte (PB) scale. Processing and analyzing such massive data has become an immense challenge for the existing computing and storage resources. To resolve this bottleneck, in 2011 the Institute of High Energy Physics, Chinese Academy of Sciences, began building a BESⅢ distributed computing environment that integrates heterogeneous computing resources. BESⅢ distributed computing uses metadata to retrieve tens of millions of files. This thesis studies metadata management in order to organize the metadata effectively and locate the required files. The main contents are as follows:

(1) The basic functional requirements of the system were derived from the needs of the BESⅢ experiment and its users, the overall structure and performance of the distributed computing system, and other factors. Based on these requirements, the metadata model and the system architecture were designed. The metadata management system was then implemented using the catalog service of the DIRAC middleware and adopting new techniques such as a tree-like directory structure, dynamic construction of file names, and virtual datasets. The system has been applied to data processing and analysis. Stress test results show that its performance meets the requirements of the BESⅢ experiment.

(2) Because file accesses are regionally concentrated, a proposal for optimizing metadata query performance was designed based on the MEMORY storage engine of MySQL. The proposal stores hot metadata redundantly in memory tables. A query for hot metadata is then served from the memory tables, depending on the query condition and the validity of the data stored there, which reduces disk I/O time.
The experimental results show that this proposal improves metadata query performance.

(3) A high-availability model involving two meta-databases is proposed to eliminate the single point of failure. The design uses the replication capability of MySQL to build a master-master replication structure. A management tool for master-master replication is employed to monitor the status of the meta-databases and to handle failover. To balance load between the meta-databases, virtual IP technology and a least-connections policy are applied so that users' read and write requests are directed to different meta-databases. Functional test results show that this design can be applied in practice.
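The hot-metadata idea in item (2) can be illustrated with a small Python sketch. It is not the thesis implementation: a dictionary stands in for the MySQL memory tables, `disk_lookup` stands in for a query against the disk-based tables, and the time-based validity rule (`ttl_seconds`) is an assumption, since the abstract does not say how validity is decided. All names are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class HotMetadataCache:
    """Toy model of memory tables holding redundant copies of hot metadata."""

    def __init__(
        self,
        disk_lookup: Callable[[str], Optional[dict]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._disk_lookup = disk_lookup  # stand-in for a query on disk-based tables
        self._ttl = ttl_seconds          # assumed validity window (not from the thesis)
        self._clock = clock
        self._memory: Dict[str, Tuple[float, dict]] = {}  # stand-in for a memory table
        self.disk_reads = 0              # counts lookups that had to go to "disk"

    def query(self, lfn: str) -> Optional[dict]:
        """Return metadata for a logical file name, preferring the memory copy."""
        now = self._clock()
        entry = self._memory.get(lfn)
        if entry is not None and now - entry[0] <= self._ttl:
            return entry[1]              # valid copy in memory: no disk access
        # Missing or stale: fall back to the disk-based tables and refresh the copy.
        self.disk_reads += 1
        record = self._disk_lookup(lfn)
        if record is not None:
            self._memory[lfn] = (now, record)
        else:
            self._memory.pop(lfn, None)
        return record
```

Repeated queries for the same file within the validity window are answered from memory, and only the first query (or a query after the copy expires) reaches the disk-based store.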
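The least-connections policy mentioned in item (3) can likewise be sketched in a few lines of Python. In the design described above this selection would be made by the virtual IP and load-balancing layer rather than by application code, so the `Backend` class and `pick_least_connections` function below are purely hypothetical illustrations of the policy itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:
    """Hypothetical view of one meta-database as seen by a load balancer."""
    name: str
    healthy: bool = True
    active_connections: int = 0


def pick_least_connections(backends: List[Backend]) -> Backend:
    """Choose the healthy backend that currently has the fewest open connections."""
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        raise RuntimeError("no healthy meta-database available")
    # min() returns the first backend among those tied for the lowest count.
    return min(candidates, key=lambda b: b.active_connections)
```

Skipping backends marked unhealthy mirrors the failover behaviour only loosely: when one meta-database is down, every request goes to the remaining one.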
Keywords/Search Tags: distributed computing, metadata management, high energy physics, query performance optimization, high availability