Optimization Of Parallel Querying Big-data In The Field Of Astronomy

Posted on: 2016-06-12 | Degree: Master | Type: Thesis
Country: China | Candidate: S B Zeng
GTID: 2308330503451113 | Subject: Software engineering
Abstract/Summary:
Big data has become one of the hottest topics as the amount of data people produce keeps increasing. It is another disruptive technological change after cloud computing and the Internet of Things, and its applications will have a huge impact on governments, enterprises, and individuals. Processing massive amounts of data is the core of the big data field.
In science, modern cutting-edge research relies heavily on processing massive amounts of data; examples include the Large Hadron Collider, the Sloan Digital Sky Survey, and the Large Synoptic Survey Telescope (LSST). When the LSST comes online in 2016, it will generate over 20 TB of data per night, and about 60 PB of data will be collected and stored in its database over the telescope's lifetime. This volume is a major challenge for current database management systems.
Oracle Real Application Clusters (Oracle RAC) is a clustered version of Oracle Database and offers a new way to handle huge amounts of data. This project aims to migrate the data produced by the LSST project to Oracle RAC and to test the performance of the cluster. Migrating these data and designing efficient indexes and appropriate data partitioning so that queries can execute in parallel is also a technical challenge.
The project is affiliated with the Petasky project of the LIMOS laboratory, under the French National Academy of Sciences. Petasky aims to use a relational database model to analyze massive astronomical data efficiently. In this project, I used Oracle RAC to process nearly 100 GB of simulated LSST data, covering deployment of the experimental environment, data loading, index design, data partitioning design, and performance analysis.
This thesis first analyzed the project's requirements: choosing the dataset, discussing the common queries that the LSST project wants to support, and selecting several of those queries for testing.
It then described parallel execution, indexing, bulk loading, and Oracle partitioned tables, the optimization methods used in this project to improve query performance and speed up data import. Finally, it implemented a two-node Oracle RAC on a supercomputer, imported the LSST data into Oracle, tested queries used in the LSST project, and compared the results with several related studies.
In conclusion, the data partitioning and indexing strategies used in this thesis improved the performance of some queries in the LSST project.
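To illustrate why partitioning helps parallel querying, here is a minimal, language-neutral sketch of range partitioning, partition pruning, and concurrent partition scans for a rectangular sky query. It is an analogy to what an Oracle range-partitioned table (PARTITION BY RANGE) with parallel execution does, not the partitioning scheme used in the thesis. The row layout (object_id, ra, decl), the declination band boundaries, and all function names are hypothetical.

```python
# Minimal sketch of range partitioning, partition pruning, and parallel
# partition scans for a sky "box" query. Hypothetical layout: each row is
# (object_id, ra_deg, decl_deg); the declination bands are illustrative only.
import random
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partition boundaries on declination (degrees).
BAND_EDGES = [-90.0, -60.0, -30.0, 0.0, 30.0, 60.0, 90.0]
N_BANDS = len(BAND_EDGES) - 1


def band_of(decl):
    """Index of the band [edge_i, edge_i+1) holding decl; +90 goes to the last band."""
    return min(bisect_right(BAND_EDGES, decl) - 1, N_BANDS - 1)


def partition(rows):
    """Route every row to its declination band, like inserts into a range-partitioned table."""
    parts = [[] for _ in range(N_BANDS)]
    for row in rows:
        parts[band_of(row[2])].append(row)
    return parts


def prune(decl_min, decl_max):
    """Keep only bands whose range can overlap [decl_min, decl_max].

    The test is conservative: it may keep a band with no matching rows,
    but it never drops a band that holds a match.
    """
    return [i for i in range(N_BANDS)
            if BAND_EDGES[i] <= decl_max and BAND_EDGES[i + 1] >= decl_min]


def scan(part, ra_min, ra_max, decl_min, decl_max):
    """Full scan of one partition with the box predicate."""
    return [r for r in part
            if ra_min <= r[1] <= ra_max and decl_min <= r[2] <= decl_max]


def box_query(parts, ra_min, ra_max, decl_min, decl_max, workers=4):
    """Scan the surviving partitions concurrently and merge the results."""
    selected = prune(decl_min, decl_max)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(scan, parts[i], ra_min, ra_max, decl_min, decl_max)
                   for i in selected]
        hits = [r for f in futures for r in f.result()]
    return selected, hits


if __name__ == "__main__":
    rng = random.Random(42)
    rows = [(i, rng.uniform(0.0, 360.0), rng.uniform(-90.0, 90.0))
            for i in range(10_000)]
    parts = partition(rows)
    scanned, hits = box_query(parts, 10.0, 20.0, 5.0, 25.0)
    # Only band 3 ([0, 30) degrees) is scanned; the result equals a full scan.
    print(scanned, sorted(hits) == sorted(scan(rows, 10.0, 20.0, 5.0, 25.0)))
```

Running it prints `[3] True`: five of the six partitions are skipped, and the answer is the same as a full-table scan. Threads keep the sketch short. In CPython they do not give real CPU parallelism for this pure-Python filter, whereas Oracle's parallel execution servers scan partitions in parallel across the nodes of a RAC.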
Keywords/Search Tags: Oracle Real Application Clusters, database optimization, parallel querying, big data