Research On The Storage And Processing Of Massive Radio Astronomy Observation Data

Posted on: 2020-01-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C M Shi
GTID: 1360330623457761
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of science and technology, human society has become increasingly information-driven and has entered the era of big data. The profound influence and great value of big data are gradually being recognized: big data not only changes the way people live, work and think, but also brings significant opportunities to scientific research. Information management and related disciplines form the core of big data technology; they address key issues in data collection, transmission, storage (archiving), retrieval, processing, analysis and mining, publishing and application. In recent decades, with the emergence of a new generation of astronomical telescopes, astronomy has entered the era of big data, and astronomical observations have become one of the largest data sources in human society. Astronomy and the information disciplines continue to integrate and promote each other, and have gradually developed into an emerging interdisciplinary frontier: astroinformatics.

Drawing on information management and related disciplines, this dissertation addresses key technologies for storage, retrieval, transmission and archiving in the management of massive astronomical data. The data management of two radio telescopes, the MingantU SpEctral Radioheliograph (MUSER) and the Square Kilometer Array (SKA), serves as the use case for verifying the correctness and effectiveness of the work through data simulation, instantiation tests, performance comparison and theoretical analysis. The details are as follows:

1) For the efficient storage and retrieval of massive radio astronomical observation records, a negative database system for time-series data, built on complement theory, is proposed. It exploits two characteristics of the observation data: the records are time-series data with a fixed sampling interval, and a fixed number of consecutive records is stored in each file in chronological order. The negative database system treats the metadata of the records in a file, together with that of the records lost between the first and last records, as the complete set, and the metadata of the lost records as the complement set. The logical structure of the data file is built from the complement set, and the metadata of the records actually stored in the file can then be derived from it. This dissertation gives a complete formal definition and a rigorous theoretical proof of the negative database for time-series data. Extensive experiments compare it with a data management system based on the common approach of storing metadata for every record in the time-series data files. The negative database system is about 18.8 times faster in record warehousing and 1.5 to 6.9 times faster in retrieval, and it reduces the number of records to be warehoused by a factor that depends on N, the number of records in the file. The negative database therefore provides fast retrieval while greatly reducing storage overhead.

2) To meet the demand for high-speed transmission of massive radio astronomical observation data across regions, this dissertation proposes a two-way asynchronous messaging transmission model with status detection and retransmission (hereafter, the messaging transmission model). Two asynchronous message transmissions carry data messages and feedback messages respectively, each in one direction at high speed; timeout retransmission ensures that data is delivered to the recipient, and real-time status detection determines whether to keep sending messages to the receiver. The model overcomes a shortcoming of the error-retransmission method used by many current remote data transmission technologies, in which waiting for the peer's feedback message reduces transmission efficiency. An efficient data transmission system is implemented on the basis of this model. Extensive experiments show that, for files of hundreds of kilobytes, the average transmission speed of the new system is approximately 40 times that of the existing system used in astronomy. For files on the order of hundreds of megabytes with low concurrency, the new system reaches an average transmission speed of 1172 MB/s, nearly 3.4 times that of existing systems and close to the full capacity of a 10 Gb/s network. The system thus improves transmission performance and shortens data transmission time.

3) To reduce data redundancy when archiving massive radio astronomy observation data with high reliability, this dissertation proposes an archival model based on erasure coding. The model integrates erasure coding into the data message receiver of the messaging transmission model described in 2). It overcomes the high data redundancy caused by the three-replica policy used in existing data archival systems in astronomy. An archival system is implemented on the basis of the proposed model and the Reed-Solomon algorithm with parameters 4 and 2. Extensive experiments show that, in the same experimental environment, the average off-site archiving speed of the implemented system is 1.4 times that of the existing system with the three-replica policy disabled. The implemented system needs only 50% additional storage overhead to reach the data reliability for which the three-replica policy requires 200% additional storage overhead. Concurrency and the high water mark are the key tuning parameters of the implemented archival system. The archival system based on the proposed low-redundancy model therefore archives faster and obtains higher data reliability with lower additional storage overhead.

In summary, this interdisciplinary dissertation addresses the needs of astronomical data management by applying knowledge from the information disciplines to efficient storage and retrieval, high-speed data transmission, and archiving of massive astronomical data. Some key issues in massive astronomical data management have been partially solved, which improves its overall capability to a certain extent. The results also provide a reference for other application fields with similar data management requirements, and have both theoretical value and engineering application value.
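
The complement-set idea in item 1) can be illustrated with a minimal Python sketch. It assumes integer timestamps, a fixed sampling interval, and one file holding consecutive records in chronological order. The class name `NegativeIndex`, its fields, and its methods are hypothetical and chosen only for illustration; this is not the dissertation's implementation.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class NegativeIndex:
    """Hypothetical per-file index that stores only what is missing."""
    first: int      # timestamp of the first record in the file
    last: int       # timestamp of the last record in the file
    interval: int   # fixed sampling interval
    missing: list = field(default_factory=list)  # sorted timestamps of lost records

    def contains(self, ts: int) -> bool:
        """Return True if a record with timestamp ts is stored in the file."""
        if ts < self.first or ts > self.last:
            return False
        if (ts - self.first) % self.interval != 0:
            return False  # not on the sampling grid
        i = bisect_left(self.missing, ts)
        return not (i < len(self.missing) and self.missing[i] == ts)

    def position(self, ts: int) -> int:
        """Return the zero-based position of the record within the file."""
        if not self.contains(ts):
            raise KeyError(ts)
        slot = (ts - self.first) // self.interval   # position if nothing were lost
        lost_before = bisect_left(self.missing, ts)  # lost records preceding ts
        return slot - lost_before

    def stored_count(self) -> int:
        """Return the number of records actually stored in the file."""
        slots = (self.last - self.first) // self.interval + 1
        return slots - len(self.missing)


# Ten sampling slots (1000, 1005, ..., 1045), two of which were lost.
index = NegativeIndex(first=1000, last=1045, interval=5, missing=[1010, 1030])
print(index.contains(1015), index.position(1015), index.stored_count())
```

In this sketch the index holds three integers plus the timestamps of the lost records, however many records the file contains, whereas a conventional index would hold one entry per stored record.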
Keywords/Search Tags: Time-series data, Negative database, Remote archive, Data redundancy