
Data Mining Of LAMOST Emission Line Stars Spectra Based On Parallel Platform

Posted on: 2017-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: G Z Ge
Full Text: PDF
GTID: 2308330488953386
Subject: Computer Science and Technology
Abstract/Summary:
Since its formal pilot survey began, the Guoshoujing Telescope (the Large Sky Area Multi-Object Fiber Spectroscopy Telescope, LAMOST) has produced tens of thousands of spectra with each observation, and over time it will accumulate a massive volume of spectral data. The goal of this thesis is to identify emission line stars (ELS) in these spectra. Spectra with emission lines usually belong to special objects such as cataclysmic variables (CVs) and Herbig Ae/Be stars. Because the redshift of a stellar spectrum is relatively small, the positions of its spectral lines are nearly fixed, so strong emission at these positions indicates that the star may be an emission line star. A star with emission lines may have experienced, or may still be experiencing, an unstable ejection or accretion process, which makes such stars important for the study of stellar evolution. However, ELS are rare, so searching for them is practical only in large sky survey projects such as SDSS and LAMOST. Identifying all the ELS in the massive spectra of LAMOST DR2 effectively requires distributed computing, parallel computing and large-scale data processing techniques.

The main work of this thesis is divided into the following three parts:

1. A parallel, multi-threaded data preprocessing scheme is designed to extract the contents of all the FITS files into a single target file while ensuring the accuracy of the extracted data. The large target file is then loaded into HDFS in preparation for the subsequent data mining experiments that identify ELS on a Hadoop cluster.
2. A new method, named the Mean Smooth Polynomial Fitting Method, which combines the advantages of polynomial fitting and median filtering, is designed to improve continuum fitting. Parameter adjustment experiments for line detection are then run on sample data to improve accuracy.
3. Based on the continuum fitting and the line detection parameter experiments, a program is designed with the MapReduce parallel computing model and tested with sample data on both a multi-node cluster and a pseudo-distributed cluster, and the execution efficiency of these tests is compared. Finally, the data mining experiment to identify all ELS is carried out on the Hadoop cluster by processing the large target file obtained in the first part.

The experiment on the multi-node Hadoop cluster identified 51,092 ELS candidates in LAMOST DR2. It effectively improved the efficiency of ELS identification, and the proposed method offers a useful reference for similar massive spectral data processing problems in the future.
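
As an illustration of the continuum fitting and line detection steps in parts 2 and 3, the following Python sketch median-filters a spectrum, fits a low-order polynomial to the filtered flux, and flags a spectrum whose continuum-normalized flux near H-alpha exceeds a threshold. It is not the Mean Smooth Polynomial Fitting Method itself, whose details the abstract does not give. The function names, window sizes, polynomial degree, threshold and the choice of H-alpha as the test line are all assumptions made for the example.

```python
import statistics

# Rest wavelength of H-alpha in angstroms. Stellar redshifts are small,
# so the line stays close to this position. (Choice of line is an assumption.)
H_ALPHA = 6562.8


def median_filter(flux, half_width):
    """Running median; suppresses narrow lines before the continuum is fitted."""
    n = len(flux)
    return [
        statistics.median(flux[max(0, i - half_width):min(n, i + half_width + 1)])
        for i in range(n)
    ]


def polyfit(x, y, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns the coefficients ordered from the constant term upward.
    """
    m = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    rows = [
        [sum(xi ** (r + c) for xi in x) for c in range(m)]
        + [sum(yi * xi ** r for xi, yi in zip(x, y))]
        for r in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(m):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][m] / rows[i][i] for i in range(m)]


def fit_continuum(wavelength, flux, half_width=10, degree=3):
    """Median-filter the flux, then fit a low-order polynomial to the result."""
    lo, hi = wavelength[0], wavelength[-1]
    # Rescale wavelengths to [-1, 1] so the normal equations stay well conditioned.
    x = [2 * (w - lo) / (hi - lo) - 1 for w in wavelength]
    coeffs = polyfit(x, median_filter(flux, half_width), degree)
    return [sum(c * xi ** k for k, c in enumerate(coeffs)) for xi in x]


def has_emission(wavelength, flux, line=H_ALPHA, window=10.0, threshold=1.5):
    """True if the continuum-normalized flux near `line` exceeds `threshold`."""
    continuum = fit_continuum(wavelength, flux)
    peak = max(
        (f / c for w, f, c in zip(wavelength, flux, continuum)
         if abs(w - line) <= window),
        default=0.0,  # no pixels fall inside the window
    )
    return peak > threshold
```

In a MapReduce setting, a test like the hypothetical `has_emission` could be applied to each spectrum independently in the map step, though the abstract does not describe how the thesis program is organized.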
Keywords/Search Tags: Massive spectra, Emission Line Stars (ELS), Continuum fitting, Continuum normalization, Line detection, Hadoop cluster