
The Application Of High-dimensional Data Mining In Rough Classification Of Celestial Bodies

Posted on: 2008-08-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Sun
Full Text: PDF
GTID: 2178360212993462
Subject: Computer application technology
Abstract/Summary:
The spectra of celestial bodies carry important physical information. By studying spectra, astronomers can measure the chemical composition of celestial bodies qualitatively or quantitatively, determine their surface temperature, luminosity, diameter, and mass directly or indirectly, and study their radial motion and rotation. Spectral analysis therefore plays an important role in astrophysics. Once the LAMOST project is completed, a large number of spectra will be collected on each observation night. How to process these voluminous spectra and extract useful scientific information has become an important research topic.

Data mining technology has been widely applied in many fields. Data mining is the process of extracting hidden, previously unknown, but potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy, and random data. It supports tasks such as correlation-based prediction, classification, clustering, outlier (isolated point) detection, and time-series analysis. Mining algorithms for high-dimensional data are currently a research hotspot, and the spectra of celestial bodies are themselves high-dimensional. Data mining can therefore provide good support for spectral classification and parameter measurement.

According to the objectives of LAMOST, the classification of spectral data is divided into two stages: rough classification and fine classification. The first step of rough classification is to divide the spectra of celestial bodies into normal objects and emission-line objects.
Normal objects are then divided into normal galaxies and stars, while emission-line objects are divided into starburst galaxies and active galactic nuclei (AGN).

This thesis focuses on the rough classification of celestial bodies. Its main points are summarized as follows:

1) The covering algorithm is studied, its characteristics are summarized, and corresponding improvements are proposed. The method consists of two steps: the classification problem is first converted into a set covering problem, and classification is then carried out by solving for the support covering sets. The algorithm discussed in this thesis is based on the maximal distance between clusters. Because the covering algorithm is constructive, it requires no iterative computation, and the discriminant function depends only on the support points of the covering sets. The thesis points out that, according to the theory of covering algorithms, the number of covering data points increases as the covering radius decreases and decreases as the covering radius increases. The covering radius should therefore be set flexibly, with its optimal value found through repeated experiments, so that the algorithm achieves good results in both precision and processing speed. The thesis also points out that results improve when the distance between data points is computed using weighted feature vectors.

2) Based on the characteristics of high-dimensional data and the general data mining workflow, a stellar spectra classification model for high-dimensional data mining is built, and the spectral features of quasars and late-type stars are analyzed with it. These two categories of spectra are classified using a statistical method. Extensive experiments show the proposed method to be fast and efficient.

The system was developed on the .NET platform, with C# as the programming language. It mainly comprises the following modules: preprocessing, line presentation, and classification training, among others.
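
The covering idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's C# implementation, and several details are assumptions made only for illustration: a weighted Euclidean distance, a greedy choice of centers, and a radius set to a fraction (the hypothetical parameter `radius_scale`) of the distance to the nearest point of another class. The names `weighted_distance`, `build_covers`, and `classify` are likewise hypothetical.

```python
import numpy as np

def weighted_distance(a, b, weights):
    """Euclidean distance with per-feature weights (the weights are an assumption)."""
    return np.sqrt(np.sum(weights * (a - b) ** 2, axis=-1))

def build_covers(X, y, weights, radius_scale=0.9):
    """Cover the training set with same-class spheres in a single constructive pass.

    Returns a list of (center, radius, label) tuples; the centers play the
    role of support points.
    """
    covers = []
    uncovered = np.ones(len(X), dtype=bool)
    while uncovered.any():
        i = int(np.flatnonzero(uncovered)[0])  # next uncovered point becomes a center
        dist = weighted_distance(X, X[i], weights)
        other = y != y[i]
        # Radius is a fraction of the distance to the nearest other-class point.
        radius = radius_scale * dist[other].min() if other.any() else dist.max()
        inside = (dist <= radius) & ~other     # same-class points inside the sphere
        uncovered &= ~inside
        covers.append((X[i], radius, y[i]))
    return covers

def classify(x, covers, weights):
    """Label x by the cover whose boundary is closest (a negative margin means inside)."""
    margins = [weighted_distance(x, c, weights) - r for c, r, _ in covers]
    return covers[int(np.argmin(margins))][2]

# Toy data: two well-separated classes in two dimensions (made up for this sketch).
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [3.0, 3.0], [3.2, 2.9], [2.9, 3.1]])
y = np.array([0, 0, 0, 1, 1, 1])
weights = np.ones(2)

covers = build_covers(X, y, weights)
small_radius_covers = build_covers(X, y, weights, radius_scale=0.01)
label_a = classify(np.array([0.1, 0.1]), covers, weights)
label_b = classify(np.array([3.1, 3.0]), covers, weights)
```

In this sketch, shrinking `radius_scale` yields more covers, which mirrors the trade-off between covering radius and the number of covers noted above. Classification consults only the stored centers and radii, and no iterative training is involved.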
Keywords/Search Tags:data mining, high-dimensional, classification, spectrum