Research on Star/Galaxy Classification Based on Ensemble Learning

Posted on: 2021-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: C Li
Full Text: PDF
GTID: 2480306458469094
Subject: Information and Communication Engineering
Abstract/Summary:
More and more countries have launched large-scale sky survey projects, because astronomical research has become an important indicator of a country's comprehensive national strength. Among the many tasks in sky surveys, star/galaxy classification has always been an important goal of astronomical research. The methods previously used for this problem were traditional ones based on image morphology and heuristic segmentation. In recent years, the slowness and low classification accuracy of these methods have become increasingly apparent. Excellent models and algorithms based on machine learning have also been developed, but the predictive performance of a single machine learning model often depends on the specific problem. Ensemble learning predicts the final result by combining several base classifiers, so it adapts well to varied scenarios and achieves high classification accuracy. For this reason, this thesis studies algorithm models based on ensemble learning and applies them to the difficult problem of star/galaxy classification in complex astronomical data. The experimental results show that the ensemble learning models combine the advantages of multiple models and learn effectively in a setting where star/galaxy classification accuracy on astronomical data has been low. The ensemble learning models therefore outperform traditional data mining classification algorithms on the star/galaxy classification problem. The main content of the thesis is as follows:

(1) Introduction to and preparation of the theoretical background of machine learning algorithms. To address star/galaxy classification in astronomical data mining, the thesis first gives a brief introduction to the development and theory of machine learning. It then discusses basic machine learning algorithms such as the decision tree and the support vector machine in detail. Finally, it studies the principles of Bagging, Boosting and Stacking in ensemble learning, and discusses Random Forest, AdaBoost, Gradient Boosting Decision Tree (GBDT) and XGBoost as representatives of the different ensemble ideas.

(2) Construction and simulation of an XGBoost ensemble model based on Sloan Digital Sky Survey (SDSS) photometric data. After discussing the basic idea of the Boosting algorithm in ensemble learning, the thesis focuses on strong learners such as GBDT and XGBoost, which improve on and extend the AdaBoost algorithm. The XGBoost model is introduced to address the low star/galaxy classification accuracy caused by the small data volume and high noise of the faint and faintest source magnitude sets in the SDSS photometric data. The experiments use the full SDSS-DR7 photometric data, comprising the bright, faint and faintest source magnitude sets. First, each of the three magnitude sets is divided using ten-fold cross-validation; then the divided data are used to train the XGBoost model; finally, the predictions of the XGBoost model on the test set are compared with the experimental results of the function tree algorithm in the literature. The comparison shows that the classification accuracy of XGBoost on the faint and faintest source magnitude sets improves by about 11% and 5%, respectively.

(3) Design and experimental simulation of a star/galaxy classification algorithm for the faintest source magnitude set based on Stacking ensemble learning. To address the low classification accuracy on the faintest source magnitude set in the Sloan Digital Sky Survey, the thesis builds a star/galaxy classification algorithm based on Stacking ensemble learning. In the algorithm design, the support vector machine, random forest and XGBoost algorithms serve as the base classifiers and the gradient boosting decision tree algorithm serves as the meta classifier, forming a two-layer Stacking ensemble learning model. In the experiments, the faintest source magnitude set is first divided using 10-fold nested cross-validation, and the divided data are then used to train the Stacking model. The results show that the Stacking ensemble learning model improves the star/galaxy classification accuracy on the faintest source magnitude set by 10% compared with the function tree algorithm in the literature, and that it also improves significantly over other traditional machine learning algorithms, boosting algorithms and deep learning algorithms.
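The two-layer Stacking idea in (3) can be illustrated with a minimal, standard-library-only Python sketch. It is not the thesis implementation, and every name in it is hypothetical. One-feature threshold classifiers stand in for the SVM, random forest and XGBoost base classifiers, a small lookup table stands in for the GBDT meta classifier, and synthetic two-feature data stand in for the SDSS magnitudes. The sketch also evaluates on a single hold-out split rather than the nested cross-validation described above. What it does show is the central mechanism: the meta classifier is trained on out-of-fold predictions of the base classifiers, so it never sees predictions made on a base classifier's own training rows.

```python
import random
from collections import Counter, defaultdict


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_stump(X, y, feature):
    """Toy base classifier: the best single threshold on one feature."""
    best = None
    for t in sorted({row[feature] for row in X}):
        for sign in (1, -1):
            pred = [1 if sign * (row[feature] - t) >= 0 else 0 for row in X]
            acc = sum(p == c for p, c in zip(pred, y)) / len(y)
            if best is None or acc > best[0]:
                best = (acc, t, sign)
    _, t, sign = best
    return lambda row: 1 if sign * (row[feature] - t) >= 0 else 0


def fit_meta(Z, y):
    """Toy meta classifier: majority label per pattern of base predictions."""
    votes = defaultdict(Counter)
    for z, label in zip(Z, y):
        votes[tuple(z)][label] += 1
    table = {p: c.most_common(1)[0][0] for p, c in votes.items()}

    def predict(z):
        z = tuple(z)
        if z in table:
            return table[z]
        # Unseen pattern: fall back to a plain majority vote of the bases.
        return 1 if 2 * sum(z) >= len(z) else 0

    return predict


def fit_stacking(X, y, features, k=10, seed=0):
    """Two-layer stack: out-of-fold base predictions feed the meta classifier."""
    n = len(X)
    meta_X = [[None] * len(features) for _ in range(n)]
    for fold in k_fold_indices(n, k, seed):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        for j, f in enumerate(features):
            stump = fit_stump(X_tr, y_tr, f)
            for i in fold:
                meta_X[i][j] = stump(X[i])  # prediction on unseen rows only
    meta = fit_meta(meta_X, y)
    # Refit each base classifier on all training rows for use at predict time.
    bases = [fit_stump(X, y, f) for f in features]
    return lambda row: meta([b(row) for b in bases])


def make_data(n_per_class, rng):
    """Synthetic stand-in data: class 1 is shifted by 3 in both features."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(3.0 * label, 1.0), rng.gauss(3.0 * label, 1.0)])
            y.append(label)
    return X, y


rng = random.Random(42)
X_train, y_train = make_data(100, rng)
X_test, y_test = make_data(50, rng)
model = fit_stacking(X_train, y_train, features=[0, 1], k=10)
preds = [model(row) for row in X_test]
accuracy = sum(p == c for p, c in zip(preds, y_test)) / len(y_test)
print(f"hold-out accuracy of the toy stack: {accuracy:.2f}")
```

In a nested cross-validation setup, the hold-out split at the bottom would be replaced by an outer 10-fold loop, with `fit_stacking` called once per outer training set so that the inner folds are drawn only from that set.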
Keywords/Search Tags: machine learning, ensemble learning, stacking, star/galaxy classification