
Research For Pulsar Candidates Classification Algorithms Based On Machine Learning

Posted on: 2021-03-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y C Wang
GTID: 1360330605474742
Subject: Computer application technology
Abstract/Summary:
Pulsar searching is an important frontier in radio astronomy. The performance of search facilities has improved dramatically, especially in resolution and sensitivity, so these facilities can receive weaker pulsar signals but also more interference, including radio frequency interference (RFI) and noise. As a result, pulsars account for only a tiny proportion of the many candidates the facilities record. In addition, some RFI signals that resemble pulsars increase the difficulty of classification. Identifying pulsar signals accurately among massive numbers of signals has therefore become an urgent problem in this field. Focusing on machine learning methods for pulsar candidate classification, this dissertation explores the application of supervised, semi-supervised, and unsupervised learning to this problem, considering different application scenarios and needs.

First, given that pulsar candidates are imbalanced and that the artificial features designed by experts lack analysis and optimization, a hybrid ensemble learning method for imbalanced pulsar classification is proposed. In this work, tree models are used to analyze the relative importance of the features and to select features, optimizing the feature set. For the extremely imbalanced situation, the imbalanced dataset is divided into several relatively balanced sub-datasets, following Easy Ensemble. XGBoost and random forest are used as the base classifiers, trained on each sub-dataset with cost-sensitive learning, to build the hybrid ensemble, which improves classification performance based on artificial features. On the HTRU (High Time Resolution Universe survey) 1 dataset, the recall and the precision are 0.967 and 0.971, respectively, which are 0.4% and 0.6% higher than those of the DCGAN-SVM model. On the HTRU 2 dataset, it achieves a recall of 0.920 and a precision of 0.917, and its F-score is 0.918, which is 4.4% higher than that of the PNCN model.

Second, to avoid the bias of artificial features, a convolutional neural network model is designed for pulsar candidate classification to realize end-to-end processing. The raw data of the sub-integrations plot and the sub-bands plot of each pulsar candidate are used as inputs and processed by multi-layer convolutional networks that extract features automatically, so that the outputs are the classification results. In addition, to address the imbalance problem and considering the characteristics of pulsars, a normalized linear combination method is proposed to effectively expand the distribution of training pulsar samples and meet the model's need for pulsar samples, which helps reduce the generalization error of the model. On the HTRU 1 dataset, the recall is 0.962 and the precision is 0.963. The F-score is 0.962, which is 1% higher than that of other convolutional neural network methods on this dataset.

Then, this dissertation discusses using anomaly detection for pulsar classification, motivated by the scarcity of pulsars and the need to mine data for unknown pulsars. By taking abundant RFI or noise data as normal samples and rare pulsars or unknown data as abnormal samples, an anomaly detection model based on the isolation forest algorithm is designed. When the model is trained with non-pulsars from the HTRU 1 dataset, the test results show that the pulsar recall is 0.978 and the false positive rate is 0.05. When the threshold is increased, the false positive rate is 0.05 while the recall is 0.991.

Finally, to address the lack of labeled samples, deep embedding clustering is applied to unsupervised clustering analysis of pulsar candidates. Using the sub-integrations plot and sub-bands plot of candidates as inputs, the algorithm performs feature learning and clustering end to end by combining a convolutional autoencoder model with a K-means clustering layer, optimized jointly with the reconstruction loss and a clustering loss based on KL divergence. In the absence of labeled data, the recall and the false positive rate are 0.96 and 0.046, respectively, when the ratio of positive to negative samples is 1:7.5, and 0.95 and 0.048, respectively, when the ratio is 1:22.5. The algorithm is suitable for the initial classification of unlabeled samples and is stable under imbalanced conditions.
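
The data-splitting step of the hybrid ensemble method (dividing an imbalanced dataset into several relatively balanced sub-datasets) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function name `balanced_subsets`, the `ratio` parameter, the disjoint-chunk partitioning, and the toy sample identifiers are all assumptions made for illustration, and the training of XGBoost and random forest base classifiers on each subset is omitted.

```python
import random


def balanced_subsets(majority, minority, ratio=1.0, seed=0):
    """Split `majority` into disjoint chunks and pair each chunk with all of `minority`.

    Hypothetical helper: each returned subset holds roughly
    ratio * len(minority) negatives (label 0) plus every positive (label 1).
    """
    if not minority:
        raise ValueError("minority class must not be empty")
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    # Number of majority samples per subset (at least one).
    chunk = max(1, int(round(ratio * len(minority))))
    subsets = []
    for start in range(0, len(shuffled), chunk):
        negatives = shuffled[start:start + chunk]
        samples = [(x, 0) for x in negatives] + [(x, 1) for x in minority]
        subsets.append(samples)
    return subsets


# Toy identifiers standing in for candidates (placeholders, not HTRU data).
non_pulsars = [f"n{i}" for i in range(20)]
pulsars = [f"p{i}" for i in range(4)]
subsets = balanced_subsets(non_pulsars, pulsars)
print(len(subsets), [len(s) for s in subsets])
```

In a full pipeline, one base classifier would presumably be trained per subset and the predictions combined, for example by averaging predicted probabilities; the combination rule actually used in the dissertation is not stated in the abstract.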
Keywords/Search Tags:Pulsar candidates classification, Machine learning, Ensemble learning, Convolutional neural network, Anomaly detection, Deep clustering