
Research On Prototype And Attribute Collaborative Reduction Of Unsupervised Symbol Data

Posted on: 2019-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: Q L Zhang
GTID: 2428330626452094
Subject: Computer Science and Technology
Abstract/Summary:
In machine learning, prototype selection (also called instance selection or sample selection) and attribute reduction (also called feature selection or attribute selection) have become indispensable data preprocessing steps in data mining. Attribute reduction aims to remove irrelevant or redundant features from a given data set, while prototype selection removes defective or duplicate records according to certain indicators. This thesis focuses on the collaborative selection of samples and features in an unsupervised setting. The main work and contributions are as follows:

(1) An importance index is proposed that measures the amount of information carried by an attribute or an instance of the data set. For unsupervised learning tasks, pseudo-labels can be generated by clustering, which converts the task into a supervised one. Drawing on fuzzy rough set theory, the similarity between features is measured by distance, and the quantified correlation between attributes is combined with the quantified correlation between samples. Unlike existing information-entropy-based methods, this index reduces computational complexity by considering the potential of the relation rather than the similarity. The properties of the proposed index are also discussed and verified; its monotonicity ensures the effectiveness of the selection results for the learning task. Based on the index, a greedy forward selection algorithm for feature selection is presented, and experiments verify the effectiveness and practicality of the index.

(2) A new algorithm is proposed that combines spectral clustering with dictionary learning to achieve unsupervised feature selection. In dictionary learning, the features and the pseudo-labels share an intrinsic feature space to ensure consistency of the data distribution. The clustering structure is encoded into the dictionary to ensure that the prior of the data distribution is respected. Then, by computing the projection matrix from the data matrix to the intrinsic feature space, a ranking of the features in the original feature space is obtained. The 2-norm is applied to the projection matrix to obtain the order of importance of the features, and the proposed model is optimized with an alternating minimization algorithm to perform feature selection. Benchmark experiments are designed to verify the feasibility and effectiveness of the proposed learning model.

(3) Building on the method proposed in the preceding part, the model is generalized to handle the collaborative selection of samples and features. Features and samples describe the same data from different angles, so adding sample information to the proposed feature selection framework makes collaborative selection of features and samples possible. In addition, the extended model is used to compare the performance of different methods for converting symbolic data into numerical data under given learning tasks.
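Contribution (1) describes a greedy forward selection driven by a monotone importance index, with pseudo-labels taking the place of class labels. The abstract does not give the formula of the thesis's index, so the minimal sketch below substitutes the classical rough-set dependency degree (the fraction of samples whose equivalence class under the chosen attributes carries a single pseudo-label). The function names, the toy symbolic data, and the pseudo-labels (assumed to come from some clustering step that is not shown) are all hypothetical and serve only to illustrate the selection loop.

```python
from collections import defaultdict

def dependency(rows, labels, attrs):
    """Rough-set dependency degree: share of samples whose equivalence
    class under `attrs` contains exactly one pseudo-label.
    This is a stand-in score, not the index proposed in the thesis."""
    label_sets = defaultdict(set)
    counts = defaultdict(int)
    for row, lab in zip(rows, labels):
        key = tuple(row[a] for a in attrs)  # equivalence class of this sample
        label_sets[key].add(lab)
        counts[key] += 1
    pure = sum(counts[k] for k, labs in label_sets.items() if len(labs) == 1)
    return pure / len(rows)

def greedy_forward_selection(rows, labels, tol=1e-12):
    """Add, one at a time, the attribute that raises the score the most;
    stop when no remaining attribute improves it."""
    n_attrs = len(rows[0])
    selected = []
    best = dependency(rows, labels, selected)
    while len(selected) < n_attrs:
        candidates = [a for a in range(n_attrs) if a not in selected]
        scores = {a: dependency(rows, labels, selected + [a]) for a in candidates}
        a_best = max(candidates, key=lambda a: scores[a])
        if scores[a_best] <= best + tol:
            break  # no candidate adds information
        selected.append(a_best)
        best = scores[a_best]
    return selected, best

# Hypothetical symbolic data: three attributes, six samples.
rows = [
    ("a", "p", "z"),
    ("a", "q", "z"),
    ("b", "p", "z"),
    ("b", "q", "z"),
    ("c", "p", "z"),
    ("c", "q", "z"),
]
# Hypothetical pseudo-labels, assumed to be the output of a clustering step.
labels = [0, 1, 1, 1, 0, 0]

selected, best = greedy_forward_selection(rows, labels)
print(selected, best)  # prints: [0, 1] 1.0
```

On this toy data the constant third attribute is never selected, and the loop stops once the first two attributes separate all pseudo-labels. Because the dependency degree never decreases when attributes are added, the stopping rule is safe here; the same loop would accept any other monotone score in place of `dependency`.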
Keywords/Search Tags: instance selection, feature selection, data dimensionality, unsupervised learning