
Representation And Classification Of Interval-valued Data

Posted on: 2022-07-08; Degree: Doctor; Type: Dissertation
Country: China; Candidate: X B Qi; Full Text: PDF
GTID: 1488306509466424; Subject: Computer Science and Technology
Abstract/Summary:
Interval-valued data is a common kind of quantitative symbolic data. It captures the internal structural characteristics of data objects as a whole and is scientifically important for revealing the uncertainty patterns hidden in data. However, the particular structure of interval-valued data prevents a computer from processing it directly, so a reasonable numerical representation is a prerequisite for interval-valued data analysis. When existing representation methods are converted to a numerical form, they typically use either the midpoint of the interval or its upper and lower bounds. Such a representation considers only location information or only size information and lacks the overall structural characteristics of interval-valued data. In addition, these methods do not consider the internal distribution of the values within an interval, so the structural information they capture is inaccurate, which in turn affects the analysis results. Representation methods for interval-valued data therefore require further study. This thesis systematically studies the representation of interval-valued data under two cases, uniform distribution and non-uniform distribution, and then applies the proposed representation frames to the classification of interval-valued data. The main research work of this thesis is as follows.

(1) A representation and classification method for interval-valued data based on a unified frame is proposed, namely URF-SU. The method constructs a unified representation frame for interval-valued data that integrates location information and size information. The frame balances the relationship between midpoint and radius through an adjustment factor and can subsume the existing representation methods. For feature selection in the classification task, once the unified representation frame is stable, symmetrical uncertainty is used to quantify the degree of correlation between each feature dimension and the category. The features are sorted in descending order of this correlation, and the highly correlated features are selected to update the feature subset iteratively. Because interval-valued data under the unified representation frame retains relatively complete structural information and the features have been selected, URF-SU can improve the classification performance of the model.

(2) For the problem of missing attribute values in interval-valued data, a classification method for incomplete interval-valued data based on the unified representation frame is proposed, namely RKNN. The method first designs a combining rule for incomplete interval-valued data according to the characteristics of interval-valued data. The rule does not require the percentage of missing entries to be set in advance; instead, it automatically decides whether a sample with missing values is filled or ignored. Samples with at least one missing value in each feature are then ignored, and the remaining samples are filled using nearest neighbors from the complete interval-valued sample set of the same category. Finally, the filled samples are added to the complete interval-valued sample set, which gives a more reliable basis for filling subsequent samples with missing values. RKNN achieves a high filling rate and a good filling effect, together with good classification performance. The method can also handle missing attribute values on high-dimensional datasets and on datasets with special distributions.

(3) For internally imbalanced interval-valued data with obvious aggregation, a representation and classification method for interval-valued data based on sample space is proposed, namely AGURF. Because the structural information of internally imbalanced interval-valued data has changed, three forms of internal imbalance are first defined. Then, by remeasuring the internal structure of interval-valued data from the perspective of sample space, an adaptive general unified representation frame based on sample space is constructed. The method clusters the samples of the same category. Within each cluster, the offset-center of a sample is determined by an offset direction and an offset distance. Meanwhile, an adaptive factor is set automatically for each category of samples to balance the relationship between offset-center and radius. Finally, the internally imbalanced interval-valued data is classified based on this frame. AGURF runs efficiently while maintaining high classification accuracy.

(4) For internally imbalanced interval-valued data with high dimension or no obvious aggregation, a representation and classification method for interval-valued data based on feature space is proposed, namely FAGURF. The method constructs an adaptive general unified representation frame based on feature space. It clusters each feature dimension of the samples in the same category. Within the clusters of each feature dimension, the feature offset-center of each sample is determined from its offset direction and offset distance. Moreover, a set of feature adaptive factors is set automatically for the different features of the same category of samples to avoid errors caused by excessive differences between features. This frame is also used to classify internally imbalanced interval-valued data. FAGURF applies to a wider range of data than AGURF while retaining good classification accuracy and running efficiency.

This thesis proposes representation frames for interval-valued data under the two cases of uniform and non-uniform distribution and applies them to the classification task, which improves classification performance and efficiency and enriches the theory and application scope of interval-valued data. Moreover, this thesis provides a new research idea for the analysis of interval-valued data and has theoretical significance and application value in decision support.
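
As a rough illustration of item (1), the sketch below maps an interval to a single number from its midpoint and radius and scores a feature against the class labels with symmetrical uncertainty. The abstract does not state the actual form of the unified representation frame, so the formula `midpoint + lam * radius` with `lam` in [-1, 1] is an assumption chosen only because it reproduces the lower bound, the midpoint, and the upper bound as special cases. The function names `represent`, `entropy`, and `symmetrical_uncertainty` are hypothetical. The symmetrical uncertainty itself follows the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)) for discrete variables.

```python
import math
from collections import Counter


def represent(lower, upper, lam):
    """Map the interval [lower, upper] to one number.

    ASSUMPTION: midpoint + lam * radius is a stand-in for the thesis's
    unified frame, whose exact formula the abstract does not give.
    lam = -1 gives the lower bound, 0 the midpoint, 1 the upper bound.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if not -1.0 <= lam <= 1.0:
        raise ValueError("lam is assumed to lie in [-1, 1]")
    midpoint = (lower + upper) / 2.0  # location information
    radius = (upper - lower) / 2.0    # size information
    return midpoint + lam * radius


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def symmetrical_uncertainty(feature, labels):
    """SU = 2 * I(X; Y) / (H(X) + H(Y)) for discrete feature and labels."""
    h_x = entropy(feature)
    h_y = entropy(labels)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant; no information to share
    h_xy = entropy(list(zip(feature, labels)))  # joint entropy H(X, Y)
    mutual_information = h_x + h_y - h_xy
    return 2.0 * mutual_information / (h_x + h_y)


if __name__ == "__main__":
    print(represent(2.0, 6.0, 0.5))
    print(symmetrical_uncertainty([0, 0, 1, 1], ["a", "a", "b", "b"]))
```

Symmetrical uncertainty is defined on discrete variables, so the real-valued output of `represent` would have to be discretized (for example by binning) before ranking features this way; the abstract does not say how the thesis handles that step.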
Keywords/Search Tags: Interval-valued data, Location information, Size information, Incomplete interval-valued data, Internal imbalance