
Research On Improved Inverted Specific-Class Distance Measures And Their Application

Posted on: 2022-08-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Gong
Full Text: PDF
GTID: 1488306740499754
Subject: Control Science and Engineering
Abstract/Summary:
Classification is one of the most important tasks in data mining and is widely used in medical diagnosis, text classification, pattern recognition, and location recommendation. Many machine learning algorithms perform classification, such as instance-based learning, decision trees, Bayesian networks, artificial neural networks, support vector machines, and self-organizing maps. Among them, instance-based learning is one of the most prevalent, and its classification performance depends mainly on the distance metric, which estimates the similarity between two instances by accumulating the difference in attribute values over all attributes. Attributes can be divided into numerical and nominal ones. Because nominal attributes are discontinuous, their distance cannot be obtained by directly computing the difference between two attribute values. Estimating distance is therefore more difficult and challenging for nominal attributes than for numerical ones. Researchers have transformed the similarity estimation of discontinuous nominal attributes into distance measures over probabilities obtained from prior statistics and posterior estimation, and have successively proposed the overlap metric (OM), the value difference metric (VDM), the Short and Fukunaga metric (SFM), the minimum risk metric (MRM), the inverted specific-class distance measure (ISCDM), and others. Among them, the ISCDM obtains the similarity between two instances by accumulating the inverted specific-class conditional probability of each attribute. It is robust to datasets with missing values and non-class attribute noise, and is therefore one of the top-performing distance metrics that deal solely with nominal attributes. However, like most distance metrics for nominal attributes, the ISCDM relies on the well-known attribute independence assumption, which assumes that there is no interdependence between any two attributes. This strong assumption is hard to sustain in many real-world datasets.

This thesis takes the ISCDM as its basic research object. To relax the attribute independence assumption, the thesis improves the ISCDM in two ways: 1) structure extension, which represents the interdependence between attributes by adding directed edges between them; 2) attribute weighting, which assigns different weights to different attributes to distinguish their contributions to the classification results. Four improved algorithms are proposed: the averaged one-dependence inverted specific-class distance measure, the gain ratio weighted inverted specific-class distance measure, the differential evolution for an attribute weighted inverted specific-class distance measure, and the fine-grained attribute weighted inverted specific-class distance measure. The thesis also investigates existing approaches to semantic enrichment of users' trajectories in personalized location recommendation systems and validates the practical application value of the four new distance measures. The main research work and achievements of this thesis can be summarized as follows:

(1) Following the attribute independence assumption and the structure extension of the naive Bayesian classifier, attribute interdependence is introduced into the inverted specific-class distance measure to alleviate its attribute independence assumption. An averaged one-dependence inverted specific-class distance measure (AODISCDM) is proposed based on the attribute dependencies learned by the structure-extended naive Bayesian classifier. Extensive experiments show that the proposed AODISCDM performs much better than the original ISCDM on datasets with strong attribute dependencies.

(2) The impact of the curse of dimensionality on the performance of distance metrics is discussed. Based on an investigation of existing attribute weighting schemes, attribute weighting is introduced into the ISCDM to emphasize the attributes that are more related to the class variable and to weaken the redundant ones. The gain ratio of each attribute is calculated and assigned as the weight of the corresponding attribute, yielding a gain ratio weighted inverted specific-class distance measure (GRWISCDM). Extensive experiments show that the proposed GRWISCDM greatly improves the performance of the original ISCDM while maintaining its simplicity.

(3) The advantages and disadvantages of filter and wrapper attribute weighting schemes are compared. Based on a study of existing wrapper attribute weighting schemes, heuristic search is applied to an attribute weighted ISCDM (AWISCDM) to find better attribute weights. Building on the ability of the differential evolution algorithm to escape local optima in search of a global optimum without requiring initial values to be assumed in advance, an attribute weighted inverted specific-class distance measure using differential evolution (DE-AWISCDM) is proposed. Extensive experiments show that the globally optimized attribute weighting scheme can substantially improve the performance of the AWISCDM. When performance is the first consideration, the proposed DE-AWISCDM is a desirable distance measure.

(4) Inspired by the success of the AWISCDM, the idea of subdividing attribute weights at a fine granularity is developed. Based on a study of existing fine-grained attribute weighting schemes, a suitable fine-grained attribute weighting scheme is designed for the ISCDM to emphasize the attribute values that are more related to the class variable and to weaken the redundant ones. The random restart walk algorithm is used to optimize the subdivided attribute weights, and a fine-grained attribute weighted inverted specific-class distance measure (FAWISCDM) is proposed. Extensive experiments show that the proposed FAWISCDM performs much better than the original ISCDM while maintaining the time efficiency of the ISCDM.

(5) Existing approaches to semantic enrichment of users' trajectories in personalized location recommendation systems are analyzed. The practical application value of the new algorithms (i.e., AODISCDM, GRWISCDM, DE-AWISCDM, and FAWISCDM) in semantic enrichment of users' trajectories is also discussed.
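
To make the ideas of nominal-attribute distance and attribute weighting concrete, the following Python sketch implements two of the classical metrics named above, the overlap metric (OM) and the value difference metric (VDM), together with a gain ratio computation whose values are used as attribute weights. It is an illustration only and does not reproduce the ISCDM or any of the four proposed measures, whose formulas are not given in this abstract. All function names, the toy data, and the choice to apply the gain ratio weights to the overlap metric are assumptions made for the example.

```python
from collections import Counter
from math import log2

def overlap_distance(x, y, weights=None):
    """Overlap metric: each attribute with differing nominal values adds its
    weight (1.0 by default) to the distance."""
    if weights is None:
        weights = [1.0] * len(x)
    return sum(w for xi, yi, w in zip(x, y, weights) if xi != yi)

def fit_vdm(X, labels):
    """Estimate P(class | attribute i = v) by relative frequency."""
    classes = sorted(set(labels))
    n_attrs = len(X[0])
    value_counts = [Counter() for _ in range(n_attrs)]
    joint_counts = [Counter() for _ in range(n_attrs)]
    for row, c in zip(X, labels):
        for i, v in enumerate(row):
            value_counts[i][v] += 1
            joint_counts[i][(v, c)] += 1

    def prob(i, v, c):
        n = value_counts[i][v]
        return joint_counts[i][(v, c)] / n if n else 0.0

    return classes, prob

def vdm_distance(x, y, classes, prob):
    """VDM: sum over attributes and classes of |P(c|x_i) - P(c|y_i)|."""
    return sum(abs(prob(i, xi, c) - prob(i, yi, c))
               for i, (xi, yi) in enumerate(zip(x, y))
               for c in classes)

def entropy(items):
    n = len(items)
    return -sum((k / n) * log2(k / n) for k in Counter(items).values())

def gain_ratio(column, labels):
    """Information gain of an attribute divided by its split information."""
    n = len(labels)
    groups = {}
    for v, c in zip(column, labels):
        groups.setdefault(v, []).append(c)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(column)
    if split_info <= 0:
        return 0.0  # attribute takes a single value; treat it as uninformative
    return (entropy(labels) - conditional) / split_info

# Hypothetical toy data: the first attribute determines the class,
# the second attribute is uninformative.
X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "hot"), ("rainy", "mild")]
labels = ["no", "no", "yes", "yes"]

classes, prob = fit_vdm(X, labels)
weights = [gain_ratio([row[i] for row in X], labels) for i in range(len(X[0]))]
```

In this toy dataset the uninformative second attribute receives a zero weight, so a mismatch on it no longer contributes to the weighted distance. This is the effect that filter-style attribute weighting aims for: attributes related to the class are emphasized and redundant ones are weakened.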
Keywords/Search Tags: Distance metrics, Nominal attributes, Inverted specific-class, Structure extension, Attribute weighting