Research On The Similarity Measure Of Discrete Data Based On Conditional Probability

Posted on:2019-01-08

Degree:Master

Type:Thesis

Country:China

Candidate:B R Zheng

GTID:2428330566986427

Subject:Computational Mathematics

Abstract/Summary:

The distance or similarity between two instances plays an important role in data mining and machine learning. It is widely used in algorithms for classification, clustering, anomaly detection, feature selection, and instance retrieval. Distance measures for continuous data are well established, whereas similarity measures for discrete data remain an important research topic. Many data-driven similarity measures estimate the distribution of attribute values from the data set and construct measure functions from the perspective of frequency, probability, or information entropy. For discrete data with class labels, the class information can also guide the training of the learner. In this thesis, similarity measure functions are constructed from the class-conditional probabilities of attribute values, and unordered and ordered discrete attributes are treated separately. The main research contents are as follows:

(1) A similarity measure for unordered discrete attributes based on conditional probability. The measure combines the conditional probabilities of attribute values with information entropy theory and defines the similarity of two instances as the ratio of the information they have in common to the total amount of information in the two instances. Experiments on multiple data sets show that learners using this measure achieve lower error rates.

(2) A similarity measure for ordered discrete attributes based on conditional probability. To respect the order relation among attribute values, the measure assigns larger similarity to values that are close in the order and smaller similarity to values that are far apart. Combined with the measure from (1), it was applied to multiple data sets containing both ordered and unordered discrete attributes, and the experimental results show better performance.

(3) The proposed similarity measures are applied to a microcredit user application qualification data set whose attributes include both ordered and unordered values, and they are compared with other commonly used similarity measures on this data set. The experimental results show that the proposed measures perform better on various evaluation metrics, which indicates that they are effective to a certain extent.
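The abstract does not state the exact formulas. As a minimal sketch, assuming a Lin-style information-theoretic similarity (a swapped-in technique, not necessarily the thesis's definition), the Python code below estimates class-conditional probabilities P(A_k = v | C = c) from labelled data, scores two instances within each class by the ratio of shared information to total information, and, for attributes marked as ordered, counts every value lying between the two as shared, so that neighbouring values come out more similar than distant ones. The function names `fit_class_conditional` and `similarity`, the Laplace smoothing constant `alpha`, and the prior-weighted average over classes are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def fit_class_conditional(X, y, alpha=1.0):
    """Estimate smoothed P(A_k = v | C = c) and class priors P(c).

    X is a list of equal-length tuples of discrete values, y the class labels.
    Ordered attributes are assumed to be encoded so that sorted() yields their
    intended order (e.g. integers). alpha is a hypothetical Laplace constant.
    """
    n_attr = len(X[0])
    class_counts = Counter(y)
    values = [sorted({row[k] for row in X}) for k in range(n_attr)]
    counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    for row, c in zip(X, y):
        for k, v in enumerate(row):
            counts[(c, k)][v] += 1
    cond = {}
    for c, n_c in class_counts.items():
        for k in range(n_attr):
            m = len(values[k])
            cond[(c, k)] = {
                v: (counts[(c, k)][v] + alpha) / (n_c + alpha * m)
                for v in values[k]
            }
    prior = {c: n_c / len(y) for c, n_c in class_counts.items()}
    return cond, prior, values


def similarity(x, z, cond, prior, values, ordered=frozenset()):
    """Prior-weighted ratio of common information to total information.

    Within class c, the information of value v is -log P(v | c). Equal values
    share all of it; unordered values a != b share -log(P(a|c) + P(b|c));
    ordered values share -log of the mass of every value between them
    (inclusive), so values farther apart in the order share less.
    """
    result = 0.0
    for c, p_c in prior.items():
        common = 0.0  # information the two instances have in common
        total = 0.0   # information carried by the two instances together
        for k, (a, b) in enumerate(zip(x, z)):
            p = cond[(c, k)]
            if a == b:
                shared = p[a]
            elif k in ordered:
                i, j = sorted((values[k].index(a), values[k].index(b)))
                shared = sum(p[v] for v in values[k][i:j + 1])
            else:
                shared = p[a] + p[b]
            shared = min(shared, 1.0)  # guard against rounding slightly above 1
            common += -2.0 * math.log(shared)
            total += -math.log(p[a]) - math.log(p[b])
        # total == 0 only if every attribute has a single value: instances identical
        result += p_c * (common / total if total > 0 else 1.0)
    return result


if __name__ == "__main__":
    X = [("red", 1), ("red", 2), ("blue", 3), ("green", 3), ("blue", 1), ("red", 2)]
    y = ["a", "a", "b", "b", "b", "a"]
    cond, prior, values = fit_class_conditional(X, y)
    near = similarity(("red", 1), ("red", 2), cond, prior, values, ordered={1})
    far = similarity(("red", 1), ("red", 3), cond, prior, values, ordered={1})
    print(near > far)  # True: neighbouring ordered values are more similar
```

Each per-class ratio lies in [0, 1] and equals 1 for identical instances, so the weighted average does too. Weighting by P(c) is only one way to fold the class labels in; taking the ratio under the predicted class, or combining per-class information before dividing, would be equally plausible readings of the abstract.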

Related items

1	Multi-granulation Rough Sets And Granular Reductions Based On Similarity Measure
2	Extremal Entropy: Information Geometry, Numerical Entropy Mapping, and Machine Learning Application of Associated Conditional Independences
3	Image Segmentation Method Based On The Measure Of Fuzziness
4	Attribute Reduction Of Rough Set Based On Conditional Information Entropy In General Binary Relation
5	Research On Reversible Data Hiding Based On Entropy Sorting And Structural Similarity Constraint
6	Research On Collaborative Filtering Recommendation Algorithm Based On Information Entropy
7	Research On Radar Target Data Association And Tracking Algorithm Based On Information Entropy
8	Matrix Entropy Reductions Of Decision Formal Context
9	Studies On Clustering Algorithms For Categorical Data
10	Research On User Similarity Measure And Recommendation Service Algorithm Based On Data Enhancement