
Research On Relational Classification Of Networked Data With Heterophily

Posted on: 2020-06-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Dong
GTID: 1368330575978767
Subject: Computer software and theory
Abstract/Summary:
Powerful heterogeneous database systems and Internet-based information systems have produced massive amounts of networked data, and advances in computer hardware have made it possible to collect and store such data at scale. These technologies have greatly promoted the development of networked data mining. Unlike traditional data, which are assumed to be independent and identically distributed, networked data have a much more complex structure and require powerful tools for effective analysis and knowledge extraction. Classification is one of the main tasks and challenges of networked data mining, and its effectiveness often depends on the characteristics of the data set to be classified and on the method on which the classifier is based. Research on relational classifiers for networked data is therefore of vital significance. In recent years there has been much research on relational classification of networked data under the assumption of homophily, but little on relational classifiers for networked data with heterophily, which is the more challenging case.

Networked data are used to model interconnected entities, and the relationships between entities can be exploited to aid classification. Many classification methods for networked data are based on the principle of homophily, the tendency of interconnected entities to belong to the same class. Homophily-based methods are difficult to apply to heterophilous networked data, in which the degree of homophily is low. To address this problem, this dissertation studies relational classification methods for heterophilous networked data in depth. The main research work is as follows:

1. A thorough review of the relevant research and theoretical methods
This dissertation analyzes and summarizes recent research on relational classification of networked data; gives an overview of its problems, application fields, some of the main relational methods, and collective inference methods for networked data classification; and introduces the network learning toolkit NetKit-SRL.

2. A relational neighbor algorithm based on class propagating distributions for classification in networked data with heterophily
Because homophily-based relational classifiers perform poorly on networked data with heterophily, a relational neighbor algorithm based on class propagating distributions is proposed. Following the idea of propagation, the label of an unlabeled node is influenced by the labels of its neighbors' neighbors. The proposed method uses an aggregation function to compute a propagating class vector and a propagating reference vector for each unlabeled node, propagates the influence of the neighbors' neighbors to the unlabeled node, and then obtains the class distribution by comparing the similarity of the two vectors. Finally, collective inference is used to transmit this influence through the network. The class distributions of the unlabeled nodes are taken as final once they are stable. In experiments against two homophily-based relational classification methods and one heterophilous relational classification method, the proposed method performs better on heterophilous network classification.

3. A logistic regression classifier for networked data with heterophily based on the second-order Markov assumption
Given the particularity and importance of heterophilous networks, the link-based classification method (NLB) is improved to make it suitable for heterophilous networked data. The method introduces the second-order Markov assumption and uses the link feature vectors of each node's second-order neighbors to build a structured logistic regression model. During training, two regularized logistic regression models are fitted separately on the link feature vectors of the first-order and second-order neighbors of the labeled nodes. During testing, the two models predict the class distribution of each unlabeled node, using the link feature vectors of its first-order and second-order neighbors as explanatory variables. The two results are combined, and the label with the highest posterior probability is selected. The class distributions are then updated iteratively by relaxation labeling collective inference until they converge or reach a stable state. In experiments against three homophily-based relational classification methods and two heterophilous relational classification methods, the proposed method improves the accuracy of heterophilous network classification.

4. A Bayes classifier for networked data with heterophily based on the second-order Markov assumption
Because traditional homophily-based relational classifiers cannot classify heterophilous networks correctly, an improved network Bayes classifier based on the second-order Markov assumption is proposed. First, the class distribution of each unlabeled node is estimated from the class distributions of its neighbors' neighbors; this computation is performed separately for the known and the unknown neighbors. Second, the two parts are combined using multinomial naive Bayes classification. In each iteration, the class distributions are updated by relaxation labeling collective inference, which incorporates simulated annealing. In experimental comparisons with both homophily-based and heterophily-based methods, the proposed method performs better when the networks are heterophilous.
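The ideas the abstract shares across its three methods (low homophily, estimating a node's class from its neighbors' neighbors, and relaxation labeling with simulated annealing) can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: the toy graph, the function names, the plain averaging of second-order neighbors' distributions, and the annealing constants `k` and `alpha` are all assumptions made for illustration.

```python
# Illustrative sketch only; not the dissertation's method.
# Toy graph: a 6-cycle whose labels alternate, so every edge joins
# two different classes (fully heterophilous).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
true_labels = {0: "A", 1: "B", 2: "A", 3: "B", 4: "A", 5: "B"}
known = {0: "A", 1: "B", 3: "B"}  # labeled nodes (assumed split)
classes = ["A", "B"]


def edge_homophily(edges, labels):
    """Fraction of edges whose endpoints share a label."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def two_hop(adj, node):
    """Neighbors of neighbors, excluding the node itself (an assumption)."""
    hop2 = set()
    for nb in adj[node]:
        hop2 |= adj[nb]
    hop2.discard(node)
    return hop2


def classify(edges, known, classes, iters=50, k=1.0, alpha=0.99):
    """Second-order neighbor vote with annealed relaxation labeling."""
    adj = build_adjacency(edges)
    dist = {}
    for n in adj:
        if n in known:
            dist[n] = {c: 1.0 if c == known[n] else 0.0 for c in classes}
        else:
            dist[n] = {c: 1.0 / len(classes) for c in classes}
    beta = k  # weight of the fresh estimate; decays every iteration
    for _ in range(iters):
        new = {}
        for n in adj:
            if n in known:
                continue
            hop2 = two_hop(adj, n)
            total = {c: sum(dist[m][c] for m in hop2) for c in classes}
            z = sum(total.values())
            est = {c: total[c] / z for c in classes} if z > 0 else dict(dist[n])
            # Blend the new estimate with the previous distribution.
            new[n] = {c: beta * est[c] + (1 - beta) * dist[n][c] for c in classes}
        dist.update(new)  # synchronous update of all unlabeled nodes
        beta *= alpha
    return {n: max(classes, key=lambda c: dist[n][c])
            for n in adj if n not in known}


ratio = edge_homophily(edges, true_labels)
predictions = classify(edges, known, classes)
print(ratio, predictions)
```

On this toy graph a first-order neighbor vote would assign node 2 the class of its direct neighbors, which is wrong by construction; looking two hops away recovers the alternating pattern. The decaying weight `beta` is one simple way to damp oscillation between iterations.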
Keywords/Search Tags: Artificial intelligence, relational learning, networked data classifier, classification in networked data with heterophily, collective inference