
Research On Classification And Link Prediction Of Network Data Based On Links

Posted on: 2013-01-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L N Li
Full Text: PDF
GTID: 1118330371982837
Subject: Computer application technology
Abstract/Summary:
In recent years, with the rapid development of information technology represented by the Internet, human society has entered a network age. The demand for network analysis keeps rising, and network data mining has become an important new research field within data mining. It has been widely applied in numerous domains, including document classification, protein structure prediction, natural language processing, and social network analysis. Network data mining aims to extract implicit knowledge from network data sources and to perform learning tasks such as entity classification, link prediction, community discovery, entity ranking, and network clustering, in order to analyze the nature, function, and dynamic change of networks and to understand the relationships between networks. As key parts of network data mining, entity classification and link prediction have attracted particular attention from researchers, and a great deal of work has been done. However, the accuracy of existing algorithms still needs to be improved, and there is little work on sparsely labeled and sparsely linked networks. The thesis takes these problems as its main topic.

The thesis first analyzes current entity classification methods in detail, together with their underlying assumptions, the types of network data they suit, and their application domains. It then studies the role of links in the entity classification task and presents a series of solutions for entity classification on different types of network data. In particular, for networks whose entities have attribute information, it presents a new regularization classification model based on principal component analysis and the extreme learning machine; for networks whose entities have attribute information but in which few entities are labeled, it presents a new active collective classification method that combines feature selection and link filtering; and for networks whose entities have only label information, it presents an integrated collective classification framework for sparsely labeled networks. In addition, the thesis summarizes the state of research on link prediction and presents a collective link prediction framework for sparsely linked networks.

The detailed research results are as follows:

1. A thorough review of research on entity classification and link prediction.
The thesis introduces and summarizes the research tasks of entity classification and link prediction, and points out the problems in current approaches and directions for future research.

2. For networks whose entities have attribute information, a new regularization classification model based on principal component analysis and the extreme learning machine.
The algorithm improves on current regularization methods in that it not only contains the smoothness constraint on the defined function but also considers the label distribution in the network. It adds two new regularization terms: an intra-class similarity term and an inter-class difference term. In the implementation, we extend the extreme learning machine so that it can be used for semi-supervised problems, and further derive the weights of the hidden layer so that they fit the new objective function. Experimental results show that our method performs well when more than 25% of the nodes are labeled.

3. For networks whose entities have attribute information but in which few entities are labeled, a new active collective classification method that combines feature selection and link filtering.
To improve the accuracy of collective classification methods, we extend them so that attribute information and link information are combined to perform classification during the collective inference procedure. The algorithm first uses feature selection to find important features and constructs links according to attribute similarity. It then analyzes the original links in the network and selects the useful ones. Finally, it combines the two kinds of links to classify nodes collectively. Experiments show that our method handles the sparse labeling problem very well.

4. For networks whose entities have only label information, a collective classification framework for sparsely labeled networks.
The framework divides the attributes of a node into two categories: structure attributes and label attributes. The algorithm uses different attributes at different stages and integrates them to perform classification. Based on this framework, we present a new classifier built on the structure attributes of nodes, called the Laplacian classifier, and a new classifier built on the label distribution, called the link pattern classifier. We compare our approach with typical collective classification methods, and the results indicate that our method performs better than the other methods.

5. A collective link prediction framework for sparsely linked networks.
We propose a collective link prediction framework that predicts related links simultaneously, so that it can deal with sparsely linked networks as well as networks whose links depend on one another. Based on this framework, two new link prediction methods are presented: collective resource allocation and collective random walk. We test our methods on several networks, and the results indicate that they obtain higher prediction accuracy, especially in the sparsely linked case.

Network mining now attracts the interest of many researchers. This thesis studies the entity classification and link prediction problems in network mining and presents effective learning algorithms for different data types. The research is of both theoretical and practical significance for classification and link prediction on network data.
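As background for the link prediction methods named in result 5, the sketch below shows the standard, non-collective resource allocation index from the link prediction literature. It is not the thesis's collective resource allocation method, whose details the abstract does not give. The function name `resource_allocation_scores` and the toy edge list are illustrative assumptions. The index scores an unlinked pair (x, y) by summing 1/degree(z) over their common neighbors z.

```python
from itertools import combinations


def resource_allocation_scores(edges):
    """Score every unlinked node pair with the standard resource allocation index.

    Illustrative baseline only; this is NOT the collective variant from the thesis.
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    scores = {}
    for x, y in combinations(sorted(neighbors), 2):
        if y in neighbors[x]:
            continue  # already linked, so there is nothing to predict
        common = neighbors[x] & neighbors[y]
        # Each common neighbor z contributes 1 / degree(z) to the pair's score.
        scores[(x, y)] = sum(1.0 / len(neighbors[z]) for z in common)
    return scores


# Toy graph, made up for illustration.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]

# Rank candidate links from most to least likely under this index.
ranked = sorted(resource_allocation_scores(toy_edges).items(),
                key=lambda item: -item[1])
```

On this toy graph the pair (c, e) ranks first with a score of 0.5, because its only common neighbor d has degree 2. Pairs with no common neighbor score zero, which is one reason purely local indices like this one struggle on sparsely linked networks.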
Keywords/Search Tags: Collective classification, link prediction, extreme learning machine, active learning