
Research On Multi-relational Clustering Analysis Approaches

Posted on: 2009-10-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Gao
Full Text: PDF
GTID: 1118360272976546
Subject: Computer application technology
Abstract/Summary:
With the development of information technologies, data is generated continuously, and large volumes of data accumulate in every field. Data mining technology can extract interesting, hidden and useful information from this data. Traditional data mining tasks, such as association rule discovery, market basket analysis and clustering analysis, commonly attempt to find patterns in a dataset characterized by a collection of independent objects of a single type. However, many datasets are multi-type and relational: their objects are of several types and are related to one another. Multi-relational data mining (MRDM) is the process of discovering meaningful new correlations, patterns and trends in multi-type relational data by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.
MRDM is one of the hotspots in the field of artificial intelligence. It has received a lot of attention, and researchers have developed a rich variety of approaches, such as approaches based on converting multiple relations into a single relation, Bayesian networks, stochastic grammars, Markov networks, and (hidden) Markov models. Several research tasks of MRDM have been established, including collective classification, link prediction, link-based clustering, and object identification. MRDM can be applied in a wide range of areas such as bioinformatics, bibliographic citations, social networks, financial analysis, and the Internet.
Based on an analysis of related research and existing MRDM methods, we put forward several clustering algorithms for multi-relational data and apply them in the network education resource management system (NERMS).
The main contributions and results of this thesis are as follows:
(1) Provide an introduction and overview of MRDM. This thesis introduces and summarizes the research field and the different tasks of MRDM, and provides an introductory survey of MRDM approaches. The current problems in MRDM are discussed and future research directions are pointed out.
(2) Analyze the related theories and methods. The thesis analyzes the related theories and methods, which include clustering analysis, relational and multi-relational clustering analysis, semi-supervised learning, and personalized recommendation. For each of these, we introduce and summarize the state of the art.
(3) Study feature weighted clustering, and present a model of a feature weighted clustering algorithm. Building on existing methods for clustering analysis and for learning feature weights, the model takes into account the different contributions of individual features and applies supervised feature ranking methods to unsupervised classification. The model first executes a clustering algorithm; it then treats the clustering results as class labels and learns feature weights with a supervised feature ranking method; it then executes the clustering algorithm again with the new feature weights. This procedure iterates until convergence or until a maximum number of iterations is reached. Distance-based and density-based clustering algorithms in Euclidean space can be used in this model.
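The iteration described in (3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: plain k-means stands in for the clustering algorithm, a simplified Relief score (one nearest hit and one nearest miss per point) stands in for the supervised feature ranking method, and negative scores are clamped to zero before the weights are normalised. All function names are hypothetical.

```python
import math

def weighted_dist(a, b, w):
    # Euclidean distance with one non-negative weight per feature.
    return math.sqrt(sum(wi * (x - y) ** 2 for x, y, wi in zip(a, b, w)))

def kmeans(points, k, w, iters=50):
    # Deterministic farthest-first initialisation (an assumption of this sketch).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(weighted_dist(p, c, w) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: weighted_dist(p, centers[j], w)) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def relief_weights(points, labels):
    # Simplified Relief: reward features that differ on the nearest miss
    # and agree on the nearest hit, using cluster labels as class labels.
    d = len(points[0])
    spans = [max(col) - min(col) or 1.0 for col in zip(*points)]
    scale = [1.0 / s ** 2 for s in spans]  # neighbours are found on range-normalised features
    scores = [0.0] * d
    for i, p in enumerate(points):
        hits = [q for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        misses = [q for j, q in enumerate(points) if labels[j] != labels[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda q: weighted_dist(p, q, scale))
        miss = min(misses, key=lambda q: weighted_dist(p, q, scale))
        for f in range(d):
            scores[f] += (abs(p[f] - miss[f]) - abs(p[f] - hit[f])) / spans[f]
    scores = [max(s, 0.0) for s in scores]  # assumption: negative scores become zero
    total = sum(scores)
    if total == 0:
        return [1.0] * d
    return [s * d / total for s in scores]  # weights sum to the number of features

def feature_weighted_clustering(points, k, max_rounds=10, tol=1e-6):
    w = [1.0] * len(points[0])
    labels = kmeans(points, k, w)
    for _ in range(max_rounds):
        new_w = relief_weights(points, labels)
        new_labels = kmeans(points, k, new_w)
        converged = new_labels == labels and max(abs(a - b) for a, b in zip(w, new_w)) < tol
        w, labels = new_w, new_labels
        if converged:
            break
    return labels, w

# Toy data: the first feature separates two groups, the second does not.
points = [(0.0, 0.0), (0.2, 1.0), (0.1, 0.5), (5.0, 1.0), (5.2, 0.0), (5.1, 0.5)]
labels, weights = feature_weighted_clustering(points, k=2)
print(labels, weights)
```

Any clustering routine that accepts per-feature weights could replace `kmeans` here, and any supervised ranking method could replace `relief_weights`; the outer loop is the only part taken from the description above.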
Based on this model, experiments are carried out on several UCI machine learning databases using fuzzy C-means clustering (FCM) and density-based spatial clustering of applications with noise (DBSCAN) as the clustering algorithms, and information gain and ReliefF as the feature ranking algorithms. The results validate the effectiveness of the model.
(4) Study clustering for multi-relational data, and present a two-stage clustering algorithm for multi-type relational data. To address the efficiency and scalability problems of clustering methods for multi-type relational datasets, a two-stage clustering algorithm for multi-type relational data (TSMRC) is proposed. TSMRC first analyzes all the relationships in the data, both explicit and implicit, and classifies them into intra-relationships and inter-relationships. In the first stage, it clusters each type of object individually, considering both attributes and intra-relationships; any relational clustering method can be used here. In the second stage, it regards each cluster from the first stage as a new object and merges interrelated clusters of different types according to the inter-relationships, so as to cluster multi-type objects. New similarity measures are proposed for this process. Experimental results on the Movie dataset demonstrate the accuracy and efficiency of the algorithm.
(5) Study semi-supervised clustering for multi-relational data, and present a semi-supervised k-means clustering algorithm for multi-type relational data. The proposed algorithm extends traditional k-means clustering with new methods for selecting the initial cluster centers and for measuring similarity, so that it can cluster multi-type relational data in a semi-supervised manner.
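As a rough illustration of how labeled data can guide the choice of initial cluster centers in k-means, the sketch below uses the well-known seeded (constrained) k-means idea: each initial center is the mean of the labeled objects of one class, and labeled objects keep their class throughout. This is an assumption made for illustration only. The abstract does not spell out the thesis's own initialization or similarity measure, and the sketch uses attribute distance alone and ignores relationship information. The function name and data layout are hypothetical.

```python
import math

def seeded_kmeans(points, seeds, iters=100):
    """points: list of attribute tuples.
    seeds: dict mapping a point index to its known class label (the labeled subset)."""
    classes = sorted(set(seeds.values()))
    # Initial centers: the mean of the labeled points of each class.
    centers = []
    for c in classes:
        members = [points[i] for i, lab in seeds.items() if lab == c]
        centers.append([sum(col) / len(members) for col in zip(*members)])
    assign = None
    for _ in range(iters):
        new_assign = []
        for i, p in enumerate(points):
            if i in seeds:
                # Labeled points are never reassigned.
                new_assign.append(classes.index(seeds[i]))
            else:
                # Unlabeled points go to the nearest center (attribute distance only).
                new_assign.append(min(range(len(centers)),
                                      key=lambda j: math.dist(p, centers[j])))
        if new_assign == assign:
            break
        assign = new_assign
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == j]
            centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return [classes[a] for a in assign], centers

# Toy usage: two labeled objects, four unlabeled ones.
points = [(0, 0), (0, 1), (1, 0), (8, 8), (9, 8), (8, 9)]
labels, centers = seeded_kmeans(points, seeds={0: "a", 3: "b"})
print(labels)
```

Because every class keeps at least its own seed points, no cluster can become empty during the center update.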
To achieve high performance, the algorithm uses labeled data and relationship information in addition to attribute information. The experimental results on the Movie database show the effectiveness of this method.
(6) Study collaborative filtering recommendation based on clustering, and present a collaborative filtering recommendation algorithm combining Probabilistic Relational Models and user grade. To address the sparsity and scalability problems of collaborative filtering, a collaborative filtering recommendation algorithm combining Probabilistic Relational Models and user grade (PRM-UG-CF) is presented. PRM-UG-CF has two main parts. First, a user grade function is defined and a user grade based collaborative filtering method is used, which searches for the target user's neighbors only among users of nearby grades. The number of candidate neighbors can be controlled by a parameter, which increases recommendation efficiency and addresses the scalability problem. Second, in order to use various kinds of information for recommendation, the user grade based collaborative filtering method is combined with Probabilistic Relational Models (PRM). The combined method integrates user information, item information and user-item rating data, and uses adaptive strategies for users of different grades, which improves recommendation quality and addresses the sparsity problem. The experimental results on the MovieLens data set show that PRM-UG-CF has higher recommendation quality than a pure PRM-based or a pure collaborative filtering approach, and much higher recommendation efficiency than a pure collaborative filtering approach.
(7) Apply the clustering methods to NERMS. NERMS serves teaching and scientific research in computer science at colleges. It covers collecting, organizing, pushing and managing network educational information resources, and enables these resources to be shared. Many data mining methods are used in the system.
This thesis introduces the system and describes how the above methods are applied in it.
The results of this thesis contribute to the study of MRDM in both its theoretical and technological aspects.
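The grade-restricted neighbor search in contribution (6) can be pictured with the following sketch. It is not PRM-UG-CF: the abstract does not define the user grade function, so grades are simply passed in as numbers, the PRM component is omitted, and cosine similarity with a similarity-weighted average is assumed as the underlying collaborative filtering step. The names `predict`, `width` and `k` are hypothetical; `width` plays the role of the parameter that limits the number of candidate neighbors.

```python
import math

def cosine_sim(ra, rb):
    # ra, rb: dicts mapping item -> rating for two users.
    common = set(ra) & set(rb)
    if not common:
        return 0.0
    dot = sum(ra[i] * rb[i] for i in common)
    na = math.sqrt(sum(v * v for v in ra.values()))
    nb = math.sqrt(sum(v * v for v in rb.values()))
    return dot / (na * nb)

def predict(ratings, grades, user, item, width=1.0, k=2):
    """Predict the rating of `user` for `item`, using only neighbors whose
    grade lies within `width` of the target user's grade."""
    candidates = [u for u in ratings
                  if u != user
                  and abs(grades[u] - grades[user]) <= width
                  and item in ratings[u]]
    # Keep the k most similar candidates.
    scored = sorted(((cosine_sim(ratings[user], ratings[u]), u) for u in candidates),
                    reverse=True)[:k]
    den = sum(s for s, _ in scored)
    if den <= 0:
        return None  # no usable neighbor inside the grade window
    return sum(s * ratings[u][item] for s, u in scored) / den

# Toy data; the grades are given directly and are purely illustrative.
ratings = {
    "ann": {"m1": 5, "m2": 4},
    "bob": {"m1": 5, "m2": 4, "m3": 2},
    "cat": {"m1": 1, "m3": 5},
}
grades = {"ann": 1.0, "bob": 1.5, "cat": 4.0}
print(predict(ratings, grades, "ann", "m3", width=1.0))
print(predict(ratings, grades, "ann", "m3", width=5.0))
```

A narrow window means fewer similarity computations per prediction, which is the efficiency argument made above; a window that is too narrow can leave no neighbors at all, in which case the sketch returns `None`.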
Keywords/Search Tags: Multi-relational data mining, clustering analysis for multi-relational data, feature weighted, semi-supervised learning, personalized recommendation, network educational resources management