
Research On Entity Clustering And Clusters Fusion Oriented By Multi-Domains Community Detection

Posted on: 2016-04-21   Degree: Master   Type: Thesis
Country: China   Candidate: H B Xu
GTID: 2428330542489389   Subject: Computer systems and structure
Abstract/Summary:
With the innovation and development of the Internet, a large number of social network sites and content-sharing platforms have appeared. Users can take full advantage of these sites and platforms to build their own social relations and share resources with others. The users, their social relations, and their resources constitute social networks, which are formalized as graphs. With the development of Web 2.0 technology, the Internet is moving toward community-based forms, and users hope to obtain more information about communities through participation and interaction. It is therefore necessary to identify dense subgraphs in social networks, a task known as community detection.

Community detection is a research hotspot in data mining, machine learning, and other fields. Through community detection, entities are divided into communities according to their similarities: similar entities are assigned to the same community, while dissimilar ones are assigned to different communities. Community detection techniques can be used in crime detection, protein function prediction, Web community discovery, document clustering, and so on.

However, traditional community detection methods often focus on a single domain. They consider only a limited set of influencing factors and lack collaborative promotion among different domains, which often leads to poor results. In many practical applications, the graph information representing the community structure comes from different domains or views, so community detection must take more factors into account. In addition, existing cross-view and cross-domain community detection methods require every domain to satisfy certain restrictions, ignore the credibility of an entity's membership in a cluster, and cannot fuse multiple clustering results. This thesis therefore presents a model and algorithms for entity clustering and clusters fusion oriented by multi-domain community detection, and proposes several strategies to improve the algorithms from different aspects. Our major work and contributions are as follows:

(1) We briefly summarize related work on community detection and analyze the advantages and disadvantages of existing methods.
(2) A two-phase cross-domain community detection model called 2-CDM is proposed. Unlike traditional community detection models, 2-CDM uses the interaction between domains to improve the accuracy of community detection, taking the influence of other domains into account while detecting communities in one domain. The model divides the community detection process into two stages: the first stage obtains the community partition in each domain, and the second stage fuses these partitions to generate the final result.
(3) An Iterative Collaborative Clustering algorithm (ICC) is presented, which achieves mutual promotion among domains by using the other domains' clustering results to adjust the similarity matrix of the current domain. Furthermore, we propose strategies to optimize the algorithm in three aspects (the construction of the basic similarity matrix, the correction of similarity, and the setting of the iteration's termination condition) and present an Improved Iterative Collaborative Clustering algorithm (I-ICC).
(4) A Cross-Domain Clusters Fusion algorithm called CDCF is proposed, which fuses the community detection results from different domains. Compared with traditional clusters fusion algorithms, CDCF can be applied when the domains contain different numbers of entities and clusters.
(5) Experimental results verify the feasibility and effectiveness of the key techniques proposed in this thesis. Compared with single-domain community detection methods, ICC takes full advantage of the information from all domains, which makes community detection more accurate. Compared with ICC, the proposed optimization strategies generate more accurate results. In addition, compared with traditional clusters fusion methods, CDCF significantly improves the accuracy of community detection.
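The abstract lists the construction of the basic similarity matrix as one of the I-ICC optimization aspects but does not say how it is built. As a minimal sketch, assuming each domain is an undirected graph given as adjacency sets over entities 0..n-1, one common choice is the Jaccard index of closed neighborhoods. The function name `jaccard_similarity_matrix` is hypothetical, and this is not necessarily the measure used in the thesis.

```python
from typing import Dict, List, Set


def jaccard_similarity_matrix(adjacency: Dict[int, Set[int]], n: int) -> List[List[float]]:
    """Similarity of entities 0..n-1 as the Jaccard index of their closed
    neighborhoods (each node's neighbors plus the node itself)."""
    closed = [adjacency.get(i, set()) | {i} for i in range(n)]
    # The union always contains i, so the denominator is never zero.
    return [
        [len(closed[i] & closed[j]) / len(closed[i] | closed[j]) for j in range(n)]
        for i in range(n)
    ]


# Triangle 0-1-2 with a pendant node 3 attached to node 2 (undirected).
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
S = jaccard_similarity_matrix(graph, 4)
print(S[0][1], S[0][3], S[2][3])  # 1.0 0.25 0.5
```

Using closed rather than open neighborhoods keeps two directly connected nodes with no common neighbor from scoring zero.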
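The core ICC idea, letting the other domains' clustering results adjust the current domain's similarity matrix and repeating until the partitions settle, can be sketched as follows. Every concrete choice here is an assumption for illustration, not the thesis's method: all domains share the same entity set; the correction is a linear blend `(1 - lam) * S_d + lam * C`, where `C` is the average co-membership matrix of the other domains; per-domain clustering is plain threshold-based connected components; and iteration stops when no domain's partition changes or after `max_iter` rounds. The names `icc`, `threshold_clusters`, `co_membership`, `lam`, and `threshold` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def threshold_clusters(sim: Matrix, threshold: float) -> List[int]:
    """Label entities by connected components of the graph that links every
    pair with similarity >= threshold. Labels follow first appearance, so the
    same partition always yields the same label list."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def co_membership(labels: List[int]) -> Matrix:
    """1.0 where two entities share a cluster in one domain, else 0.0."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def icc(domains: List[Matrix], lam: float = 0.3, threshold: float = 0.5,
        max_iter: int = 20) -> List[List[int]]:
    """Blend each domain's similarity matrix with the average co-membership of
    the other domains' previous-round partitions, recluster, and repeat."""
    if len(domains) < 2:
        raise ValueError("collaborative clustering needs at least two domains")
    labels = [threshold_clusters(sim, threshold) for sim in domains]
    for _ in range(max_iter):
        new_labels = []
        for d, sim in enumerate(domains):
            others = [co_membership(labels[e]) for e in range(len(domains)) if e != d]
            n = len(sim)
            corrected = [
                [(1 - lam) * sim[i][j] + lam * sum(o[i][j] for o in others) / len(others)
                 for j in range(n)]
                for i in range(n)
            ]
            new_labels.append(threshold_clusters(corrected, threshold))
        if new_labels == labels:  # termination: no domain changed its partition
            break
        labels = new_labels
    return labels


A = [[1.0, 0.9, 0.1, 0.1], [0.9, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.9], [0.1, 0.1, 0.9, 1.0]]
B = [[1.0, 0.4, 0.1, 0.1], [0.4, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.9], [0.1, 0.1, 0.9, 1.0]]
print(threshold_clusters(B, 0.5))  # [0, 1, 2, 2]
print(icc([A, B]))                 # [[0, 0, 1, 1], [0, 0, 1, 1]]
```

On its own, domain B separates entities 0 and 1 because their similarity of 0.4 falls below the threshold. After one round of correction with domain A's partition, the blended similarity rises to about 0.58, and both domains agree on {0, 1} and {2, 3}. The update is synchronous: every domain is corrected with the other domains' labels from the previous round, so the result does not depend on the order in which domains are visited.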
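According to the abstract, CDCF fuses per-domain results even when the domains cover different entities and different numbers of clusters, and the keywords mention label merging and voting. A minimal sketch in that spirit, under assumptions that are not from the thesis: the domain covering the most entities serves as the reference; each other domain's cluster labels are greedily merged into the reference cluster they overlap most on shared entities; and each entity then receives the majority label across the domains that contain it. The names `align_labels` and `fuse_clusters` are hypothetical, and the sketch does not model the membership credibility the thesis mentions.

```python
from collections import Counter, defaultdict
from collections.abc import Hashable
from typing import Dict, List

Clustering = Dict[Hashable, Hashable]  # entity -> cluster label within one domain


def align_labels(reference: Clustering, other: Clustering, domain: int) -> Dict[Hashable, Hashable]:
    """Label merging: map each cluster of `other` to the reference cluster it
    shares the most entities with; clusters sharing no entity get a fresh label."""
    overlap: Dict[Hashable, Counter] = defaultdict(Counter)
    for entity, label in other.items():
        if entity in reference:
            overlap[label][reference[entity]] += 1
    mapping = {}
    for label in set(other.values()):
        if overlap[label]:
            mapping[label] = overlap[label].most_common(1)[0][0]
        else:
            mapping[label] = ("unmatched", domain, label)
    return mapping


def fuse_clusters(domains: List[Clustering]) -> Clustering:
    """Voting: each domain that contains an entity casts one vote for its
    aligned cluster; ties are broken in favor of the reference domain."""
    if not domains:
        return {}
    ref = max(range(len(domains)), key=lambda d: len(domains[d]))
    reference = domains[ref]
    votes: Dict[Hashable, Counter] = defaultdict(Counter)
    for d, clustering in enumerate(domains):
        mapping = align_labels(reference, clustering, d)
        for entity, label in clustering.items():
            votes[entity][mapping[label]] += 1
    fused: Clustering = {}
    for entity, counter in votes.items():
        best = max(counter.values())
        winners = [lab for lab, c in counter.items() if c == best]
        if entity in reference and reference[entity] in winners:
            fused[entity] = reference[entity]
        else:
            fused[entity] = winners[0]
    return fused


# Three domains with different entity sets, cluster counts, and label names.
d0 = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 2}
d1 = {"a": "x", "b": "x", "c": "x", "d": "y"}
d2 = {"a": 7, "b": 7, "c": 8, "f": 9}
fused = fuse_clusters([d0, d1, d2])
print(fused["c"], fused["f"])  # 2 ('unmatched', 2, 9)
```

Entity c ends up in cluster 2 because two of the three domains that contain it vote for that cluster, outvoting the second domain, whose merged label places it in cluster 1. Entity f appears only in the third domain and shares no cluster with the reference, so it keeps an unmatched label rather than being forced into an existing community.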
Keywords: cross domain, community detection, iteration, collaborative clustering, label merging, voting, clusters fusion