
The Study Of Text Clustering By Heterogeneous Networks With Multiple Attributes

Posted on: 2017-02-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J P Cao
Full Text: PDF
GTID: 1368330569498498
Subject: Management Science and Engineering
Abstract/Summary:
With the development of social networks and news media, we are flooded with large amounts of network and web data, and such data boosts research on network analysis and web-data mining. The large amount of real-world data also encourages government departments, non-government organizations and business companies to build intelligence systems for perceiving their public roles, understanding intelligence and marketing products. As one of the key techniques of intelligence systems, cluster analysis has been studied extensively in the fields of data mining, knowledge discovery and intelligence systems.

Facing text and network data with various topics, different types and heterogeneous structures, how should topic cluster analysis and knowledge discovery be conducted? How can text clustering be improved with the semantic knowledge in a knowledge database? How can common hot topics be discovered across different platforms with different types of data? Moreover, how can the requirements of users be met to realize targeted clustering?

This thesis focuses on the clustering of heterogeneous information networks with multiple attributes, especially on the key problems and challenges in this field: the construction of a cluster analysis framework in intelligence systems, the representation model of text data, mutual clustering methods, and clustering on large attributed networks with multiple annotations. The contents and innovative points of this thesis can be summarized as follows:

1. For the problem of how to construct heterogeneous networks with multiple attributes, we analyze the topology of heterogeneous networks and the sources of node attributes, and propose an analysis framework on heterogeneous networks with multiple attributes. For the sub-problem of attribute extraction, we analyze the sources of attributes and the related methods. Taking sentiment attribute analysis as an example, we propose an algorithm to compute sentiment polarity in a specific area and give the whole process of analyzing the sentiment attributes of nodes. Experiments on real-world datasets show that our method is effective and efficient.

2. For the problem of how to represent text with heterogeneous networks for cluster analysis and knowledge discovery, we propose a framework for representing texts with heterogeneous information networks, that is, using structured information to construct a heterogeneous information network for clustering. Specifically, we propose heterogeneous information networks for news and tweet texts that use words, entities and topics as multiple object types. Next, we develop a heterogeneous information network co-clustering model for texts; the model uses attribute types as constraints for clustering. Experiments on four different real-world datasets show that our model is effective and efficient in clustering heterogeneous information networks.

3. For the problem of how to cluster texts from multiple sources simultaneously, we propose a heterogeneous information networks-based text clustering (HINT) framework, which transfers information from the different sources to construct the clustering. Specifically, we first use heterogeneous information networks to represent tweets and news articles respectively, and introduce "anchor texts" to effectively connect the two types of texts. Next, we construct a similarity matrix for each of the two types of texts and develop a transition matrix that transfers information between the two matrices to direct the two clusterings toward a consensus result. Finally, we propose a mutual clustering algorithm to effectively refine the clustering results. Extensive experimental results on three real-world datasets verify the effectiveness and robustness of the proposed HINT framework in addressing this problem.

4. For the problem of clustering large attributed graphs with multiple annotations, we propose a framework that allows multiple users to give their annotations to guide the clustering. The problem is novel and poses two major challenges. Firstly, as user-selected samples are usually sparse and the graph can be large, it is non-trivial to effectively combine the annotations given by different annotators. Secondly, it is difficult to develop a scalable and stable approach because the datasets are large and complex. To address these challenges, we propose an approach called Clustering Graphs with Multiple Annotations (CGMA). The approach combines the annotators' consensus opinions in an unbiased way by inferring the annotators' preferences and combining them. In addition, we propose a parallel local partitioning method to guarantee the scalability of the approach. We show the effectiveness and efficiency of CGMA on both synthetic and real-world graphs by comparing it with existing graph clustering approaches.

In summary, we focus on the cluster analysis of heterogeneous networks with multiple attributes in intelligence systems, with an emphasis on improving clustering with constraints. We propose to use constraints from databases, from other clusterings and from user guidance. The studied problems and our methods are all innovative to some extent, and they can improve the performance of intelligence systems and the research level of intelligence analysis. Because public opinion is part of a nation's overall situation, our study will be of value to big-data intelligence systems that matter to national security and social development.
Keywords/Search Tags: Multiple Attributes, Heterogeneous networks, Cluster Analysis, Intelligence Systems, Society Management
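
Illustrative sketch: the abstract states that HINT builds a similarity matrix for each text type and uses a transition matrix to transfer information between them, but it does not give the formulas. The Python below is only one plausible reading under stated assumptions, not the thesis's algorithm. The blending rule, the weight `alpha`, the function names and the toy data are all hypothetical; anchor-text links are assumed to be given as a 0/1 matrix.

```python
# Illustrative sketch only: the blending rule and all names here are
# assumptions, not the HINT algorithm from the thesis.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def row_normalize(m):
    """Scale each row to sum to 1; all-zero rows are left unchanged."""
    out = []
    for row in m:
        total = sum(row)
        out.append([v / total for v in row] if total else list(row))
    return out

def transfer_similarity(sim_target, sim_source, links, alpha=0.5):
    """Blend sim_target with similarity projected from the other text type.

    links[i][j] is 1 if target text i and source text j share an anchor text.
    The row-normalized links matrix plays the role of a transition matrix.
    """
    t = row_normalize(links)
    projected = matmul(matmul(t, sim_source), transpose(t))
    return [
        [(1 - alpha) * s + alpha * p for s, p in zip(s_row, p_row)]
        for s_row, p_row in zip(sim_target, projected)
    ]

# Toy data: 2 news articles and 3 tweets (hypothetical values).
news_sim = [[1.0, 0.0], [0.0, 1.0]]
tweet_sim = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]
links = [[1, 0, 0], [0, 1, 0]]  # news 0 <-> tweet 0, news 1 <-> tweet 1

blended = transfer_similarity(news_sim, tweet_sim, links, alpha=0.5)
```

In this toy case the two news articles start with zero similarity, but they are linked to two tweets that are similar to each other, so the blended news similarity becomes positive. A clustering step run on `blended` would then be nudged toward grouping the two articles together, which is the general idea of steering two clusterings toward a consensus.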