
Clustering On Bi-typed Information Networks

Posted on: 2017-05-03, Degree: Master, Type: Thesis
Country: China, Candidate: B B Wang, Full Text: PDF
GTID: 2308330482995687, Subject: Software engineering
Abstract/Summary:
Early data mining research on information networks was almost entirely oriented towards single-typed networks (also called homogeneous information networks). In practice, however, we usually face complex networks (that is, heterogeneous information networks) that contain many types of objects and many types of relations among them. For the same event, a heterogeneous information network draws on wider information sources and carries more information than a homogeneous one, because its many object types provide additional points of reference, so mining heterogeneous information networks helps to extract useful information accurately. For this reason, researchers turned to heterogeneous information network mining once they had accumulated enough knowledge.
Moving from single-typed objects to multi-typed objects is a long road, and many questions need to be considered: how many types of objects affect the target objects, whether each effect is positive or negative, and, if one type of object affects the target objects, whether any other types affect them as well.
Researchers believe they can start by studying networks with bi-typed objects and then extend the results to networks with multi-typed objects using special models and topologies.
It can be said that the bi-typed network is the transition from the homogeneous information network to the heterogeneous information network, and a bridge between them.
In this paper, we choose the bi-typed bibliographic network as the basis for the algorithm. The number of papers that an author published in a conference serves as the information between the two connected objects, that is, the weight of the link.
Ranking and clustering interact: the closer the rankings of objects are, the greater the chance that they belong to one cluster, and the rankings of objects within one cluster are closer to each other. In this paper, we use a ranking algorithm to initialize the target type objects, which places most similar objects in one cluster and dissimilar ones in different clusters. This rough division facilitates the subsequent steps. Through the ranking algorithm, the abstract target type objects are replaced by concrete numbers, called rank scores. The target type objects are then divided into K clusters by dividing the rank scores.
If an object can be represented by a vector, the cosine similarity measure can be used to compute the similarity of two objects: the smaller the angle between the two vectors, the greater the cosine value and the more similar the two objects are. By this principle, the similarity between two documents can be calculated from their word-frequency vectors. In a bi-typed bibliographic information network, the target type objects can likewise be represented by vectors: the number of papers each author published in the target conference gives one coordinate of the vector, and an author who published no paper in the conference is recorded as 0.
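The vector representation described above can be illustrated with a minimal sketch. The conference and author names and the paper counts below are hypothetical toy data, not taken from the thesis, and returning 0.0 for an all-zero vector is an assumed convention.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention: an empty vector is similar to nothing
    return dot / (norm_u * norm_v)

# Hypothetical link weights: papers each author published in each conference.
authors = ["author_a", "author_b", "author_c"]
paper_counts = {
    "conf_x": {"author_a": 3, "author_b": 1},
    "conf_y": {"author_a": 6, "author_b": 2},
    "conf_z": {"author_c": 4},
}

def conference_vector(conf):
    """One coordinate per author; 0 when the author has no paper there."""
    return [paper_counts[conf].get(author, 0) for author in authors]

sim_xy = cosine_similarity(conference_vector("conf_x"), conference_vector("conf_y"))
sim_xz = cosine_similarity(conference_vector("conf_x"), conference_vector("conf_z"))
```

Here conf_x and conf_y share the same authors in the same proportions, so their vectors point in the same direction and the similarity is close to 1, while conf_x and conf_z share no author and the similarity is 0.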
Therefore, we use the cosine similarity function to measure the similarity of two objects, and adjust the distribution of objects across the clusters according to the similarity between each target object and the other objects.
In this paper, we use the DBLP dataset to generate the experimental data. The experiments show that the algorithm is feasible and effective and can be applied to the clustering of bi-typed information networks.
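The overall procedure (rank-based initialization followed by similarity-based adjustment) might be sketched as follows. The abstract does not specify the ranking function, how rank scores are divided, or the exact adjustment rule, so every one of these is an assumption made for illustration: the rank score is the total link weight, the ranked list is cut into K contiguous groups, and an object moves to the cluster whose other members have the highest average cosine similarity to it. All names and data are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_scores(vectors):
    # Assumption: total link weight stands in for the thesis's ranking function.
    return {name: sum(vec) for name, vec in vectors.items()}

def initial_clusters(vectors, k):
    # Assumption: sort by rank score and cut into k contiguous groups.
    scores = rank_scores(vectors)
    ordered = sorted(vectors, key=lambda name: scores[name], reverse=True)
    return {name: i * k // len(ordered) for i, name in enumerate(ordered)}

def refine(vectors, assignment, k, max_rounds=20):
    # Assumption: move each object to the cluster whose other members are,
    # on average, most cosine-similar to it; stop when nothing moves.
    assignment = dict(assignment)
    for _ in range(max_rounds):
        moved = False
        for name, vec in vectors.items():
            current = assignment[name]
            avg_sim = {}
            for c in range(k):
                others = [vectors[o] for o, oc in assignment.items()
                          if oc == c and o != name]
                if others:
                    avg_sim[c] = sum(cosine(vec, o) for o in others) / len(others)
            if current not in avg_sim:
                continue  # sole member of its cluster: keep it so no cluster empties
            best = max(avg_sim, key=avg_sim.get)
            if avg_sim[best] > avg_sim[current] + 1e-12:
                assignment[name] = best
                moved = True
        if not moved:
            break
    return assignment

# Hypothetical conferences as vectors of per-author paper counts.
vectors = {
    "db1": [5, 4, 0, 0],
    "db2": [1, 1, 0, 0],
    "ml1": [0, 0, 4, 3],
    "ml2": [0, 0, 1, 2],
}
initial = initial_clusters(vectors, k=2)
final = refine(vectors, initial, k=2)
```

In this toy case the rank-based split groups db1 with ml1 because both have high scores, and the similarity-based adjustment then regroups the conferences by shared authors.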
Keywords/Search Tags: bi-typed information network, clustering, ranking, cosine similarity measure