Study on Finding Dimensions for Text Based on Weighted Heterogeneous Information Network

Posted on: 2015-03-15; Degree: Master; Type: Thesis
Country: China; Candidate: F Jiang; Full Text: PDF
GTID: 2268330431955494; Subject: Computer software and theory
Abstract/Summary:
Multidimensional text databases exist in areas such as commerce and science. They contain not only structured multidimensional data but also descriptive text data; examples include the social network databases of microblogs, the product review databases used in business intelligence, and the bibliographic databases used in scientific research. In this thesis, we propose a method for finding the hierarchical topic dimensions of text in a given heterogeneous information network. Here, an information network is a graph structure in which nodes represent data entities and edges represent the relationships among them. J. Han and his colleagues have proposed methods for mining information networks. Such methods analyze not only individual nodes but also the relationships among nodes, which carry rich latent semantics. Information networks are highly expressive and can model data with complex structure. As a result, they suit many kinds of objects with complicated structure, such as relational, reference, recommendation, product, social, and text networks. A dimension is the perspective from which data objects are analyzed, while a measure is the data value reported in the analytical table. To find a dimension, we need to determine the granularity, the layer, and the measure of the fact table. In this thesis, we map the granularity, the dimension, the layer, and the measure of fact tables to similar-topic granularity, the text topic dimension, the iteratively computed network, and topic subject phrases, respectively. By clustering the result set of an OLAP operation, we can recover the text dimension in reverse, and statistical analysis of OLAP results requires layered clustering of the text data set. Building on classical methods for text clustering and keyword extraction, we propose a method for finding layered clustering dimensions on heterogeneous information networks.
Our experiments show that the method is both efficient and effective. The main contributions of this thesis are as follows:
1. We introduce a method for clustering text data. When a set of objects is hard to cluster at a finer granularity, we find new objects that are closely associated with the original ones and analyze and cluster those instead. Because the new objects share attributes with the original ones, clustering the new objects can be viewed as a further clustering of the originals, and the clustering task is completed using this newly created evidence.
2. We propose a method for building a weighted heterogeneous information network. We represent documents with the vector space model, compute the content similarity between documents from the position and frequency of their feature phrases, and build edges among documents accordingly. We then adjust the similarity between documents according to the co-author relationships among their authors. Iteratively, the similarity between documents is used in turn to adjust the similarity between authors, and weights are calculated among nodes of the same type. A weighted heterogeneous information network is then built from the relationships among nodes.
3. We propose a layered clustering method that partitions communities according to granularity in the weighted heterogeneous information network. By searching for communities whose similarity exceeds a given threshold, we partition the document network into several parts; this partitioning is the clustering process. Finally, the layers and dimensions are generated according to the granularity.
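As a rough illustration of contributions 2 and 3, the following Python sketch builds weighted edges among documents and partitions them at several thresholds to obtain layers. It is not the thesis's implementation. The function names (`cosine`, `build_weighted_edges`, `partition`), the blend weight `alpha`, the Jaccard author overlap, and the sample data are all assumptions made for illustration. The sketch also simplifies: it uses term frequency only (not phrase position), applies a single blending pass instead of the iterative document/author refinement, and uses connected components as a simple stand-in for threshold-based community search.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_weighted_edges(docs, authors, alpha=0.8):
    """Return {(i, j): weight} for every pair of documents.

    weight = alpha * content similarity + (1 - alpha) * author overlap.
    The linear blend and the value of alpha are illustrative assumptions.
    """
    vectors = [Counter(text.lower().split()) for text in docs]
    edges = {}
    for i, j in combinations(range(len(docs)), 2):
        content = cosine(vectors[i], vectors[j])
        union = authors[i] | authors[j]
        shared = len(authors[i] & authors[j]) / len(union) if union else 0.0
        edges[(i, j)] = alpha * content + (1 - alpha) * shared
    return edges


def partition(n, edges, threshold):
    """Communities = connected components over edges with weight >= threshold."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), weight in edges.items():
        if weight >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


# Hypothetical sample data: four short documents and their author sets.
docs = [
    "graph mining network clustering",
    "network clustering graph community",
    "protein folding structure prediction",
    "protein structure folding energy",
]
authors = [{"A", "B"}, {"B", "C"}, {"D"}, {"D", "E"}]

edges = build_weighted_edges(docs, authors)

# A higher threshold gives a finer granularity, i.e. a deeper layer.
layers = {t: partition(len(docs), edges, t) for t in (0.5, 0.68)}
for threshold, communities in layers.items():
    print(threshold, communities)
```

Raising the threshold splits coarse communities into finer ones, which is one simple way to read "layer" and "granularity" in the description above; the thesis may define them differently.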
Keywords/Search Tags: heterogeneous information network, multidimensional text data, information network analysis method, text dimension