
The Research Of Modeling Multi-Networks Based On Unstructured Data

Posted on: 2011-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: G C Wu
GTID: 2178360308961300
Subject: Computer Science and Technology
Abstract/Summary:
With the development of enterprise informatization and the Internet, the amount of unstructured and semi-structured data keeps growing, and in recent years data mining on unstructured and semi-structured data has become a major research focus. Recent advances in complex networks and Chinese information processing offer a new perspective on mining such data by combining the two fields. Chinese information processing techniques are first used to extract information from the unstructured and semi-structured data, and networks are then modeled from the extracted information and analyzed. Analyzing and comparing applications of complex networks in different fields shows that this approach consists of two main parts: network modeling and network analysis. Network modeling analyzes the data and finds the connections among individuals in order to build a network; it is the foundation and the key step. This thesis studies how Chinese information processing can be used to model multiple networks from unstructured data.

First, we study text clustering and use it to divide the dataset into sub-datasets belonging to different domains. Based on an analysis of traditional clustering methods, we propose a text clustering method built on community detection algorithms and apply it to text data, where it achieves good results.

Second, we study Chinese information extraction techniques for extracting entities from unstructured data. Because edges are essential in network modeling, we focus mainly on entity relation extraction. To extract relations faster and more accurately, we modify both steps of the unsupervised relation extraction method: collecting the contexts in which entity pairs co-occur, and clustering those contexts.
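
The abstract does not say which community detection algorithm the text clustering uses. The following is only a minimal sketch of the general idea under stated assumptions: documents are already word-segmented (Chinese segmentation done upstream), each document becomes a TF-IDF vector, two documents are linked when their cosine similarity reaches a threshold, and label propagation stands in for the community detection step, so that each detected community is treated as one text cluster. The function names, the 0.2 threshold, and the choice of label propagation are hypothetical, not the thesis's method.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Map each pre-segmented document (a list of tokens) to a sparse TF-IDF dict."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        # Smoothed IDF so terms that appear in every document keep a non-zero weight.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity_graph(vectors, threshold):
    """Weighted undirected graph: documents i and j are linked if cosine >= threshold."""
    adj = [dict() for _ in vectors]
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            s = cosine(vectors[i], vectors[j])
            if s >= threshold:
                adj[i][j] = s
                adj[j][i] = s
    return adj


def label_propagation(adj, max_iter=100):
    """Deterministic asynchronous label propagation (assumed stand-in for community detection)."""
    labels = list(range(len(adj)))
    for _ in range(max_iter):
        changed = False
        for node, neighbours in enumerate(adj):
            if not neighbours:
                continue  # isolated documents keep their own label
            score = defaultdict(float)
            for nb, w in neighbours.items():
                score[labels[nb]] += w
            best = max(score.values())
            candidates = [lab for lab, s in score.items() if math.isclose(s, best)]
            if labels[node] not in candidates:
                labels[node] = min(candidates)  # break ties by the smallest label
                changed = True
        if not changed:
            break
    return labels


def cluster_documents(docs, threshold=0.2):
    """Return clusters as sorted lists of document indices, one per detected community."""
    labels = label_propagation(similarity_graph(tfidf_vectors(docs), threshold))
    groups = defaultdict(list)
    for idx, lab in enumerate(labels):
        groups[lab].append(idx)
    return sorted(groups.values())


if __name__ == "__main__":
    docs = [
        ["network", "community", "detection", "algorithm"],
        ["community", "detection", "network", "modularity"],
        ["entity", "relation", "extraction", "context"],
        ["relation", "extraction", "entity", "event"],
    ]
    print(cluster_documents(docs))  # → [[0, 1], [2, 3]]
```

The threshold controls how dense the document graph is. If it is set too low, unrelated domains merge into one community; if it is set too high, many documents stay isolated and end up as singleton clusters.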

Meanwhile, to handle sparse datasets, we implement a relation extraction method based on event frames that extracts specific relations defined by the user.

By comparing applications of complex networks in different fields, we find that the common network types include homogeneous, heterogeneous, and dynamic networks. We therefore build networks from different perspectives and dimensions of the unstructured data: a document-document relation network, a document-entity relation network, an entity-entity relation network, and a dynamic network.

Finally, we design and implement a prototype system that integrates the research in this thesis, and we run experiments on it to verify the validity of the approach.
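
The following is a minimal sketch of how the four networks listed above might be assembled once entities have been extracted. The details are assumptions, not taken from the thesis. Each input record is a hypothetical `(doc_id, period, entities)` tuple. Entity-entity edges use within-document co-occurrence counts as a stand-in for the extracted relations that the thesis uses as edges. Document-document edges are weighted by the number of shared entities, which amounts to a projection of the document-entity (heterogeneous) network. The dynamic network is represented as a series of entity-entity snapshots keyed by period.

```python
from collections import defaultdict
from itertools import combinations


def build_networks(records):
    """Build four networks from (doc_id, period, entities) records (hypothetical input format)."""
    doc_entity = defaultdict(set)                    # heterogeneous: documents <-> entities
    entity_entity = defaultdict(int)                 # homogeneous: co-occurrence counts (stand-in for relations)
    dynamic = defaultdict(lambda: defaultdict(int))  # period -> entity-entity snapshot
    for doc_id, period, entities in records:
        ents = sorted(set(entities))                 # canonical edge key (a, b) with a < b
        doc_entity[doc_id].update(ents)
        for a, b in combinations(ents, 2):
            entity_entity[(a, b)] += 1
            dynamic[period][(a, b)] += 1
    # Homogeneous document network: project the bipartite graph, weight = shared entities.
    doc_doc = {}
    for d1, d2 in combinations(sorted(doc_entity), 2):
        shared = len(doc_entity[d1] & doc_entity[d2])
        if shared:
            doc_doc[(d1, d2)] = shared
    return {
        "document-entity": {d: sorted(e) for d, e in doc_entity.items()},
        "document-document": doc_doc,
        "entity-entity": dict(entity_entity),
        "dynamic": {p: dict(g) for p, g in dynamic.items()},
    }


if __name__ == "__main__":
    records = [
        ("d1", 2010, ["A", "B", "C"]),
        ("d2", 2010, ["B", "C"]),
        ("d3", 2011, ["C", "D"]),
    ]
    nets = build_networks(records)
    print(nets["document-document"])  # → {('d1', 'd2'): 2, ('d1', 'd3'): 1, ('d2', 'd3'): 1}
    print(nets["dynamic"][2011])      # → {('C', 'D'): 1}
```

Keying the dynamic network by period keeps each snapshot a plain weighted edge map, so the same analysis code can run on the static entity-entity network and on every time slice. To label edges with relation types, the co-occurrence counter could be replaced by output from a relation extractor.
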
Keywords: complex network, network modeling, entity relation extraction, text clustering