
Research On Unsupervised Chinese Entity Relation Extraction Method

Posted on: 2013-02-20  Degree: Master  Type: Thesis
Country: China  Candidate: J Wang
GTID: 2218330374967524  Subject: Computer application technology
Abstract/Summary:
With the development of information technology, and of the Internet in particular, a large amount of information now appears in electronic form. Automated tools that can quickly extract useful information from huge volumes of data are urgently needed to cope with the information explosion, and information extraction emerged to meet this need. Relation extraction, one of the subtasks and key technologies of information extraction, has attracted growing attention as a result.

Current research concentrates mainly on supervised and weakly supervised machine learning methods. However, supervised and weakly supervised relation extraction cannot automatically identify relations that are not predefined, so researchers have begun to study unsupervised relation extraction. This research is still at a preliminary stage and has several shortcomings: the clustering results for entity pairs are not reasonable enough, the accuracy of the extracted relations is low, and little work addresses Chinese relation extraction.

This thesis improves the unsupervised Chinese entity relation extraction method in two ways. First, it proposes a new feature acquisition algorithm based on heuristic rules. Drawing on the characteristics of Chinese grammar, the algorithm defines five heuristic rules for obtaining the relation features between entities, which yield more effective context features for entity pairs. Second, taking into account the characteristics of the clustering algorithm and the data set, it puts forward a novel clustering algorithm that introduces co-clustering theory on top of k-means clustering. The algorithm keeps the time and space advantages of k-means while making full use of the duality between entity pairs and relation features, in order to obtain more reasonable clustering results.

The thesis also designs and implements a prototype system for the improved unsupervised Chinese relation extraction method. To verify the effect of the two proposed algorithms on the performance of unsupervised relation extraction, experiments were carried out on the prototype system using a data set collected from the Internet. The comparison of experimental results shows that using the two proposed algorithms together achieves a higher accuracy rate in unsupervised relation extraction.
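The abstract does not give the details of the clustering algorithm, so the following is only a minimal sketch of the general idea of combining k-means with co-clustering, not the thesis's actual method. It assumes a matrix whose rows are entity pairs and whose columns are relation features, and it alternates between clustering rows on feature-cluster totals and clustering columns on entity-pair-cluster totals, which is one simple way to exploit the duality between the two. All function names, the farthest-point initialisation, and the toy data are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Matrix, k: int, iters: int = 20) -> List[int]:
    """Plain k-means with a deterministic farthest-point initialisation
    (an assumption made here so the sketch is reproducible)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def compress(m: Matrix, col_labels: List[int], k: int) -> Matrix:
    """Sum the columns of m that share a cluster label (rows x k result)."""
    out = []
    for row in m:
        agg = [0.0] * k
        for value, lab in zip(row, col_labels):
            agg[lab] += value
        out.append(agg)
    return out


def co_cluster(
    m: Matrix, k_rows: int, k_cols: int, rounds: int = 5
) -> Tuple[List[int], List[int]]:
    """Alternate k-means over rows (entity pairs) and columns (features)."""
    col_labels = kmeans(transpose(m), k_cols)
    row_labels: List[int] = []
    for _ in range(rounds):
        row_labels = kmeans(compress(m, col_labels, k_cols), k_rows)
        col_labels = kmeans(compress(transpose(m), row_labels, k_rows), k_cols)
    return row_labels, col_labels


# Toy co-occurrence counts: 4 hypothetical entity pairs x 4 hypothetical features.
matrix = [
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
]
pair_clusters, feature_clusters = co_cluster(matrix, k_rows=2, k_cols=2)
print(pair_clusters, feature_clusters)
```

In this sketch each k-means pass works on a matrix that has already been compressed along the other dimension, so the row clustering is informed by which features tend to co-occur and vice versa, while each individual step keeps the cost profile of ordinary k-means.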
Keywords/Search Tags: Relation Extraction, Feature Acquisition, Characteristics of Chinese Grammar, Heuristic Rules, Clustering Algorithm