Research On Chinese Open Entity Relation Extraction

Posted on: 2015-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Hu
GTID: 2308330482456293
Subject: Computer technology
Abstract/Summary:
Traditional information retrieval methods address user needs by returning a large number of pages that contain the query keywords or related content; users must then search through these pages themselves to find what they actually need. With the dramatic growth of network data, such methods are increasingly unable to deliver information efficiently and accurately. Open entity relation extraction emerged against this background. It automatically identifies relationships between entities in an open network environment that is not limited to a particular domain, and it is important for the development of intelligent, personalized, fine-grained information retrieval services.

This thesis studies Chinese open entity relation extraction. Existing research in this field is still relatively scarce, and most of it relies on syntactic analysis methods designed for English to handle Chinese text. As a result, the performance of Chinese open entity relation extraction methods has remained low, which greatly limits the development of the field. To solve these problems, this thesis proposes an extraction scheme suited to Chinese. First, drawing on principles of data mining, it proposes a feature extraction method based on association rule mining, together with a method for constructing the transaction database so that a Chinese corpus can be processed by association rule mining algorithms. To further improve efficiency, the thesis improves a commonly used maximal frequent itemset mining algorithm and designs BFPMAX, which is then used to find the maximal frequent itemsets in the dataset. The frequent features obtained are then analyzed further: after synonym merging, feature dimension reduction, filtering, and other processing, the features that play an important role in relation identification are retained.
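As a rough illustration of the transaction database and maximal frequent itemset steps described above, the following Python sketch treats the tokens around each entity pair as one transaction and keeps the frequent itemsets that have no frequent proper superset. It is not BFPMAX, whose details are not given in this abstract; it uses a naive level-wise (Apriori-style) search instead. The function names, the English placeholder tokens, and the `min_support` value are all assumptions made for illustration.

```python
from itertools import combinations


def build_transactions(contexts):
    """Turn each tokenized entity-pair context into a transaction (a set of items)."""
    return [frozenset(tokens) for tokens in contexts]


def frequent_itemsets(transactions, min_support):
    """Naive level-wise search: return {itemset: support} for support >= min_support."""
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    current = [frozenset([item]) for item in items]
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Build (k+1)-item candidates by joining pairs of frequent k-itemsets.
        keys = list(level)
        size = len(keys[0]) + 1 if keys else 0
        current = list({a | b for a, b in combinations(keys, 2) if len(a | b) == size})
    return frequent


def maximal_itemsets(frequent):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical contexts: words found between or around two entities (placeholders).
contexts = [
    ["founder", "of", "company"],
    ["founder", "of", "firm"],
    ["born", "in", "city"],
    ["born", "in", "town"],
    ["founder", "of"],
]
transactions = build_transactions(contexts)
maximal = maximal_itemsets(frequent_itemsets(transactions, min_support=2))
```

In this toy corpus the maximal frequent itemsets act as candidate relation features; the synonym merging, dimension reduction, and filtering steps mentioned in the abstract would then be applied to them.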
The retained features form a frequent feature set. Relationships between entities are then extracted by unsupervised clustering over the acquired features; to improve performance, the thesis proposes a hierarchical clustering algorithm based on the frequent feature set. Finally, the relevant evaluation criteria are presented, completing the Chinese open entity relation extraction scheme.

Experimental results show that the proposed scheme achieves better results than the traditional method. In an open network environment spanning a variety of domains, and without a manually annotated corpus, the proposed method can effectively identify relationships between entities to meet the needs of user queries and practical applications.
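To make the clustering step more concrete, here is a minimal sketch of agglomerative (bottom-up hierarchical) clustering in which each entity pair is represented by the set of frequent features it matches. It is not the thesis's algorithm, which the abstract does not specify; the Jaccard similarity, average linkage, stopping `threshold`, and all names below are assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerative(feature_sets, threshold):
    """Merge the most similar clusters until no pair reaches the threshold.

    Returns a list of clusters, each a list of indices into feature_sets.
    """
    clusters = [[i] for i in range(len(feature_sets))]

    def similarity(c1, c2):
        # Average linkage: mean pairwise similarity between members.
        total = sum(jaccard(feature_sets[i], feature_sets[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, (i, j) = max(
            (
                (similarity(clusters[i], clusters[j]), (i, j))
                for i in range(len(clusters))
                for j in range(i + 1, len(clusters))
            ),
            key=lambda pair: pair[0],
        )
        if best < threshold:
            break
        # Merge cluster j into cluster i (j > i, so deleting j keeps i valid).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical feature sets for four entity pairs (placeholder tokens).
feature_sets = [
    {"founder", "of"},
    {"founder", "of", "company"},
    {"born", "in"},
    {"born", "in", "city"},
]
clusters = agglomerative(feature_sets, threshold=0.5)
```

Each resulting cluster groups entity pairs that share most of their frequent features, so a cluster can be read as one candidate relation type without any annotated training data.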
Keywords/Search Tags: open environment, entity relation extraction, association rule mining, feature acquisition, cluster analysis