
Research on Entity Relation Recognition Based on Dynamic Granulation Theory

Posted on: 2007-06-24
Degree: Master
Type: Thesis
Country: China
Candidate: X F Gu
GTID: 2178360185950965
Subject: Computer software and theory
Abstract/Summary:
Information extraction (IE) is a major branch of natural language processing. It focuses on quickly extracting the information that is actually needed from large volumes of source material, transforming unstructured text into structured or semi-structured information and storing it in a database, so that users can conveniently query, analyze, and use it. Research on Chinese information extraction started relatively late and has concentrated on methods for recognizing Chinese named entities. In recent years, as entity extraction research has gradually become practical, the study of entity relation recognition has grown in importance. Entity relation recognition benefits text understanding, information retrieval, information extraction, question answering (QA) systems, and machine translation. At present, this research is still at an early stage; researchers have mainly applied feature-vector-based machine learning to recognize Chinese entity relations.

Entity relation recognition depends on textual features, and the granularity of those features significantly affects the extraction results. Previous methods extracted relations with features of a single, uniform granularity, which leaves recognition blind spots ("shadow zones"): instances that relatively fine-grained features could distinguish remain unresolved at a relatively coarse granularity. To address this, we propose a method that recognizes entity relations using rough set approximation under dynamic granulation. This thesis is the first to apply the idea of dynamic granularity to entity relation recognition. We refine the recognition features step by step, build a family of feature sets ordered by a partial order relation, and then recognize entity relations through training, obtaining better results.

The main work of this study is as follows:

1. Corpus annotation. We annotated entities in a corpus of 800 news articles (about 400,000 words) on the March 11 Madrid bombings. In addition, a small corpus of news about the July 7 London subway bombings was annotated and used as the open-test corpus.
2. Entity pair clustering. Based on an analysis of the real corpus, we applied the OPTICS clustering algorithm to perform an initial clustering of the entity pairs in the experimental corpus.
3. Feature selection. Following a set of feature selection rules, we derived the extraction features for each relation type from the clustering results.
4. Construction of partially ordered feature sets. Applying rough set approximation under dynamic granulation, we refined the features and produced a family of feature sets, partially ordered from coarse to fine.

Using this method, we designed and implemented entity relation recognition experiments based on dynamic granularity. In the closed test, the average F-score for each relation type was above 80%. Compared with the method using features of uniform granularity, the average F-score increased by 5 percentage points, with the largest gain about 8 percentage points. In the open test, our method scored about 7 percentage points higher than the previous method. Finally, we analyzed the misclassified instances in the experimental results and their causes, and proposed some solutions.

In summary, applying the idea of dynamic granularity to entity relation recognition gave better results. Further research building on more resources and features is left for future work.
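Step 2 names OPTICS but gives no parameters or feature representation for the entity pairs. The following is a minimal, naive sketch of OPTICS (O(n^2) neighbour search) that produces the cluster ordering with reachability and core distances, followed by the standard DBSCAN-style extraction of flat clusters at a threshold `eps_prime`. The 2-D feature vectors, the Euclidean distance, and the values of `eps`, `min_pts`, and `eps_prime` are assumptions for illustration only, not the settings used in the thesis.

```python
import math


def optics(points, eps, min_pts):
    """Naive OPTICS: return (order, reach, core) for a list of points.

    points are equal-length numeric tuples (here: hypothetical entity-pair
    feature vectors), eps is the neighbourhood radius and min_pts the
    minimum neighbourhood size, counting the point itself.
    """
    n = len(points)
    reach = [math.inf] * n  # reachability distance; inf = not yet reached
    core = [math.inf] * n   # core distance; inf = not a core point
    processed = [False] * n
    order = []

    def neighbours(i):
        # O(n) scan per query, so the whole run is O(n^2).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    def core_distance(i, nbrs):
        if len(nbrs) < min_pts:
            return math.inf
        return sorted(math.dist(points[i], points[j]) for j in nbrs)[min_pts - 1]

    def expand(i, seeds):
        # Emit i into the ordering; if it is a core point, lower the
        # reachability of its unprocessed neighbours and (re)queue them.
        processed[i] = True
        order.append(i)
        nbrs = neighbours(i)
        core[i] = core_distance(i, nbrs)
        if core[i] == math.inf:
            return
        for j in nbrs:
            if not processed[j]:
                r = max(core[i], math.dist(points[i], points[j]))
                if r < reach[j]:
                    reach[j] = r
                    seeds[j] = r

    for p in range(n):
        if not processed[p]:
            seeds = {}
            expand(p, seeds)
            while seeds:
                q = min(seeds, key=seeds.get)  # closest reachable point next
                del seeds[q]
                expand(q, seeds)
    return order, reach, core


def extract_clusters(order, reach, core, eps_prime):
    """DBSCAN-style flat clusters from the OPTICS ordering; -1 marks noise."""
    labels = [-1] * len(order)
    cluster = -1
    for i in order:
        if reach[i] > eps_prime:
            if core[i] <= eps_prime:  # start of a new cluster
                cluster += 1
                labels[i] = cluster
        else:  # density-reachable from the current cluster
            labels[i] = cluster
    return labels


# Hypothetical 2-D feature vectors for seven entity pairs.
points = [(0, 0), (0, 0.1), (0.2, 0), (5, 5), (5, 5.1), (5.2, 5), (10, 0)]
order, reach, core = optics(points, eps=1.0, min_pts=2)
print(extract_clusters(order, reach, core, eps_prime=0.5))  # [0, 0, 0, 1, 1, 1, -1]
```

One reason to prefer OPTICS over a single DBSCAN run for an initial clustering is that the ordering is computed once and can then be cut at different `eps_prime` values without re-clustering.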
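The abstract does not spell out how the partially ordered feature sets are used during recognition. The sketch below assumes one formulation of rough set approximation under dynamic granulation (an assumption, not necessarily the thesis's exact procedure): given a chain of feature sets P1 ⊆ P2 ⊆ ... ⊆ Pn ordered from coarse to fine, the lower approximation of a target set X (the entity pairs of one relation type) is built level by level, where level i accepts only granules that lie entirely inside the part of X not yet covered by coarser levels; the upper approximation is taken with the finest feature set. The feature names `types`, `verb`, and `dist`, the toy entity pairs, and the functions `partition` and `dynamic_granulation_approx` are hypothetical.

```python
from collections import defaultdict


def partition(objects, attrs):
    """Group object ids into granules: ids sharing all values on attrs."""
    blocks = defaultdict(set)
    for oid, feats in objects.items():
        blocks[tuple(feats[a] for a in attrs)].add(oid)
    return blocks


def dynamic_granulation_approx(objects, chain, target):
    """Approximate target over a coarse-to-fine chain of attribute tuples.

    Level i accepts only granules lying entirely inside the part of target
    not yet covered by coarser levels. Returns (lower, upper, rules), where
    each rule is (level, {attribute: value}, ids) and upper is computed
    with the finest attribute set.
    """
    for coarse, fine in zip(chain, chain[1:]):
        if not set(coarse) <= set(fine):
            raise ValueError("chain must be ordered from coarse to fine")
    target = set(target)
    lower, rules = set(), []
    for level, attrs in enumerate(chain):
        remaining = target - lower  # still-uncertain part of target
        for key, block in partition(objects, attrs).items():
            if block <= remaining:
                rules.append((level, dict(zip(attrs, key)), block))
                lower |= block
    finest = partition(objects, chain[-1]).values()
    upper = set().union(*(b for b in finest if b & target))
    return lower, upper, rules


# Hypothetical entity pairs described by three features.
pairs = {
    1: {"types": "PER-LOC", "verb": "in", "dist": "near"},
    2: {"types": "PER-LOC", "verb": "in", "dist": "far"},
    3: {"types": "ORG-LOC", "verb": "at", "dist": "near"},
    4: {"types": "ORG-LOC", "verb": "of", "dist": "near"},
    5: {"types": "ORG-LOC", "verb": "of", "dist": "far"},
    6: {"types": "PER-ORG", "verb": "of", "dist": "near"},
    7: {"types": "PER-ORG", "verb": "of", "dist": "near"},
}
target = {1, 2, 3, 5, 7}  # pairs annotated with one (hypothetical) relation type
chain = [("types",), ("types", "verb"), ("types", "verb", "dist")]

lower, upper, rules = dynamic_granulation_approx(pairs, chain, target)
for level, pattern, ids in rules:
    print(level, pattern, sorted(ids))
print("boundary:", sorted(upper - lower))  # boundary: [6, 7]
```

In this toy run the coarsest feature already settles pairs 1 and 2, pair 3 needs two features, and only pair 5 needs all three; pairs 6 and 7 are indiscernible even at the finest level and stay in the boundary region. The resulting lower approximation equals the one obtained with the finest feature set alone; what the chain adds is that each accepted granule is described at the coarsest level the chain allows.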
Keywords/Search Tags: uniform granularity, entity relation, dynamic granulation, rough set approximation, partial order relation