Research On Name Entity Recognition And Relation Extraction In Financial Text

Posted on: 2015-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: S W Wang
Full Text: PDF
GTID: 2308330479489721
Subject: Computer Science and Technology
Abstract/Summary:
With the development of Internet technology, finding and organizing the knowledge in web text effectively and reliably has become an important task in natural language processing and data mining. On the Internet, the relations among various entities are very complex, and extracting knowledge through manual work or experience alone can hardly meet the demand. Automatic knowledge acquisition has therefore become an important issue in text processing.

This thesis focuses on automatic knowledge acquisition in the financial sector. Existing research shows that rule-based methods achieve higher precision but require complex expert knowledge, while statistical methods depend heavily on the training data: poor training data, an incorrect choice of features, or an inconsistent tagging scheme usually harms the final results. Applying only one kind of method is therefore not enough to accomplish the task.

The research has three parts. First, based on the characteristics of personal names, a person name recognition model is built to recognize Chinese names; it achieves an F-measure of 0.94 on 2,008 manually tagged data items. Second, statistics on organization names in financial texts show that they fall into full forms and abbreviated forms. Full-form organization names are recognized with conditional random fields (CRFs) combined with financial-domain features. Abbreviated forms are recognized using the combination degree, boundary features, and the full-form organization names already recognized. This method achieves an F-measure of 0.93 on 5,500 manually tagged data items. Third, using features of the text, the research iteratively generates and evaluates relation expression patterns to recognize relations among financial organizations. Based on common relation types, five kinds of relations are defined, and experiments on 2,167 manually tagged samples meet the expected requirements.

The main contributions are as follows. First, for recognizing the full and abbreviated forms of financial organization names, the research proposes methods that use the combination degree, boundary features, and recognized full-form organization names, which improves the recognition of abbreviated forms. Second, the iterative pattern generation strategy produces more expression patterns from a few initial patterns. The method learns new patterns automatically with little manual intervention and thereby finds more relations.
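The abstract does not say how patterns are generated or scored, so the following is only a minimal sketch of one generic bootstrapping loop of the kind described, not the thesis's actual method. It assumes that a pattern is the exact text between two organization mentions, that a candidate pattern is kept when enough of its occurrences connect pairs already known, and that the toy English corpus, the `acquisition` label, the `min_conf` threshold, and the function name `bootstrap` are all hypothetical.

```python
from collections import defaultdict

# Toy corpus (hypothetical): (organization A, text between mentions, organization B).
CORPUS = [
    ("Alpha Bank", "acquired", "Beta Securities"),
    ("Alpha Bank", "completed its takeover of", "Beta Securities"),
    ("Gamma Fund", "completed its takeover of", "Delta Trust"),
    ("Gamma Fund", "bought out", "Delta Trust"),
    ("Epsilon Corp", "bought out", "Zeta Ltd"),
    ("Alpha Bank", "sued", "Gamma Fund"),
]


def harvest(corpus, patterns, pairs):
    """Apply the current patterns and record the organization pairs they match."""
    for a, ctx, b in corpus:
        if ctx in patterns:
            pairs.setdefault((a, b), patterns[ctx])


def bootstrap(corpus, seed_patterns, min_conf=0.5, max_iters=10):
    """Grow a pattern set from a few seeds (assumed scoring, see lead-in)."""
    patterns = dict(seed_patterns)  # context string -> relation label
    pairs = {}                      # (org A, org B) -> relation label
    for _ in range(max_iters):
        harvest(corpus, patterns, pairs)

        # Generate candidates: unseen contexts, counted by how often they
        # connect a pair that is already known to hold some relation.
        totals = defaultdict(int)
        hits = defaultdict(lambda: defaultdict(int))
        for a, ctx, b in corpus:
            if ctx in patterns:
                continue
            totals[ctx] += 1
            if (a, b) in pairs:
                hits[ctx][pairs[(a, b)]] += 1

        # Evaluate candidates: keep a context if its best relation covers
        # at least min_conf of all its occurrences.
        accepted = {}
        for ctx, by_relation in hits.items():
            relation, count = max(by_relation.items(), key=lambda kv: kv[1])
            if count / totals[ctx] >= min_conf:
                accepted[ctx] = relation

        if not accepted:  # no new pattern learned, so stop iterating
            break
        patterns.update(accepted)

    harvest(corpus, patterns, pairs)  # final pass with the full pattern set
    return patterns, pairs


patterns, pairs = bootstrap(CORPUS, {"acquired": "acquisition"})
for ctx in patterns:
    print(ctx, "->", patterns[ctx])
```

On this toy corpus the single seed pattern leads to two further contexts being accepted over successive iterations, while the unrelated context "sued" is never promoted because it connects no known pair. A real system would generalize contexts rather than match exact strings and would need a stricter evaluation step to limit semantic drift.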
Keywords/Search Tags: named entity recognition, entity relation detection, financial text