
Research Of Entity And Relation Extraction Based On Text

Posted on: 2014-06-07   Degree: Master   Type: Thesis
Country: China   Candidate: F C Liu
GTID: 2308330479479288   Subject: Information and Communication Engineering
Abstract/Summary:
Information Extraction (IE) technology extracts the entities, relations, and events that people are interested in from unstructured data and organizes them into structured records that can be queried. Named entity recognition (NER) and relation extraction are important IE tasks and active research topics. With the explosive growth of text data, however, there is an urgent need to exploit abundant unlabeled corpora to improve extraction performance. Traditional methods, typified by supervised learning, handle this poorly, and unsupervised and semi-supervised learning have become the prevailing way to address the problem. Building on an in-depth study and summary of previous work, this thesis improves semi-supervised named entity recognition and relation extraction and achieves satisfactory results.
For named entity recognition, this thesis proposes a new approach, SACRF (Self-training with Active learning based on CRF). Starting from a small training corpus and a large unlabeled dataset, and using a conditional random field (CRF) as the base classifier, SACRF automatically expands the training corpus through self-training and uses active learning to select low-confidence samples for manual annotation. Experiments show that this approach not only improves the precision and recall of the NER system by expanding the training set automatically, but also greatly reduces manual annotation effort.
For relation extraction, traditional semi-supervised methods introduce noise and have lower precision. To address this, this thesis improves the voting policy of the Tri-Training algorithm and introduces active learning to further improve extraction precision. In the relation extraction experiments, this approach achieves a sizeable improvement in precision over classical Tri-Training.
Finally, this thesis implements the two algorithms and integrates them into a visualization system for text association.
This system extracts entities and relations from text, lays them out on screen, and provides basic relational analysis and human-machine dialogue functions that help users analyze information and make decisions.
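The SACRF idea described above (self-training on confident predictions, active learning on low-confidence ones) can be sketched as a generic loop. This is a minimal, hypothetical sketch, not the thesis's implementation: the abstract gives no confidence threshold, query budget, or stopping rule, so `threshold`, `query_budget`, and `rounds` are assumed parameters, and the CRF is abstracted behind caller-supplied `train` and `predict` functions (for a real CRF, `predict` might return the best tag sequence and its probability). `self_train_with_active_learning` and `oracle` are illustrative names.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

X = TypeVar("X")  # an input, e.g. a tokenized sentence
Y = TypeVar("Y")  # its label, e.g. a tag sequence


def self_train_with_active_learning(
    labeled: Sequence[Tuple[X, Y]],
    unlabeled: Sequence[X],
    train: Callable[[List[Tuple[X, Y]]], object],
    predict: Callable[[object, X], Tuple[Y, float]],
    oracle: Callable[[X], Y],
    threshold: float = 0.9,   # assumed confidence cut-off
    query_budget: int = 10,   # assumed number of manual annotations per round
    rounds: int = 5,          # assumed stopping rule
) -> Tuple[object, List[Tuple[X, Y]]]:
    """Hypothetical SACRF-style loop around an arbitrary base classifier."""
    labeled = list(labeled)
    pool = list(unlabeled)
    model = train(labeled)
    for _ in range(rounds):
        if not pool:
            break
        # Score every pooled example: (pool index, predicted label, confidence).
        scored = [(i, *predict(model, x)) for i, x in enumerate(pool)]
        # Self-training: accept confident predictions as pseudo-labels.
        confident = [(i, y) for i, y, conf in scored if conf >= threshold]
        # Active learning: send the least confident examples to a human.
        uncertain = sorted((s for s in scored if s[2] < threshold), key=lambda s: s[2])
        queried = [(i, oracle(pool[i])) for i, _, _ in uncertain[:query_budget]]
        if not confident and not queried:
            break
        newly = confident + queried
        labeled.extend((pool[i], y) for i, y in newly)
        used = {i for i, _ in newly}
        pool = [x for i, x in enumerate(pool) if i not in used]
        model = train(labeled)  # retrain on the enlarged training set
    return model, labeled
```

Sorting the uncertain examples by ascending confidence is plain least-confidence sampling, one common active-learning criterion; the thesis may use a different selection rule.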
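A single round of Tri-Training combined with active learning can be sketched as follows. The sketch keeps the classical rule (an unlabeled example is added to classifier i's training set when the other two classifiers agree on its label) but omits the original algorithm's error-rate conditions. The thesis's improved voting policy is not described in the abstract and is not reproduced here; instead, as an assumption, examples on which all three classifiers disagree are routed to a human annotator. `bootstrap_sets`, `tri_training_round`, `oracle`, and `query_budget` are hypothetical names.

```python
import random
from typing import Callable, Hashable, List, Sequence, Tuple

Example = Tuple[Hashable, str]       # (input, relation label)
Model = Callable[[Hashable], str]    # a trained classifier


def bootstrap_sets(labeled: Sequence[Example], rng: random.Random) -> List[List[Example]]:
    """Classical Tri-Training initialisation: three bootstrap samples of L."""
    return [rng.choices(labeled, k=len(labeled)) for _ in range(3)]


def tri_training_round(
    train_sets: List[List[Example]],
    unlabeled: Sequence[Hashable],
    train: Callable[[List[Example]], Model],
    oracle: Callable[[Hashable], str],
    query_budget: int = 5,  # assumed number of manual annotations per round
) -> Tuple[List[List[Example]], List[Hashable]]:
    """One simplified round (hypothetical helper).

    Returns the enlarged training sets and the disagreements left unqueried.
    """
    models = [train(s) for s in train_sets]
    new_sets = [list(s) for s in train_sets]
    disagreements = []
    for x in unlabeled:
        preds = [m(x) for m in models]
        for i in range(3):
            j, k = (t for t in range(3) if t != i)
            # Classical rule: the other two agree, so teach classifier i.
            if preds[j] == preds[k]:
                new_sets[i].append((x, preds[j]))
        # Assumed active-learning step: no two classifiers agree.
        if len(set(preds)) == 3:
            disagreements.append(x)
    for x in disagreements[:query_budget]:
        y = oracle(x)
        for s in new_sets:
            s.append((x, y))  # human labels go to all three classifiers
    return new_sets, disagreements[query_budget:]
```

A full run would call `bootstrap_sets` once and then repeat `tri_training_round` until no training set grows.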
Keywords/Search Tags: Information Extraction, Named Entity Recognition, Relation Extraction, Semi-Supervised, Self-Learning, Active Learning, Tri-Training