Research And Application On Key Technology Of Chinese Information Extraction

Posted on: 2011-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: W J Fu
Full Text: PDF
GTID: 2178360308997461
Subject: Information and Signal Processing
Abstract/Summary:
With the development of information technology, and especially the widespread use of the Internet, people are confronted with more and more information. Obtaining the information they need in a timely and accurate way has become a pressing problem, and information extraction (IE) technology is needed to cope with this volume of data.

In this thesis, we first designed and implemented a rule-based information extraction system for e-government. We used the system to extract information from environmental quality and weather forecast web pages, where it achieved high precision and recall.

We then focused on named entity recognition (NER), a key technology of IE, and studied two kinds of NER implementation. Rule-based NER offers high precision but poor robustness and portability, while statistics-based NER offers better robustness and portability but lower accuracy. Accordingly, on the one hand a rule-based NER system was implemented; on the other hand, a method of automatic term extraction based on conditional random fields (CRFs) was proposed. The experimental results confirmed the advantages and disadvantages of the two kinds of implementation.

On this basis, the thesis describes a specific application of IE, an intelligent document processor called DOCProcessor. We briefly introduce the framework of DOCProcessor and focus on its automatic text summarization module, in which the CRF-based term extraction method mentioned above is used to increase the proportion of high-quality summaries. The experimental results showed that the CRF-based term extraction method effectively improves the performance of the automatic summarization module.

Finally, the thesis discusses the expected key directions for the development of information extraction technology and its future applications.
Keywords/Search Tags: information extraction, named entity recognition, conditional random fields, CRFs
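
The abstract does not give the thesis's actual extraction rules or data, so the following is only a minimal Python sketch of the general idea of rule-based extraction evaluated by precision and recall. The text format, the `FORECAST_RULE` pattern, the function names, and the sample data are all hypothetical and chosen for illustration. It also shows the trade-off described above: a hand-written rule can be precise on the pattern it expects while missing inputs it was not written for.

```python
import re

# Hypothetical rule (not from the thesis): "<City>: <condition>, <low>-<high> C"
FORECAST_RULE = re.compile(
    r"(?P<city>[A-Z][a-z]+): (?P<condition>[a-z]+), (?P<low>\d+)-(?P<high>\d+) C"
)


def extract_forecasts(text):
    """Return the set of (city, condition, low, high) tuples matched by the rule."""
    return {
        (m["city"], m["condition"], int(m["low"]), int(m["high"]))
        for m in FORECAST_RULE.finditer(text)
    }


def precision_recall(predicted, gold):
    """Compute precision and recall of a predicted set against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up page text and gold annotations, for illustration only.
page_text = "Beijing: sunny, 12-24 C. Shanghai: cloudy, 15-22 C. Xi'an: rain, 9-14 C."
gold = {
    ("Beijing", "sunny", 12, 24),
    ("Shanghai", "cloudy", 15, 22),
    ("Xi'an", "rain", 9, 14),
}

predicted = extract_forecasts(page_text)
precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy example the rule never produces a wrong record, but it misses the city name containing an apostrophe, so recall falls below precision. A statistical model such as a CRF would instead learn labeling decisions from annotated examples rather than from a fixed pattern.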