
Research On NLP-Based Automatic Semantic Annotation For Patent Document

Posted on: 2012-10-18 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Yang | Full Text: PDF
GTID: 2178330332976033 | Subject: Computer application technology
Abstract/Summary:
With the continuous development of the information society, the amount of knowledge people produce has grown substantially. In today's knowledge management systems, documents remain the main form in which knowledge is held, including books, newspapers, periodicals, and hundreds of millions of text files in various formats on the World Wide Web. Information in unstructured documents is hard for tools to turn into usable knowledge, so an effective method for extracting information from unstructured and semi-structured documents is greatly needed. Existing approaches, based on analysis of Web page structure or of document content, lack semantic support.
This paper analyzes the characteristics of one kind of semi-structured document, the patent document, studies the classical semantic annotation methods used in China and abroad, and proposes an automatic semantic annotation method for patent documents based on natural language processing, which automatically extracts semantic information from patent documents and generates structured documents. First, the method preprocesses each patent by extracting its header information and performing Chinese word segmentation on the patent text. It then determines the document pattern by analyzing the patent name, loads the corresponding annotation rules, and extracts semantic information from the document. Finally, the extracted information is converted to OWL Lite form and output as XML data.
This paper establishes a framework for automatic semantic annotation and studies its key technologies: patent header information preprocessing, Chinese word segmentation of patent documents, annotation rule learning, pattern identification based on the patent name, and rule-based semantic annotation.
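The abstract gives no implementation details, so the following is only a minimal Python sketch of the pipeline's overall shape, under stated assumptions. Forward maximum matching (FMM), a classic dictionary-based Chinese segmentation technique, stands in for the segmenter; the thesis's own segmenter may differ. The lexicon, the name-to-pattern table, the trigger-token rules, the `Name`/`Date` header fields, and the XML element names are all hypothetical, and the rules are written by hand rather than learned.

```python
import xml.etree.ElementTree as ET

# Toy lexicon (hypothetical); a real segmenter would use a large dictionary or a statistical model.
LEXICON = {"一种", "图像", "压缩", "方法", "装置", "包括", "编码器", "解码器", "和"}
MAX_WORD_LEN = max(len(word) for word in LEXICON)

# Hypothetical mapping from the keyword at the end of a patent name to a document pattern.
NAME_PATTERNS = [("方法", "method"), ("装置", "apparatus")]

# Hypothetical hand-written rules per pattern: (semantic label, trigger token, stop tokens).
RULES = {
    "method": [("Step", "包括", {"。"})],
    "apparatus": [("Component", "包括", {"。"})],
}
SEPARATORS = {"和", "、", "，"}


def parse_header(raw):
    """Split 'key: value' header lines (before the first blank line) from the body text."""
    header_part, _, body = raw.partition("\n\n")
    header = {}
    for line in header_part.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            header[key.strip()] = value.strip()
    return header, body.strip()


def fmm_segment(text):
    """Forward maximum matching: at each position take the longest lexicon word, else one character."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in LEXICON:
                tokens.append(piece)
                i += size
                break
    return tokens


def detect_pattern(patent_name):
    """Pick a document pattern from the keyword the patent name ends with."""
    for keyword, pattern in NAME_PATTERNS:
        if patent_name.endswith(keyword):
            return pattern
    return None


def apply_rules(tokens, pattern):
    """Label every non-separator token between a rule's trigger token and its stop token."""
    annotations = []
    for label, trigger, stops in RULES.get(pattern, []):
        if trigger not in tokens:
            continue
        for token in tokens[tokens.index(trigger) + 1:]:
            if token in stops:
                break
            if token not in SEPARATORS:
                annotations.append((label, token))
    return annotations


def annotate_patent(raw):
    """Run header extraction, pattern selection, segmentation and rules; return XML text."""
    header, body = parse_header(raw)
    pattern = detect_pattern(header.get("Name", ""))
    root = ET.Element("patent", {"pattern": pattern or "unknown"})
    head = ET.SubElement(root, "header")
    for key, value in header.items():
        ET.SubElement(head, "field", {"name": key}).text = value
    for label, text in apply_rules(fmm_segment(body), pattern):
        ET.SubElement(root, "annotation", {"type": label}).text = text
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample patent: header lines, a blank line, then the body text.
sample = "Name: 一种图像压缩装置\nDate: 2010-01-01\n\n该装置包括编码器和解码器。"
print(annotate_patent(sample))
```

Keying the pattern on the end of the name reflects that Chinese patent names typically end with the type of subject claimed (方法, "method"; 装置, "apparatus"), so one name lookup can select a whole rule set before any body text is examined.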
To improve the semantic patent framework, this paper builds a semantic model of the patent domain, introducing a common ontology and a domain ontology that support the patent semantic architecture. Finally, we implemented our method, automatic semantic annotation for patent documents based on natural language processing, on top of GATE, a widely used open-source framework for knowledge extraction and document processing developed at the University of Sheffield, UK. We compared the system with ANNIE, the information extraction system bundled with GATE, and present the results and analysis.
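The abstract does not say how the common and domain ontologies are combined with the OWL Lite output. The Python sketch below assumes a two-layer class hierarchy in which domain classes (`Patent`, `Component`, `Step`) are `rdfs:subClassOf` common classes (`Document`, `Entity`), and writes each annotated item as a typed individual in RDF/XML using only the standard library. The namespace `http://example.org/patent#`, the class names, and the `hasPart` property are hypothetical; the constructs used (`owl:Class`, `rdfs:subClassOf`, `owl:ObjectProperty`, typed individuals) all fall within OWL Lite.

```python
import xml.etree.ElementTree as ET

# Standard W3C namespaces used by RDF/XML and OWL.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/patent#"  # hypothetical namespace for the patent ontology

for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL), ("pat", BASE)):
    ET.register_namespace(prefix, uri)

# Hypothetical two-layer ontology: class name -> superclass (None = top level).
COMMON_ONTOLOGY = {"Document": None, "Entity": None}
DOMAIN_ONTOLOGY = {"Patent": "Document", "Component": "Entity", "Step": "Entity"}


def q(ns, name):
    """Build an ElementTree qualified name of the form '{uri}name'."""
    return f"{{{ns}}}{name}"


def build_owl(patent_id, annotations):
    """Serialize (label, text) annotations as individuals typed by the ontology classes."""
    about, resource = q(RDF, "about"), q(RDF, "resource")
    root = ET.Element(q(RDF, "RDF"))
    ET.SubElement(root, q(OWL, "Ontology"), {about: BASE.rstrip("#")})
    # Class declarations: common-ontology classes first, then domain classes beneath them.
    for name, parent in {**COMMON_ONTOLOGY, **DOMAIN_ONTOLOGY}.items():
        cls = ET.SubElement(root, q(OWL, "Class"), {about: BASE + name})
        if parent is not None:
            ET.SubElement(cls, q(RDFS, "subClassOf"), {resource: BASE + parent})
    ET.SubElement(root, q(OWL, "ObjectProperty"), {about: BASE + "hasPart"})
    # The patent individual, linked to each annotated individual through pat:hasPart.
    patent = ET.SubElement(root, q(RDF, "Description"), {about: BASE + patent_id})
    ET.SubElement(patent, q(RDF, "type"), {resource: BASE + "Patent"})
    for i, (label, text) in enumerate(annotations, start=1):
        uri = f"{BASE}{patent_id}_{label}{i}"
        ET.SubElement(patent, q(BASE, "hasPart"), {resource: uri})
        ind = ET.SubElement(root, q(RDF, "Description"), {about: uri})
        ET.SubElement(ind, q(RDF, "type"), {resource: BASE + label})
        ET.SubElement(ind, q(RDFS, "label")).text = text
    return ET.tostring(root, encoding="unicode")


print(build_owl("P1", [("Component", "编码器"), ("Component", "解码器")]))
```

Keeping the common classes separate from the domain classes means a different technical field could reuse `Document` and `Entity` and only swap out the domain layer.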
Keywords/Search Tags: Semantic Annotation, Ontology, Information Extraction, Patent Documents, Natural Language Processing, GATE