A Study And Implementation Of Semantic Annotation For Chinese Text

Posted on: 2015-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Cui
GTID: 2308330464468590
Subject: Computer software and theory
Abstract/Summary:
The Internet plays an increasingly important role in human society. Massive amounts of data contain a great deal of valuable information, but the data must be processed effectively before that information can be extracted. Semantic annotation is an effective method for converting semi-structured and unstructured multimedia data, which computers cannot process directly, into semantic data. It is the foundation of semantic reasoning: guided by an ontology, it creates entities and annotates properties for multimedia data, raising data resources from machine-readable to machine-understandable. This thesis proposes an automatic semantic annotation framework for Chinese text. The work covers three aspects:

1. The automatic semantic annotation framework follows the general process of semantic annotation: entity creation, concept labeling, and property labeling. In the first two stages, a named entity recognition algorithm automatically recognizes Chinese person names, place names, and institution names. The algorithm is based on the conditional random field (CRF) model, which avoids strict independence assumptions and the label bias problem. On this basis, the thesis analyzes how Chinese person names, place names, and institution names differ in their lexical features, and tries two granularities for Chinese named entity recognition: character-based and word-based. Experiments confirm that the recognition results are consistent with this analysis.

2. In the property labeling stage, the thesis uses the Stanford Parser to construct a syntactic dependency tree for each Chinese sentence. Because parsing accuracy is low for long Chinese sentences, a preprocessing algorithm is applied to long sentences before parsing. In addition, 7 heuristic rules for relation extraction are proposed on top of the syntactic dependency tree. Experiments confirm that the preprocessing algorithm and the heuristic rules improve the overall performance of the relation extraction algorithm.

3. Based on the automatic semantic annotation framework, the thesis designs and implements an automatic semantic annotation system for Chinese text. The core function of the system is ontology-based automatic semantic annotation of Chinese text, achieved by combining an ontology management function with an automatic annotation function. The system was tested, and the tests confirm that it completes automatic semantic annotation, performs well, and has practical value.
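
The two recognition granularities mentioned in the first aspect (character-based and word-based) can be illustrated with a minimal sketch of how the same annotated sentence might be turned into training labels for a sequence model such as a CRF. The abstract does not state the tag set or data format used in the thesis, so the BIO scheme, the `PER`/`ORG` type names, the function names, and the example sentence below are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical entity span: (first word index, end word index (exclusive), type).
Entity = Tuple[int, int, str]


def word_level_tags(words: List[str], entities: List[Entity]) -> List[Tuple[str, str]]:
    """Assign one BIO tag per segmented word (word-based granularity)."""
    tags = ["O"] * len(words)
    for start, end, etype in entities:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return list(zip(words, tags))


def char_level_tags(words: List[str], entities: List[Entity]) -> List[Tuple[str, str]]:
    """Assign one BIO tag per character (character-based granularity)."""
    tagged_chars = []
    for word, tag in word_level_tags(words, entities):
        for k, ch in enumerate(word):
            if tag == "O":
                tagged_chars.append((ch, "O"))
            elif tag.startswith("B-") and k == 0:
                # Only the first character of an entity-initial word keeps B-.
                tagged_chars.append((ch, tag))
            else:
                tagged_chars.append((ch, "I-" + tag[2:]))
    return tagged_chars


# Assumed example: an already segmented sentence with a person name
# and an institution name that spans two words.
words = ["王小明", "在", "北京", "大学", "工作"]
entities = [(0, 1, "PER"), (2, 4, "ORG")]

for token, tag in word_level_tags(words, entities):
    print(token, tag)
for token, tag in char_level_tags(words, entities):
    print(token, tag)
```

At word granularity the model sees five tokens and depends on the quality of the word segmentation; at character granularity it sees ten tokens and must learn entity boundaries itself. This is the trade-off that the comparison between the two granularities examines.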
Keywords/Search Tags:Semantic Web, Semantic Annotation, Named Entity Recognition, Relation Extraction