Research On Semantic Annotation Technology In Ancient Literatures Of TCM

Posted on: 2014-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: C L Ding
Full Text: PDF
GTID: 2248330395487187
Subject: Computer application technology
Abstract/Summary:
Semantic annotation is a process that adds standard knowledge representation to documents under the guidance of a domain classification. The resulting representations can be applied to knowledge mining, intelligent search and other in-depth analyses. Ancient literatures of TCM (ALT) carry the essence of TCM culture, and many researchers have turned their attention to the analysis and processing of ALT. However, semantic labeling is still not included in most ALT digitization procedures. As a result, data mining research on ALT remains at the stage of shallow analysis, and deeper study of the semantic relationships contained in the texts still relies on manual analysis. Obtaining semantic labels for ALT through semantic annotation helps achieve the goal of automatic in-depth analysis of ALT.

This thesis annotates nominal terms and descriptive terms in ALT. Nominal terms name specific things, such as Names of Traditional Chinese Drugs (NTCDs), Names of Traditional Chinese Prescriptions (NTCPs) and so on. Descriptive terms describe attributes of specific things, such as symptoms, pathogenesis, etiology and so on.

The main tasks of this thesis are as follows:

1. For the annotation of nominal terms, a semi-supervised bootstrapping method is proposed. The method is integrated with a human-computer interaction technique to ensure the accuracy of the results. Experiments show that the F values for recognizing NTCDs and NTCPs reach 51.3% and 44.9% respectively without human interaction. When human-computer interaction is simulated, these values increase to 90.6% and 74.9% respectively.

2. For the annotation of descriptive terms, the labeling task is converted into either classification or sequence labeling of short sentences. Two preprocessing methods, a reduction operation based on nominal terms and a replacement operation based on HowNet, are proposed to reduce sparsity in the corpus and to enrich the features used in annotation. The impact of different parameter settings on labeling performance is analysed, and the best-performing combination of parameters is identified. Furthermore, the experimental results show that both conversion methods are effective, and that the sequence labeling model achieves better annotation results.

3. An auxiliary semantic annotation system for ALT, which annotates semantics interactively, is designed and implemented. In this system, annotation quality improves as users participate in the process.
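
The bootstrapping idea in task 1 can be illustrated with a minimal sketch. The abstract does not describe the patterns, scoring or interaction protocol used in the thesis, so everything below is an assumption made for illustration: the function names (bootstrap_terms, f_measure), the left/right neighbour pattern, the min_pattern_hits threshold, the confirm callback that stands in for a human reviewer, and the toy corpus with romanized tokens.

```python
from collections import Counter


def bootstrap_terms(sentences, seeds, rounds=3, min_pattern_hits=2, confirm=None):
    """Generic pattern-based bootstrapping over tokenized sentences.

    Hypothetical illustration only; not the method used in the thesis.
    """
    terms = set(seeds)
    for _ in range(rounds):
        # Step 1: collect (left neighbour, right neighbour) patterns
        # around the terms known so far.
        patterns = Counter()
        for tokens in sentences:
            for i, tok in enumerate(tokens):
                if tok in terms and 0 < i < len(tokens) - 1:
                    patterns[(tokens[i - 1], tokens[i + 1])] += 1
        trusted = {p for p, n in patterns.items() if n >= min_pattern_hits}

        # Step 2: apply trusted patterns to propose new candidate terms.
        candidates = set()
        for tokens in sentences:
            for i in range(1, len(tokens) - 1):
                if (tokens[i - 1], tokens[i + 1]) in trusted and tokens[i] not in terms:
                    candidates.add(tokens[i])

        # Step 3: optional human check (assumed callback returning True/False).
        if confirm is not None:
            candidates = {c for c in candidates if confirm(c)}
        if not candidates:
            break
        terms |= candidates
    return terms


def f_measure(predicted, gold):
    """Balanced F value (harmonic mean of precision and recall) over two sets."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy corpus: made-up romanized tokens, not data from the thesis.
sentences = [
    ["use", "gancao", "three", "qian"],
    ["use", "renshen", "three", "qian"],
    ["use", "huangqi", "three", "qian"],
    ["use", "water", "three", "sheng"],
]
seeds = {"gancao", "renshen"}
gold = {"gancao", "renshen", "huangqi"}

auto = bootstrap_terms(sentences, seeds)
checked = bootstrap_terms(sentences, seeds, confirm=lambda term: term in gold)
print(sorted(auto), f_measure(auto, gold))
print(sorted(checked), f_measure(checked, gold))
```

In this toy run the fully automatic pass also accepts "water" because it shares a context with the seed drug names, while the run with the simulated reviewer rejects it. This mirrors, in a very simplified way, why adding interaction can raise the F value.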
Keywords/Search Tags: Semantic Annotation, Ancient Literatures of TCM, Bootstrapping, Short Sentence Annotation