Research On Key Technologies In Ontology-based Operation Documents Segmentation

Posted on: 2015-03-23  Degree: Master  Type: Thesis
Country: China  Candidate: X D Yang
GTID: 2268330428463955  Subject: Detection Technology and Automation
Abstract/Summary:
With the advent of the "informatized" era of military operations, processing massive numbers of operation documents by hand can no longer meet the requirement of extracting information quickly and correctly. This problem can be solved only if a computer can understand the content of an operation document with a kind of "logical thinking". Operation documents consist of continuous Chinese character strings without any separators, whereas the basic unit a computer needs for understanding content is the word; Chinese word segmentation is therefore the key technology for understanding the content quickly and correctly. Segmentation accuracy directly affects subsequent processing, including POS tagging, syntactic analysis, key-information extraction and situation labeling on maps, and is essential to the success of these steps. Research on the word segmentation of operation documents therefore remains an important topic. Based on a study of the features of Chinese narration in operation documents, this thesis investigates the following aspects, and its main achievements are as follows:

(1) The development of Chinese word segmentation theory and of segmentation application systems is introduced in detail, and the importance and necessity of segmenting operation documents are emphasized.

(2) The theory of Chinese word segmentation is summarized under two categories, words and phrases, and the segmentation algorithms used in common segmentation systems are analyzed. The analysis makes clear that ambiguity resolution and new-word identification are the two challenges hindering the development of Chinese word segmentation technology.

(3) Considering the diversity of format and content in operation documents, an ontology based on semantic relationships is introduced and described in the OWL ontology language. The military ontology is built with Protégé, software developed at Stanford University. Instead of a traditional word dictionary, a highly generalized and logically consistent military ontology is used, which supports conceptual reasoning and knowledge sharing.

(4) An ontology-based forward and reverse maximum matching algorithm (FMM & RMM) for operation documents is proposed. First, a set of word-extraction rules is formulated according to the word-usage conventions of different types of operation documents; these rules are applied to the extraction of dates, geographical names, military names and other proper nouns. Then, using the ontology, the extraction rules and a dictionary, the operation documents are segmented with FMM and RMM. Finally, ambiguities arising in this process are resolved using semantic similarity and contextual correlation in the military ontology, which yields reasonable segmentation results.

(5) A software system for operation document segmentation is designed and implemented. It comprises three modules: a preprocessing module, a segmentation module and an ambiguous-word processing module. Typical operation documents are segmented with the system, and the results are analyzed and compared with those of the CAS-term system and ICTCLAS. The comparison demonstrates the usability of the method for segmenting operation documents.
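Item (4) applies extraction rules before dictionary matching so that dates, place names and other proper nouns survive as whole tokens. The abstract does not publish the rules, so the following is only a minimal sketch under stated assumptions: three hypothetical regular-expression rules (Arabic-numeral dates, clock times, and numbered hills such as 351高地), with illustrative names `RULES` and `pre_extract` that are not taken from the thesis. It splits a sentence into labeled protected chunks and unlabeled chunks that would still be passed to dictionary matching.

```python
from __future__ import annotations

import re

# Hypothetical extraction rules (the thesis's actual rule set is not given in
# the abstract). Each rule protects a span so the dictionary matcher cannot split it.
RULES = [
    ("DATE", r"(?:\d{1,4}年)?\d{1,2}月\d{1,2}日"),  # e.g. 2015年3月23日, 3月23日
    ("TIME", r"\d{1,2}时(?:\d{1,2}分)?"),            # e.g. 6时30分, 14时
    ("HILL", r"\d+(?:\.\d+)?高地"),                  # e.g. 351高地 (numbered hill)
]
# One alternation of named groups; m.lastgroup reports which rule fired.
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in RULES))


def pre_extract(text: str) -> list[tuple[str, str | None]]:
    """Split text into (chunk, label) pairs.

    Chunks matched by a rule carry that rule's label; the remaining chunks get
    None and are left for dictionary-based maximum matching.
    """
    chunks: list[tuple[str, str | None]] = []
    pos = 0
    for m in PATTERN.finditer(text):
        if m.start() > pos:  # unmatched text before this rule hit
            chunks.append((text[pos:m.start()], None))
        chunks.append((m.group(), m.lastgroup))
        pos = m.end()
    if pos < len(text):  # trailing unmatched text
        chunks.append((text[pos:], None))
    return chunks


if __name__ == "__main__":
    for chunk, label in pre_extract("我部于3月23日6时30分向351高地发起攻击"):
        print(label or "-", chunk)
```

Keeping the rules in a single list makes it straightforward to add document-type-specific patterns, which matches the abstract's point that different kinds of operation documents follow different word-usage conventions.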
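The forward and reverse maximum matching step in item (4) can be sketched with a plain word set standing in for the ontology-derived lexicon. This is a minimal illustration, assuming a maximum word length `max_len` and a toy lexicon; the function names `fmm`, `rmm` and `bidirectional` are hypothetical. Where the two passes disagree, the span is flagged as ambiguous so that a later disambiguation step can decide between the candidates.

```python
from __future__ import annotations


def fmm(text: str, lexicon: set[str], max_len: int) -> list[str]:
    """Forward maximum matching: scan left to right, always taking the longest lexicon word."""
    words, i = [], 0
    while i < len(text):
        # Try the longest window first; a single character is accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def rmm(text: str, lexicon: set[str], max_len: int) -> list[str]:
    """Reverse maximum matching: scan right to left, always taking the longest lexicon word."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()  # words were collected from the end of the string
    return words


def bidirectional(text: str, lexicon: set[str], max_len: int):
    """Run both passes; a disagreement signals a segmentation ambiguity."""
    forward = fmm(text, lexicon, max_len)
    reverse = rmm(text, lexicon, max_len)
    return forward, reverse, forward != reverse


if __name__ == "__main__":
    # Toy lexicon (hypothetical); a real system would draw its terms from the ontology.
    lexicon = {"敌军", "军事", "基地", "军事基地"}
    print(bidirectional("敌军事基地", lexicon, max_len=4))
```

On 敌军事基地 ("enemy military base") with this lexicon, the forward pass yields 敌军/事/基地 while the reverse pass yields 敌/军事基地, so the span is reported as ambiguous rather than silently accepting either result.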
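For the ambiguity step, the abstract says only that semantic similarity and contextual correlation in the military ontology are used; the exact measure is not given. The sketch below therefore swaps in the standard Wu-Palmer measure, 2 x depth(LCS) / (depth(a) + depth(b)), over a toy concept hierarchy. The hierarchy, the scoring rule (mean best similarity of each candidate word to the context words) and all names are assumptions for illustration only. Suppose forward matching produced 敌军/事/基地 and reverse matching produced 敌/军事基地, and the surrounding context mentions 机场 (airfield).

```python
from __future__ import annotations

# Toy military concept hierarchy (hypothetical), stored as child -> parent.
PARENT = {
    "设施": "军事实体",      # facility
    "部队": "军事实体",      # unit / force
    "军事基地": "设施",      # military base
    "机场": "设施",          # airfield
    "敌军": "部队",          # enemy force
    "装甲部队": "部队",      # armored unit
}
CONCEPTS = set(PARENT) | set(PARENT.values())


def path_to_root(concept: str) -> list[str]:
    """The concept followed by its ancestors, ending at the root."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def wu_palmer(a: str, b: str) -> float:
    """2 * depth(LCS) / (depth(a) + depth(b)) with root depth 1; 0.0 for non-concepts."""
    if a not in CONCEPTS or b not in CONCEPTS:
        return 0.0
    pa, pb = path_to_root(a), path_to_root(b)
    common = set(pb)
    lcs = next((c for c in pa if c in common), None)  # lowest common subsumer
    if lcs is None:
        return 0.0
    return 2 * len(path_to_root(lcs)) / (len(pa) + len(pb))


def score(candidate: list[str], context: list[str]) -> float:
    """Mean over the candidate's words of the best similarity to any context word."""
    if not candidate:
        return 0.0
    total = sum(max((wu_palmer(w, c) for c in context), default=0.0) for w in candidate)
    return total / len(candidate)


def choose(candidates: list[list[str]], context: list[str]) -> list[str]:
    """Pick the candidate segmentation that fits the context best."""
    return max(candidates, key=lambda cand: score(cand, context))


if __name__ == "__main__":
    forward = ["敌军", "事", "基地"]
    reverse = ["敌", "军事基地"]
    context = ["机场"]  # e.g. a neighbouring sentence mentions an airfield
    print(choose([forward, reverse], context))
```

Here 军事基地 and 机场 share the parent 设施, so the reverse segmentation scores 1/3 against 1/9 for the forward one and is selected. Because words outside the ontology score zero and the score is averaged, splitting a known term into unknown fragments is penalized.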
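Item (5) compares the system's output with CAS-term and ICTCLAS, but the abstract does not state which metrics were used. A common choice for segmentation, assumed here rather than taken from the thesis, is word-level precision, recall and F1, where a predicted word counts as correct only if its character span exactly matches a gold-standard word. A minimal sketch, with hypothetical function names:

```python
from __future__ import annotations


def spans(words: list[str]) -> set[tuple[int, int]]:
    """Convert a segmentation into (start, end) character offsets."""
    out, start = set(), 0
    for w in words:
        out.add((start, start + len(w)))
        start += len(w)
    return out


def prf(gold: list[str], pred: list[str]) -> tuple[float, float, float]:
    """Word-level precision, recall and F1; both segmentations must cover the same text."""
    if "".join(gold) != "".join(pred):
        raise ValueError("gold and predicted segmentations cover different text")
    g, p = spans(gold), spans(pred)
    correct = len(g & p)  # words whose boundaries match exactly
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = ["敌", "军事基地"]
    print(prf(gold, ["敌", "军事", "基地"]))
```

For the gold segmentation 敌/军事基地 and the prediction 敌/军事/基地, only 敌 matches, giving precision 1/3, recall 1/2 and F1 0.4.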
Keywords/Search Tags: Operation document segmentation, military ontology, ambiguity processing, semantic similarity