
Research On Definition Extraction In Aviation Domain And Its Application

Posted on: 2012-07-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Pan
Full Text: PDF
GTID: 1268330422952663
Subject: Carrier Engineering
Abstract/Summary:
Computer-based training (CBT) systems play an important role in pilot training and maintenance training in civil aviation as part of advanced training technology. CBT products are widely used by airlines at home and abroad, and deploying a maintenance CBT system is a prerequisite for intermediate maintenance units. This dissertation focuses on the key technologies for acquiring professional knowledge from the professional literature using term definition extraction techniques. It also explores how the knowledge extracted from that literature can be applied in developing intelligent CBT systems. The main contributions of this dissertation are summarized as follows:

Firstly, a corpus is constructed. A corpus is the basic resource of all natural language processing research, but no ready-made corpus is available, at home or abroad, for the study of term definition extraction. The primary task of this dissertation is therefore to construct a corpus for experiments. According to the experimental requirements, the dissertation establishes the scale and standard for the first stage of corpus construction and develops the corresponding software. Detailed statistics on the corpus are also compiled as the basis for further study.

Secondly, definition extraction is treated as an unbalanced data classification problem. Because the research purpose differs, solutions that retrieve definitions for question answering, or rank them as a search engine does, do not apply here. In view of the imbalanced distribution of term definitions in the corpus, a method based on balanced random forests (BRF) is proposed to extract definitions from the corpus. A novel over-sampling strategy based on the distance distribution information of instances is proposed to solve the problem that randomly synthesized instances cannot effectively consolidate the regional border of minority-class instances when building a balanced training set. Experiments show that the strategy improves the F1-measure and F2-measure in extracting definitions.

Thirdly, the feature selection method for definition extraction is improved using the distance distribution information of instances. To address the imbalanced distribution of data and the small disjuncts in definition sentences, a new feature selection method is defined based on the between-class and within-class distribution differences of features. The new method overcomes the shortcoming of traditional methods, whose evaluation relies on word frequency statistics. Experiments show that the BRF classifier using the new method achieves the same results with fewer features in extracting definitions.

Fourthly, definitions are extracted using multi-level linguistic features. The use of multi-level linguistic features in different sub-topics of information extraction is first summarized. For lack of a quantitative method, multi-level linguistic features cannot be used in extracting definitions. In this dissertation, a method based on feature combination entropy is proposed to calculate the impact of different feature combinations on definition extraction. The method provides a computable way to evaluate linguistic features of different levels in extracting definitions. Experiments show the correctness and validity of this method.

Finally, an intelligent assessment system for CBT is designed and implemented. Existing automatic item generation (AIG) technology is not conducive to generating questions for professional fields, and the distractors it produces are not confusing enough. In this dissertation, a novel AIG system is designed to solve this problem. The system generates items using a variety of knowledge and sentence templates, and generates distractors using a domain ontology. These resources are obtained from the extracted definitions. The new design effectively meets the demand of CBT systems for automatic assessment and evaluation of professional knowledge, and eases the workload of developing item banks and examination papers.
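
The abstract gives no implementation details, so the following is only a minimal Python sketch of two standard ingredients it names: the class-balanced bootstrap sample on which each tree of a balanced random forest is commonly trained, and the F-measure family used for evaluation (F1 and F2). The function names `balanced_bootstrap` and `f_beta`, the label convention (1 for a definition sentence, 0 otherwise), and the toy numbers in the usage note are assumptions made for illustration. The dissertation's distance-based over-sampling strategy is not reproduced here.

```python
import random
from typing import List, Sequence


def balanced_bootstrap(labels: Sequence[int], rng: random.Random,
                       minority: int = 1) -> List[int]:
    """Return sample indices holding equally many minority and majority instances.

    Both halves are drawn with replacement, and each half has the size of the
    minority class, so one tree sees a balanced training set.
    """
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    if not minority_idx or not majority_idx:
        raise ValueError("both classes must be present")
    n = len(minority_idx)
    sample = [rng.choice(minority_idx) for _ in range(n)]
    sample += [rng.choice(majority_idx) for _ in range(n)]
    return sample


def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-measure from true positives, false positives and false negatives.

    beta = 1 weights precision and recall equally (F1);
    beta = 2 weights recall more heavily (F2).
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

As a usage illustration with made-up data, `balanced_bootstrap([1] * 5 + [0] * 95, random.Random(0))` returns ten indices, five pointing at definition sentences and five at other sentences, and `f_beta(tp, fp, fn, beta=2.0)` scores a classifier with more weight on recall, which suits a setting where missing a definition costs more than reviewing a false hit.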
Keywords/Search Tags: definition extraction, information extraction, corpus, unbalanced data classification, over-sampling, feature selection, multi-level feature, combined feature, automatic item generation