
Construction And Application Of Binary Collocation Semantic Knowledge Base Based On Multiple Knowledge Sources

Posted on: 2013-11-02  Degree: Master  Type: Thesis
Country: China  Candidate: L Wang  Full Text: PDF
GTID: 2208330362966055  Subject: Computer application technology
Abstract/Summary:
Every natural language processing system rests on a knowledge system, and a collocation database is an important part of that knowledge system. Collocation databases are widely used in natural language generation, machine translation, information retrieval, word sense disambiguation and text error checking. This thesis introduces the relevant theory and background and several representative semantic knowledge dictionaries, then uses HowNet and a large-scale corpus of real text to extract semantic collocation knowledge, from which a three-layer semantic knowledge base is constructed. It also explores some applications of this knowledge base. The work covers the following aspects:

1. A new collocation extraction algorithm based on typical patterns is designed. The algorithm combines relevant semantic knowledge with typical sentence patterns and rules and with statistical measures such as frequency and mutual information. It extracts nine types of word collocation centered on nouns, adjectives and verbs from the large-scale real-text corpus, with an accuracy of 90.9%. The algorithm covers all substantive-word collocations. Experiments show that both completeness and accuracy are greatly improved.

2. A semantic extraction model is designed. The model introduces a new measure, special degree (SD), to quantify word collocations at the semantic level and to expand word collocations in a measurable way. With SD as the core measure, semantic collocations are extracted from 420,000 collocation records.

3. The storage structure of the large semantic collocation knowledge base is designed and implemented. To meet the requirements of error checking, the thesis designs the system structure and the description scheme of the collocation knowledge base. The database contains four knowledge bases organized in three layers: word collocation (layer 1): the words-collocation knowledge base; half-semantic collocation (layer 2): words_semantics knowledge base 1 and words_semantics knowledge base 2; full-semantic collocation (layer 3): the semantics_semantics knowledge base. The database also learns automatically as real corpora are continually added, which keeps the collocation knowledge base up to date and brings it progressively closer to the complete set of collocations.

4. Applications of the collocation extraction algorithm and the collocation knowledge base are explored. 1) The collocation extraction algorithm is used to extract long words and collocations for a given field, with an accuracy of 93.21%. 2) The collocation knowledge base is used for text error checking, with an accuracy of up to 86.7%.

5. A collocation application system is designed and implemented. Built on the knowledge base and the related application algorithms, the system is divided into three modules: error checking, long-word self-learning and dictionary operation.

Constructing a semantic knowledge base is a large and difficult task. Although most collocations have been extracted, the work is not yet comprehensive. Nevertheless, this is a feasible technical route that provides basic resources for applications such as error checking, and its application prospects are broad.
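The abstract names frequency and mutual information as the statistical measures behind collocation extraction but gives no formulas. As a rough illustration only, the sketch below filters candidate word pairs with a frequency threshold and standard pointwise mutual information, PMI(x, y) = log2(P(x, y) / (P(x) P(y))). It is not the thesis's algorithm: the function name, the thresholds and the sample pairs are all hypothetical, the pattern and rule matching that would produce the candidate pairs is assumed to have happened already, and the SD measure is not reproduced because its definition is not stated here.

```python
import math
from collections import Counter


def pmi_collocations(pairs, min_count=2, min_pmi=0.5):
    """Score candidate (w1, w2) pairs by pointwise mutual information.

    `pairs` is a list of candidate pairs, one entry per occurrence.
    Both thresholds are illustrative assumptions, not values from the thesis.
    """
    pair_counts = Counter(pairs)
    left_counts = Counter(w1 for w1, _ in pairs)
    right_counts = Counter(w2 for _, w2 in pairs)
    total = len(pairs)

    scores = {}
    for (w1, w2), n in pair_counts.items():
        if n < min_count:
            continue  # frequency filter: drop rare pairs
        # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) ), with probabilities
        # estimated as relative frequencies over the candidate pairs.
        pmi = math.log2(n * total / (left_counts[w1] * right_counts[w2]))
        if pmi >= min_pmi:
            scores[(w1, w2)] = pmi
    return scores


# Hypothetical candidate pairs (made-up data, not taken from any corpus).
pairs = (
    [("drink", "tea")] * 3
    + [("drink", "water")]
    + [("strong", "tea")] * 2
    + [("read", "book")] * 2
)
scores = pmi_collocations(pairs)
for pair, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 3))
```

In this toy data, a pair is dropped either because it is too rare or because its parts co-occur little more often than chance would predict. A real system would combine such scores with the sentence patterns and semantic knowledge described in the abstract.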
Keywords/Search Tags: collocations, sentence pattern, mutual information, SD, primitive, semantics knowledge base, debugging, self-learning