Design And Implementation Of Probabilistic Disambiguation Model Based On BCG

Posted on: 2009-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2178360245995531
Subject: Computer software and theory
Abstract/Summary:
With the rapid progress of information technology, natural language processing has become an important research field of artificial intelligence. Parsing is one of the fundamental problems in natural language processing and is therefore an important problem in Chinese information processing. Machine translation and automatic summarization benefit from accurate parsing results, which can enhance the intelligence and applicability of such systems. The main task of parsing is to recognize the structure of sentences and obtain all trees that satisfy the parsing model, and the quality of these results affects the accuracy rate.

Owing to the structure of Chinese and its semantic complexity, many parsing methods exist. This thesis inherits some effective classical techniques and offers a new method that combines rules and statistics.

We adopt Binary Combinatorial Grammar (BCG), which was proposed by Xiao Yang. We also put forward the concept of binary operation relations and the BCG architecture.

The parsing algorithm, which affects both accuracy and efficiency, is an important part of the analyzer. We review several traditional algorithms and analyze and compare their theory, time consumption, and processing strategies. We introduce the classical CYK parsing algorithm and propose an improved CYK algorithm for BCG-based parsing. First, following the characteristics of BCG, the precedence of binary operation relations is built into the algorithm to prune during parsing and to resolve ambiguity. Chart-based BCG parsing takes less time and produces fewer result trees than the traditional CYK algorithm. Second, all edges produced during parsing are stored in a tabular structure to reduce space consumption.

Disambiguation is an important part of parsing.
Building on the classical probabilistic context-free grammar (PCFG) model, we incorporate inner structure information, head words, and probability information extracted from a treebank to form a new model. We use the Viterbi algorithm to prune and to obtain the best syntactic parse tree. Finally, we use an algorithm to choose rule probabilities so as to obtain the maximum probability.

The experimental results show precision above 72.4% and 81.0% when processing sentences of length 12 and 16. Overall, the BCG-based parsing method has research significance and practical value.
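
To make the combination of CYK chart parsing and Viterbi selection described above more concrete, here is a minimal Python sketch of a generic probabilistic CYK parser over a binary PCFG. It is not the thesis's BCG algorithm: it omits the precedence-based pruning and the head-word and inner-structure features. The toy grammar, its probabilities, and the names `BINARY_RULES`, `LEXICAL_RULES`, `viterbi_cyk`, and `build_tree` are assumptions made purely for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy PCFG with binary rules only; not taken from the thesis.
BINARY_RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("PP", ("P", "NP")): 1.0,
}
LEXICAL_RULES = {
    ("NP", "she"): 0.3,
    ("NP", "fish"): 0.3,
    ("NP", "forks"): 0.2,
    ("V", "eats"): 1.0,
    ("P", "with"): 1.0,
}


def build_tree(chart, i, j, symbol):
    """Follow backpointers to rebuild the best tree for words[i:j]."""
    _, back = chart[(i, j)][symbol]
    if isinstance(back, str):  # lexical entry: the backpointer is the word
        return (symbol, back)
    k, left_sym, right_sym = back
    return (symbol,
            build_tree(chart, i, k, left_sym),
            build_tree(chart, k, j, right_sym))


def viterbi_cyk(words, start="S"):
    """Return (probability, tree) of the most probable parse, or None."""
    n = len(words)
    # chart[(i, j)] maps a nonterminal to (log-probability, backpointer);
    # the table keeps only the best edge per span and nonterminal.
    chart = defaultdict(dict)
    for i, w in enumerate(words):
        for (lhs, word), p in LEXICAL_RULES.items():
            if word == w:
                chart[(i, i + 1)][lhs] = (math.log(p), w)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = chart[(i, j)]
            for k in range(i + 1, j):  # split point
                left, right = chart[(i, k)], chart[(k, j)]
                for (lhs, (b, c)), p in BINARY_RULES.items():
                    if b in left and c in right:
                        score = math.log(p) + left[b][0] + right[c][0]
                        # Viterbi step: keep only the highest-scoring edge.
                        if lhs not in cell or score > cell[lhs][0]:
                            cell[lhs] = (score, (k, b, c))
    if start not in chart[(0, n)]:
        return None
    return math.exp(chart[(0, n)][start][0]), build_tree(chart, 0, n, start)


if __name__ == "__main__":
    print(viterbi_cyk("she eats fish with forks".split()))
```

Under this toy grammar the sentence "she eats fish with forks" has two parses. The sketch returns the one that attaches "with forks" to the verb phrase, with probability 0.00378, rather than the one that attaches it to "fish", with probability 0.00252. Keeping a single best edge per span and nonterminal is what lets the table stay compact while still recovering the maximum-probability tree.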
Keywords/Search Tags: Natural Language Processing, parsing, CYK, BCG, Disambiguation