Research On Chinese Integrated Parsing Model

Posted on: 2005-09-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y D Chen
GTID: 2168360155971974
Subject: Computer Science and Technology
Abstract/Summary:
With the growth of computing and the internet, the volume of text has become too large to process by hand. The goal of natural language understanding (NLU) is to process such text quickly and accurately. Based on a study of the ambiguities that arise at each step of NLU, this thesis proposes an integrated parsing model. The essence of the model is to resolve as much ambiguity as possible at the current step and to pass whatever cannot be resolved there on to the next step; a semantic strategy is designed to reduce the complexity of syntactic analysis.

Disambiguation is the principal task of NLU. This thesis surveys the various ambiguities in NLU and the approaches to them. After comparing several integrated approaches, we propose a new integrated parsing model. Its principles are to pass ambiguities that cannot be resolved at the current step on to the next step, and to disambiguate as far as possible at the current step in order to reduce the complexity of the next.

Word segmentation is the first step in NLU, and the quality of segmentation affects the steps that follow. We first define the sentence coverage rate and the word coverage rate. We then design a directed-graph-based Bi-directed Maximum Match algorithm and show that it is feasible. Its advantage is that it preserves ambiguities as several candidate sequences. Compared with classic rule-based approaches and omni-segmentation, the algorithm achieves a high coverage rate at low complexity.

For tagging, we modify the Viterbi algorithm, because the standard algorithm discards tagging ambiguities. A strategy for passing probabilities between tagging and parsing allows the two to run in parallel. Passing the probabilities backward preserves all tagging ambiguities and, in addition, supplies a probabilistic context for syntactic analysis. We illustrate this advantage with an example.

Parsing would carry a heavy load because it receives the ambiguities passed on from the earlier steps. We therefore combine the semantic strategy with the integrated parsing model to prune implausible syntactic trees. Through semantic tagging and semantic matching between words, the semantic strategy performs word sense disambiguation.

Finally, we summarize the integrated parsing model, discuss its disadvantages, and point out directions for future research.
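The idea of keeping segmentation ambiguities as several candidate sequences can be illustrated with a minimal sketch. This is plain forward and backward maximum matching, not the directed-graph algorithm of the thesis, whose details the abstract does not give. The function names, the toy `LEXICON`, and the `max_len` parameter are assumptions made for illustration only.

```python
def forward_max_match(text, lexicon, max_len):
    """Scan left to right, taking the longest lexicon word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, lexicon, max_len):
    """Scan right to left, taking the longest lexicon word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()  # words were collected from the end of the sentence
    return words


def candidate_segmentations(text, lexicon, max_len=3):
    """Return one sequence if both directions agree, otherwise keep both."""
    fwd = forward_max_match(text, lexicon, max_len)
    bwd = backward_max_match(text, lexicon, max_len)
    return [fwd] if fwd == bwd else [fwd, bwd]


# Hypothetical toy lexicon, chosen so that the two directions disagree.
LEXICON = {"研究", "研究生", "生命", "起源"}

if __name__ == "__main__":
    for seq in candidate_segmentations("研究生命起源", LEXICON):
        print(" / ".join(seq))
```

On the toy input the forward pass yields 研究生 / 命 / 起源 and the backward pass yields 研究 / 生命 / 起源. Both sequences are returned, so the choice between them can be left to a later step such as tagging or parsing, which is the behaviour the abstract describes in general terms.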
Keywords: Integrated parsing model, Bi-Directed Maximum Match, Hidden Markov model, Viterbi algorithm, Probability-based Generalized LR algorithm