Chinese Chunk Dependency Analysis Based On Support Vector Machines

Posted on: 2007-08-28
Degree: Master
Type: Thesis
Country: China
Candidate: P Yin
Full Text: PDF
GTID: 2178360182961020
Subject: Computer application technology
Abstract/Summary:
Syntactic analysis is a crucial component of machine translation, and dependency analysis is an important method of syntactic analysis. The dependency tree it produces represents the deeper syntactic relations between the words of a sentence, and it is also compact to store. This thesis studies dependency analysis using support vector machines (SVMs). Because chunk analysis yields reliable partial results and reduces ambiguity in syntactic analysis, the dependency analysis here is performed over chunks.

So that every word falls inside a chunk, twelve types of Chinese chunk are defined by extending the Chinese chunk standard of the Natural Language Processing Laboratory, Dalian University of Technology. A dependency scheme over Chinese chunks is defined, with twenty-four dependency types, and a standard for building the dependency corpus is provided.

The dependency relations are analyzed with a deterministic and a nondeterministic algorithm. The Nivre algorithm is chosen as the deterministic algorithm because it has already been used for English dependency analysis and the syntactic structures of Chinese and English are similar. A nondeterministic algorithm based on Chinese chunks is also designed. Its procedure is as follows: an SVM classifier scores each chunk pair to obtain a dependency coefficient; for each chunk, the candidate with the largest dependency coefficient is selected; crossing dependencies and circular dependencies are then resolved; finally, the result is output.

In the experiments, the deterministic algorithm reaches an accuracy of 75.664% and the nondeterministic algorithm 82.574%, so the nondeterministic algorithm is the more accurate of the two. The thesis closes by analyzing the causes of parsing errors and listing the dominant error types.
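The nondeterministic procedure in the abstract (score every chunk pair, keep the best-scoring attachment for each chunk, resolve circular and crossing dependencies) can be illustrated with a small sketch. The abstract does not say how the conflicts are resolved, so the greedy rule below is an assumption made for illustration, not the thesis's method: candidate arcs are visited from the highest score down, and an arc is skipped if it would close a cycle or cross an arc already accepted. The names `parse_chunks`, `creates_cycle`, `crosses`, and the table `TOY_SCORES` are hypothetical; in a real system the score would come from a trained SVM rather than a lookup table.

```python
from itertools import permutations


def creates_cycle(heads, dep, head):
    """Return True if attaching dep to head would close a cycle."""
    node = head
    while node is not None:
        if node == dep:
            return True
        node = heads.get(node)  # climb towards the root
    return False


def crosses(heads, dep, head):
    """Return True if the arc (dep, head) crosses an accepted arc."""
    lo, hi = sorted((dep, head))
    for d, h in heads.items():
        a, b = sorted((d, h))
        if a < lo < b < hi or lo < a < hi < b:
            return True
    return False


def parse_chunks(n_chunks, score):
    """Greedy head selection (assumed resolution rule, see lead-in).

    score(dep, head) stands in for the SVM dependency coefficient.
    Returns a dict mapping each attached chunk index to its head index;
    chunks absent from the dict are treated as roots.
    """
    candidates = sorted(
        ((score(d, h), d, h) for d, h in permutations(range(n_chunks), 2)),
        reverse=True,
    )
    heads = {}
    for _, dep, head in candidates:
        if len(heads) == n_chunks - 1:  # every non-root chunk is attached
            break
        if dep in heads:  # this chunk already has its best valid head
            continue
        if creates_cycle(heads, dep, head) or crosses(heads, dep, head):
            continue
        heads[dep] = head
    return heads


# Hypothetical coefficients for four chunks, keyed by (dependent, head).
# Chunks 0 and 1 each prefer the other, which would form a cycle.
TOY_SCORES = {(0, 1): 0.9, (1, 0): 0.8, (1, 2): 0.7, (3, 2): 0.6, (2, 1): 0.5}


def toy_score(dep, head):
    return TOY_SCORES.get((dep, head), 0.0)


result = parse_chunks(4, toy_score)
print(result)  # {0: 1, 1: 2, 3: 2}
```

In this toy run the arc from chunk 1 to chunk 0 is rejected because chunk 0 is already attached to chunk 1, so chunk 1 falls back to its next-best candidate, chunk 2, which ends up as the root.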
Keywords/Search Tags: Syntax Analysis, Dependency, Chunk, SVM