Study On Japanese Dependency Analysis

Posted on: 2010-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: T Yu
Full Text: PDF
GTID: 2155360272470128
Subject: Computer software and theory
Abstract/Summary:
Syntactic parsing is a basic technique of natural language processing. It takes two main forms: phrase-based parsing and dependency-based parsing. Dependency relations express relations between words and are easy to convert into semantic dependencies. Dependency analysis is widely used in machine translation, information retrieval, and automatic summarization.

Japanese dependency analysis determines an optimal combination of dependencies subject to dependency constraints. The Cascaded Chunking Model, which is based on the SVM model, achieves high accuracy in Japanese dependency analysis, with a dependency accuracy of 88.66%. This method has two limitations, however. First, vectors that lie near the separating hyperplane are hard to classify with the SVM model. Second, long sentences cannot be analysed with high accuracy.

To solve these problems, this thesis proposes four methods:

In the SVM-KNN method, we first classify each vector with the SVM model. We then use the KNN method to decide the class of the vectors near the hyperplane.

The CRF model is introduced to analyse Japanese dependencies in combination with the SVM model. To decide whether two chunks stand in a dependency relation, we consider both the SVM and the CRF results: we compare the outputs of the two models and adopt the tag with the higher confidence.

To address long-sentence analysis, we present a Japanese analysis method based on parallel structures. Parallel relations divide a long sentence into several sub-sentences, which reduces the complexity of the analysis. We first analyse the dependency relations inside the sub-sentences that belong to parallel relations, and then the relations outside them. The analysis of a long sentence is thus broken down into the analysis of short sub-sentences, which yields higher accuracy.

We improve the Fuzzy Support Vector Machine (FSVM) into the DFSVM with a new method for calculating fuzzy membership. We convert the distance from each training vector to the hyperplane into a fuzzy membership value, which makes the contribution of each training vector to the classification problem explicit.

The modified SVM-KNN method and the SVM-CRF model reduce the difficulty of classifying vectors near the hyperplane. The parallel structure tree-based method divides a long sentence into several short sub-sentences and analyses the dependency relations layer by layer. The DFSVM model improves on the FSVM model with a new definition of fuzzy membership.

Experiments on the Kyoto University Corpus show that the proposed methods improve the accuracy of dependency analysis. The DFSVM model achieves the highest accuracy, with a dependency accuracy of 89.87%.
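
The SVM-KNN idea summarised in the abstract (classify with the SVM first, then let KNN decide the vectors near the hyperplane) can be illustrated with a minimal sketch. This is not the thesis's implementation: the linear scoring function, the `margin` threshold, the value of `k`, the use of Euclidean distance, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import dist


def svm_score(w, b, x):
    """Signed score of x under a linear decision function w.x + b (assumed form)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def knn_label(train, x, k=3):
    """Majority label among the k training vectors closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def svm_knn_predict(w, b, train, x, margin=0.5, k=3):
    """Trust the SVM when x is far from the hyperplane, otherwise fall back to KNN.

    The margin threshold is a hypothetical parameter; the abstract does not
    say how "near the hyperplane" is defined.
    """
    score = svm_score(w, b, x)
    if abs(score) >= margin:
        return 1 if score > 0 else -1
    return knn_label(train, x, k)


# Toy data (hypothetical): labelled 2-D vectors and a fixed linear separator.
W, B = (1.0, 0.0), 0.0
TRAIN = [
    ((0.2, 0.0), -1),
    ((0.3, 0.1), -1),
    ((0.25, -0.1), -1),
    ((2.0, 0.0), 1),
    ((-2.0, 0.0), -1),
]

far_point = svm_knn_predict(W, B, TRAIN, (2.0, 0.0))    # decided by the SVM score
near_point = svm_knn_predict(W, B, TRAIN, (0.2, 0.0))   # decided by the KNN vote
```

In this toy setup the point `(0.2, 0.0)` has a small positive score, so the linear function alone would label it `1`. Because it falls inside the assumed margin, the three nearest training vectors vote instead and the label becomes `-1`.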
Keywords/Search Tags: Japanese dependency analysis, Modified SVM-KNN method, Parallel structure tree, DFSVM