Research On Chinese Syntactic Structure-Tree Based On Data-Oriented Parsing

Posted on: 2011-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: H X Guo
Full Text: PDF
GTID: 2178360305495365
Subject: Computer software and theory
Abstract/Summary:
High-performance syntactic analysis is a core problem in natural language processing (NLP) and plays an important role in NLP applications. Because hand-written syntactic rules can never be exhaustive and are not sufficient for processing real Chinese text, syntactic analysis based on databases built from real corpora has become the mainstream approach. However, Chinese sentences are so complex that it is difficult to improve parsing performance by analysing full sentences directly against a real corpus. This thesis therefore first divides a Chinese sentence into several event description clauses (EDCs) using rules and predicate identification, then parses each EDC with data-oriented parsing (DOP), and finally obtains the parse of the full sentence by combining the EDC parses.

The main contents of the thesis are as follows:

First, a study of syntactic analysis based on data-oriented parsing.

Second, construction of the corpus. Because DOP relies on a database built from a real corpus, the relevant databases have to be constructed. The databases used in this experiment include a Chinese question-type database, a question-sentence database, an interrogative-word database, a syntactic-fragment database, a syntactic-cut database, and a syntactic-fragment combination database, among others.

Third, a method and procedure for automatic Chinese syntactic analysis based on EDCs. In the pre-processing stage, sentences are divided into EDCs using rules and predicate identification; each EDC is then parsed with DOP; and the EDC parses are combined into a parse of the full sentence. The advantage of this method is that syntactic analysis can be carried out step by step and complex sentences are simplified, which improves both the speed and the accuracy of the analysis.

Training and testing on the Task 5 corpus of CIPS-ParsEval-2009 gave an F-measure of 82.78% for without-head matching and 65% for complete-head matching. On the LOC corpus of HIT, the experiment gave a precision of 94% in the closed test.
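
The three-stage procedure described in the abstract (split into clauses, parse each clause, combine) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: clause boundaries are approximated by punctuation alone (the thesis also uses rules and predicate identification), the clause parser is a flat placeholder rather than DOP, and the names `split_into_clauses`, `flat_clause_parser`, and `parse_sentence` are hypothetical.

```python
import re
from typing import Callable, List, Tuple

# A tree is (label, children); leaf children are plain string tokens.
Tree = Tuple[str, list]

# Assumption: clause boundaries are approximated by clause-final punctuation.
# The thesis also relies on rules and predicate identification, omitted here.
CLAUSE_END = re.compile(r"([，；。！？])")


def split_into_clauses(sentence: str) -> List[str]:
    """Stage 1 (toy): cut a sentence into clause-like units at punctuation."""
    parts = CLAUSE_END.split(sentence)  # text and punctuation alternate
    clauses = []
    for i in range(0, len(parts), 2):
        text = parts[i].strip()
        punct = parts[i + 1] if i + 1 < len(parts) else ""
        if text:
            clauses.append(text + punct)
    return clauses


def flat_clause_parser(clause: str) -> Tree:
    """Stage 2 (placeholder): a flat tree with one leaf per character.

    This is NOT data-oriented parsing; a real system would plug a
    DOP-based clause parser in here.
    """
    return ("EDC", list(clause))


def parse_sentence(
    sentence: str,
    clause_parser: Callable[[str], Tree] = flat_clause_parser,
) -> Tree:
    """Stage 3 (toy): attach the clause trees under a single sentence root."""
    return ("S", [clause_parser(c) for c in split_into_clauses(sentence)])


if __name__ == "__main__":
    print(parse_sentence("我去学校，他回家。"))
```

Passing the clause parser as a parameter keeps the splitting and combination steps independent of the parsing model, so the placeholder could be swapped for any clause-level parser without changing the rest of the sketch.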
Keywords/Search Tags: Event description clause, Data-oriented parsing, Syntactic cuts, Syntactic fragments, Similarity calculation