Chinese Multiword Expression Extraction And Application On Chinese Dependency Parsing

Posted on: 2016-04-16  Degree: Master  Type: Thesis
Country: China  Candidate: Z P Yang  Full Text: PDF
GTID: 2308330464965104  Subject: Computer application technology
Abstract/Summary:
Multiword Expressions (MWEs) are a common phenomenon in Natural Language Processing (NLP). An MWE is an expression of two or more words that forms a relatively complete semantic unit, yet whose syntactic and semantic behavior cannot be derived from its component words. MWEs are among the most intractable problems in NLP, and handling them poorly degrades the performance of NLP applications. This thesis takes dependency parsing as an example to explore the effect of MWEs on parsing, and constructs a Knowledge Base for two general MWE classes (N+VN and VN+N) to improve dependency parsing performance. The main content of the thesis is as follows:
1. Extracting Multiword Expression candidates. "N+VN" and "VN+N" are two of the most common categories of Chinese MWEs, and both have a relatively high error rate in processing. This thesis therefore takes Chinese "N+VN" and "VN+N" MWEs as its research objects. In the experiment, 2000 MWE candidates were extracted from the People's Daily corpus for the first half of 1998, with an accuracy of 71.85% for "N+VN" MWEs and 68.2% for "VN+N" MWEs.
2. Corpus correction. The research shows that the People's Daily corpus still contains annotation errors, which lower the accuracy of extraction. Tagging errors in the original corpus were identified and corrected on the basis of a statistical analysis of the MWE extraction results. The correction covers two aspects: first, MWEs that contain a single word; second, the tagging of the MWEs themselves. In total, 9979 annotation errors were corrected. After the correction, accuracy rises to 81.05% for "N+VN" MWEs and 77.2% for "VN+N" MWEs. This thesis thus completes the tagging correction of the People's Daily corpus for the first half of 1998 and improves the quality of the corpus, which is of positive significance for research in the field of NLP.
3. Constructing the Chinese MWE Knowledge Base. MWE candidates are extracted from the corrected corpus and analyzed, and the Chinese "N+VN" and "VN+N" MWE Knowledge Base is constructed through MWE classification experiments.
4. Chinese dependency parsing based on the Chinese MWE Knowledge Base. Using the Knowledge Base, MWE candidates are extracted from the output of a dependency parser, and dependency tagging errors in that output are identified. After these errors are corrected on the basis of a statistical analysis of the extraction results, the accuracy of the dependency parser on the erroneous sentences improves by 2.1%, and the accuracy of the parser as a whole improves by 0.32%. This result shows that the Chinese MWE Knowledge Base is useful for dependency parsing.
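The candidate extraction described in item 1 can be illustrated with a minimal sketch. It is not the thesis's own method: it assumes the corpus is stored as lines of space-separated `word/tag` tokens, that `n` marks a noun and `vn` a verbal noun, and that a candidate is simply an adjacent pair of tokens matching one of the two tag patterns. The function names `parse_tagged` and `extract_candidates` and the sample sentences are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Assumed tag patterns: noun + verbal noun, and verbal noun + noun.
PATTERNS = {("n", "vn"): "N+VN", ("vn", "n"): "VN+N"}


def parse_tagged(line):
    """Split a 'word/tag word/tag ...' line into (word, tag) pairs.

    Tokens without a '/' separator are skipped.
    """
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        if sep and word and tag:
            pairs.append((word, tag.lower()))
    return pairs


def extract_candidates(lines):
    """Count adjacent word pairs whose tags match an assumed MWE pattern."""
    counts = Counter()
    for line in lines:
        pairs = parse_tagged(line)
        # Walk over every pair of neighbouring tokens in the sentence.
        for (w1, t1), (w2, t2) in zip(pairs, pairs[1:]):
            label = PATTERNS.get((t1, t2))
            if label is not None:
                counts[(label, w1, w2)] += 1
    return counts


# Invented sample sentences in the assumed format (not taken from the corpus).
sample = [
    "经济/n 发展/vn 很/d 快/a",
    "发展/vn 战略/n 明确/a",
    "经济/n 发展/vn",
]
candidates = extract_candidates(sample)
for (label, w1, w2), freq in candidates.most_common():
    print(label, w1 + w2, freq)
```

Pure tag-pattern matching of this kind over-generates, which is consistent with the abstract's report that only part of the extracted candidates are genuine MWEs and that tagging errors in the corpus lower extraction accuracy.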
Keywords/Search Tags: Multiword Expression, MWE Extraction, MWE Classification, Dependency Parsing, Dependency Parsing Error Correction