
Research on Chinese Paraphrase Patterns and Collocations Extraction

Posted on: 2014-03-16  Degree: Master  Type: Thesis
Country: China  Candidate: L Wang
GTID: 2268330398487865  Subject: Computer application technology
Abstract
Paraphrases are alternative ways of conveying the same information. They are common in natural language, and paraphrase knowledge has been shown to be directly useful in many areas of Natural Language Processing (NLP). This thesis focuses on acquiring Chinese paraphrase patterns and paraphrase collocations. The acquired resources can be applied directly to paraphrase generation and can also support other NLP tasks such as information retrieval, machine translation, and question answering.

First, the thesis proposes a novel subtitle-based method for extracting paraphrase patterns. It rests on the observation that different translations of the same foreign-language source are natural paraphrase resources. Candidate paraphrase sentence pairs are obtained by matching different subtitle translations of the same foreign-language film, and four filtering rules are applied to these candidates: sentence length, length ratio, word overlap rate, and BLEU (Bilingual Evaluation Understudy) score. Patterns are then extracted from the filtered sentence pairs in two forms, "sub-tree" patterns and "partial sub-tree" patterns. To match these patterns, the thesis uses HowNet to compute word-level semantic similarity and measures the semantic similarity between two patterns from the similarities of the words they contain. Compared with existing methods, this method improves precision and can extract large numbers of paraphrase patterns.

Because paraphrase patterns contain slots filled by variables, they are more flexible than ordinary phrases; however, they cover only the paraphrasing of partial sentences and cannot handle long-distance dependencies. The thesis therefore proposes a method based on semantic fingerprints for extracting paraphrase collocations. After the sentences are syntactically parsed, collocations of the forms <V, OBJ, N> (verb-object collocations) and <N, SUB, V> (subject-predicate collocations) are extracted.
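The collocation-extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation. It assumes the parser emits one token per word with a part-of-speech tag, a 1-based head index (0 for the root), and a dependency label, and it borrows the SBV (subject) and VOB (verb-object) labels from the LTP dependency scheme; the `Token` type and the `extract_collocations` function are hypothetical names.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    word: str
    pos: str   # part-of-speech tag; "v*" = verb, "n*" = noun (assumed tagset)
    head: int  # 1-based index of the head token; 0 marks the root
    rel: str   # dependency label to the head (assumed LTP-style: SBV, VOB, ...)


def extract_collocations(sentence: List[Token]) -> List[Tuple[str, str, str]]:
    """Return <V, OBJ, N> and <N, SUB, V> triples from one parsed sentence."""
    triples: List[Tuple[str, str, str]] = []
    for tok in sentence:
        if tok.head == 0:  # the root has no head, so it cannot be a dependent
            continue
        head = sentence[tok.head - 1]
        if not (head.pos.startswith("v") and tok.pos.startswith("n")):
            continue  # keep only noun dependents of verbs
        if tok.rel == "VOB":
            # object arc: the noun is the verb's object -> <V, OBJ, N>
            triples.append((head.word, "OBJ", tok.word))
        elif tok.rel == "SBV":
            # subject arc: the noun is the verb's subject -> <N, SUB, V>
            triples.append((tok.word, "SUB", head.word))
    return triples
```

For the parsed sentence 学生 (n, head 2, SBV) / 阅读 (v, root) / 课文 (n, head 2, VOB), the function returns `[("学生", "SUB", "阅读"), ("阅读", "OBJ", "课文")]`.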
The words in each collocation are then expanded with related words obtained from conceptual semantics, yielding candidate paraphrase collocations. To filter these candidates, four features are used: a part-of-speech feature, a mutual information feature, a HowNet-based semantic similarity feature, and a context-based semantic similarity feature. Unlike existing methods, this method does not restrict the words in a paraphrase collocation to synonyms. Experiments show that each of the features contributes to improved performance.
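Two of the four filtering features can be approximated with simple corpus statistics. The sketch below assumes that the mutual information feature is pointwise mutual information (PMI) estimated from counts of extracted (verb, noun) collocation pairs, and that the context-based similarity is the cosine of bag-of-words context count vectors. Both are illustrative assumptions rather than the thesis's definitions, and the function names are hypothetical. The HowNet-based and part-of-speech features are omitted because they depend on external resources.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def pmi(pairs: Iterable[Tuple[str, str]], w1: str, w2: str) -> float:
    """PMI of (w1, w2): log2(P(w1, w2) / (P(w1) * P(w2))) from pair counts."""
    pair_list = list(pairs)
    n = len(pair_list)
    joint = Counter(pair_list)
    left = Counter(a for a, _ in pair_list)   # marginal counts of first slot
    right = Counter(b for _, b in pair_list)  # marginal counts of second slot
    if joint[(w1, w2)] == 0:
        return float("-inf")  # never co-occur: PMI is undefined, treat as -inf
    p_joint = joint[(w1, w2)] / n
    return math.log2(p_joint / ((left[w1] / n) * (right[w2] / n)))


def context_cosine(ctx_a: Dict[str, int], ctx_b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse context-count vectors."""
    dot = sum(v * ctx_b.get(k, 0) for k, v in ctx_a.items())
    norm_a = math.sqrt(sum(v * v for v in ctx_a.values()))
    norm_b = math.sqrt(sum(v * v for v in ctx_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty context carries no evidence of similarity
    return dot / (norm_a * norm_b)
```

For example, with the pairs `[("阅读", "课文"), ("阅读", "课文"), ("吃", "苹果"), ("吃", "苹果")]`, `pmi(pairs, "阅读", "课文")` returns `1.0`, because the joint probability 0.5 is twice the product of the marginals (0.5 × 0.5).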
Keywords: paraphrase patterns, paraphrase collocations, paraphrase evaluation