
Research On Paraphrase Recognition Based On Traditional Features And Concepts Digital Features

Posted on: 2015-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhang
Full Text: PDF
GTID: 2268330428467671
Subject: Computer application technology
Abstract/Summary:
Paraphrase is a common phenomenon in natural language: the same meaning expressed in different ways. Paraphrase recognition aims to decide whether two given language expressions convey the same meaning, and its results can be applied widely across natural language processing, for example in information retrieval, machine translation, and automatic question answering. The ubiquity of paraphrase in natural language and its wide range of applications make the study of paraphrase particularly important.
This paper reviews domestic and international research on paraphrase identification and finds that existing methods focus mainly on traditional sentence features: they treat the sentence as a string or a sequence of symbols, or they extract lexical or syntactic features. Relying on traditional characteristics alone, these methods overlook the fact that a sentence, as a carrier of information, is itself uncertain as background knowledge accumulates, that is, the uncertainty of knowledge. They attempt to describe continuous, changing language in a discrete manner and ignore the uncertainty and evolution of natural language. Paraphrase is itself a manifestation of uncertainty in natural language; it exhibits semantic diversity, uncertainty, and variability, and these factors cannot be ignored in paraphrase recognition.
Previous methods have overlooked two problems: (1) a concept as a whole has semantic integrity and uncertain boundaries; (2) a concept shows semantic differences and ambiguity in a specific context. To address them, this paper extracts semantic features from both traditional characteristics and concept characteristics to identify paraphrase. The main contents of this paper are as follows:
1. Paraphrase recognition based on multiple traditional characteristics of sentences.
A study of existing methods shows that it is fairly common to identify paraphrase using the traditional characteristics of one particular aspect of the sentence. Considering the multi-faceted nature of sentences, this paper proposes a paraphrase recognition technique that combines multi-level sentence features. First, lexical analysis is performed on the standard training corpus to obtain the subject, predicate, and object of each sentence; second, syntactic analysis is performed to obtain syntactic dependencies; then the two levels of sentence features are combined to train a model for sentence similarity calculation; finally, the trained model is applied to the standard test corpus. Compared with previous methods, this method achieves strong results in recognition accuracy and F value.
2. Paraphrase recognition based on concept digital characteristics derived from the cloud model.
Because natural language itself is changeable and uncertain, traditional characteristics cannot adequately portray the nature of language. Meanwhile, quantitative study of its qualitative characteristics remains insufficient. To address the lexical ambiguity that arises when a whole word is treated as a concept in paraphrase recognition, this paper proposes a recognition method based on concept digital characteristics. First, the words in the sentences of the standard training corpus are expanded, and cloud model theory is used to create a concept from each term of a sentence and its associated words, so that the word groups of a sentence are converted into concepts; paraphrase is then identified on the basis of these concepts. This is the first time the cloud model has been applied to lexical semantic representation in the study of paraphrase recognition. Comparative experiments on the standard corpus show that concept characteristics perform better than traditional features in paraphrase recognition.
3. Paraphrase recognition based on concept synthesis characteristics.
To address the knowledge representation problem posed by the ambiguity and uncertainty of the sentence as a whole, a method based on concept synthesis is proposed. The whole sentence is treated as a concept: fine-grained concepts are combined into coarse-grained concepts that represent the overall meaning of the sentence, which brings out the uncertainty of the sentence. The resulting sentence characteristics are then used to detect paraphrase. By addressing sentence-level ambiguity, the concept synthesis method outperforms the cloud model approach and also improves on traditional methods.
The advantages of the concept-based paraphrase recognition proposed in this paper lie in three areas: (1) it takes into account both the meaning of the vocabulary entity and the extended knowledge it carries, namely the uncertainty and integrity of the concept; (2) it mines potential associations between the concepts inside a sentence, which represent the ambiguity of the sentence; (3) the digital characteristics reflect the robustness of natural language sentences better than traditional string surface features, lexical features, and syntactic features.
Paraphrase recognition has wide application in many areas of natural language processing; these areas are related to knowledge representation and knowledge evaluation, in which research is still insufficient. To the best of our knowledge, this is the first application of concept digital characteristics to paraphrase identification, intended to capture the semantic ambiguity or uncertainty of sentences. Experimental results show that the proposed method improves the recognition rate on a standard corpus to some extent. This work also indirectly supports the view that knowledge is variable and uncertain. More importantly, the process can be transferred seamlessly to most related research, facilitating work in other areas.
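The abstract does not state how the concept digital characteristics are computed. In cloud model theory a concept is commonly summarized by three numbers, expectation (Ex), entropy (En), and hyper-entropy (He), and one standard way to estimate them from data is the backward cloud generator without certainty degrees. The sketch below illustrates that estimator only; the function name `backward_cloud` and the idea of feeding it similarity scores between a word and its expanded associated words are assumptions for illustration, not the procedure documented in the thesis.

```python
import math


def backward_cloud(samples):
    """Estimate cloud-model characteristics (Ex, En, He) from numeric samples.

    Hypothetical helper: `samples` is assumed to hold numeric observations of
    one concept, e.g. similarity scores of a word to its associated words.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")

    # Ex: the sample mean, the most representative value of the concept.
    ex = sum(samples) / n

    # En: estimated from the first-order absolute central moment.
    mean_abs_dev = sum(abs(x - ex) for x in samples) / n
    en = math.sqrt(math.pi / 2) * mean_abs_dev

    # Sample variance with the n - 1 denominator.
    var = sum((x - ex) ** 2 for x in samples) / (n - 1)

    # He: uncertainty of the entropy. For small samples var can fall below
    # en**2, so the difference is clamped at zero before the square root.
    he = math.sqrt(max(var - en ** 2, 0.0))
    return ex, en, he


# Assumed example data: similarity scores for one word and its expansions.
scores = [0.62, 0.70, 0.55, 0.81, 0.66]
ex, en, he = backward_cloud(scores)
print(f"Ex={ex:.3f} En={en:.3f} He={he:.3f}")
```

Under this reading, each word group of a sentence would be reduced to one (Ex, En, He) triple, and two sentences could then be compared through their triples rather than through surface strings. How the thesis actually compares or synthesizes concepts is not specified in the abstract.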
Keywords/Search Tags:Sentence Pattern Recognition, Cloud Model, Numerical Characteristic, Concept Synthesis