
The Research Of Recognizing Textual Entailment

Posted on: 2016-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Wang
GTID: 2298330467492495
Subject: Computer Science and Technology
Abstract/Summary:
With the arrival of the Big Data era, the volume of data produced each day is growing ever faster. Much of this data is useless or redundant, and natural language is inherently diverse in how it expresses the same meaning. Together, these factors make it difficult, but valuable, for a computer to understand the meaning of text and to gather information from large collections. Recognizing Textual Entailment (RTE) is an effective approach for automatically identifying the semantic relation between texts. RTE is a basic and important research topic in Natural Language Processing, and it can be applied widely in Natural Language Processing and artificial intelligence, for example in Machine Translation, Information Extraction, Machine Reading, Summarization, and Information Retrieval.

This thesis presents an approach based on lexical, syntactic, and semantic features. First, the text is preprocessed by normalizing characters, numbers, times, and units. Second, the text undergoes POS tagging, named entity recognition, coreference resolution, and dependency analysis; Chinese text is additionally word-segmented, and English text is additionally lemmatized and stemmed. Third, to facilitate the extraction of semantic features, knowledge bases of equivalent words, antonyms, and hyponyms are constructed. Fourth, the lexical, syntactic, and semantic features are extracted. A Bayesian logistic regression model is then trained on these features and used to predict the entailment label. Finally, a rule set corrects the predictions to give the final recognition results. For English, the experimental results indicate that the method's Macro-F1 on the RITE-VAL data exceeds the best competition results (0.486, BKUTM; 0.480, IKOMA).
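The last two stages of the pipeline, statistical classification followed by rule-based correction, can be illustrated with a minimal sketch. The abstract does not specify the model details or the rules, so everything here is an assumption: "Bayesian logistic regression" is read as a MAP estimate under a zero-mean Gaussian prior on the weights, the two features and the toy training data are invented, the labels "Y"/"N" are placeholders, and the number-mismatch rule is a hypothetical example of a correction rule.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    """Probability that the pair described by feature vector x is an entailment."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_map_logreg(X, y, prior_var=10.0, lr=0.5, epochs=2000):
    """Gradient descent on the negative log-posterior: logistic loss plus a
    zero-mean Gaussian prior on the weights (equivalent to L2 regularization)."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [wj / (prior_var * n) for wj in w]  # gradient of the prior term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def correct_with_rules(label, t_numbers, h_numbers):
    """Hypothetical rule: a number in H that is absent from T vetoes entailment."""
    if label == "Y" and not set(h_numbers) <= set(t_numbers):
        return "N"
    return label

def classify(w, b, features, t_numbers, h_numbers, threshold=0.5):
    """Statistical prediction first, then rule-based correction."""
    label = "Y" if predict_proba(w, b, features) >= threshold else "N"
    return correct_with_rules(label, t_numbers, h_numbers)

# Toy training data (invented): [word-overlap score, antonym-match flag]
X = [[0.90, 0], [0.80, 0], [0.95, 0], [0.20, 0], [0.30, 1], [0.85, 1]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_map_logreg(X, y)
```

Calling `classify(w, b, [0.9, 0], ["2014"], ["2015"])` returns "N" whatever the model predicts, because the normalized numbers disagree. Keeping the rules as a separate pass after the classifier means a hard constraint of this kind does not have to be learned from limited training data.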
For Chinese, the experimental results indicate that the method's Macro-F1 on the RITE-VAL data is 0.625, exceeding the best competition result (Macro-F1: 0.615, BUPTTeam-CS-SVBC-05).

The main contributions of this thesis are as follows:
1. This thesis proposes an algorithm that combines machine learning and rules for Recognizing Textual Entailment. The texts undergo the relevant natural language processing, features describing the relation between the texts are extracted, and machine learning together with rules identifies the relation between the texts. Experiments show that this method is effective.
2. This thesis presents an algorithm based on the IDF of the words across the two texts. The formula takes full account of the word weights, the text H, and unknown words, and can objectively reflect whether an entailment relation exists between the two sentences.
3. This thesis proposes a knowledge extraction method. To compensate for the lack of Chinese knowledge resources, equivalent words, antonyms, hyponyms, and similar relations are extracted from the Internet and from available resources.
4. Without the aid of external knowledge, this thesis proposes a method for expanding abbreviations according to the context in which they appear in the source text. Because acronyms are so numerous and varied, it is difficult to replace them correctly by means of an external dictionary alone. Using the context in which an abbreviation appears finds a more accurate full form for it, thereby increasing the similarity of the texts.
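The IDF-based measure in contribution 2 can be illustrated with a small sketch. The abstract does not give the actual formula, so this is one plausible reading rather than the author's method: the score is the share of H's total IDF weight that is covered by words also found in T, and words missing from the IDF table (unknown words) receive the largest IDF seen in the corpus. The function names, the smoothed IDF form, and the `unknown_idf` parameter are assumptions made for illustration.

```python
import math
from collections import Counter

def build_idf(corpus):
    """Smoothed inverse document frequency for each word in a tokenized corpus."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # count each word once per document
    return {w: math.log((1 + n_docs) / (1 + c)) + 1.0 for w, c in df.items()}

def idf_coverage(t_tokens, h_tokens, idf, unknown_idf=None):
    """Fraction of H's IDF weight carried by words that also occur in T.

    Assumption: unknown words are treated as maximally informative."""
    if unknown_idf is None:
        unknown_idf = max(idf.values(), default=1.0)
    t_set = set(t_tokens)
    total = 0.0
    covered = 0.0
    for word in set(h_tokens):
        weight = idf.get(word, unknown_idf)
        total += weight
        if word in t_set:
            covered += weight
    return covered / total if total else 0.0

# Toy corpus (invented) used only to estimate IDF weights.
corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
idf = build_idf(corpus)

h = ["the", "cat", "sat"]
score_missing_common = idf_coverage(["cat", "sat"], h, idf)  # T lacks "the"
score_missing_rare = idf_coverage(["the", "cat"], h, idf)    # T lacks "sat"
```

In this toy setup `score_missing_common` is higher than `score_missing_rare`: failing to cover a rare, informative word of H costs more than failing to cover a frequent one, which is the reason for weighting by IDF instead of counting overlapping words equally.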
Keywords/Search Tags: Recognizing Textual Entailment, Multi-feature, Knowledge Extraction, Machine Learning, Rule