| Sentence similarity is a fundamental task in natural language processing, with applications in machine translation, question answering, and multi-document summarization. However, the complexity of natural language has limited progress on sentence similarity, and current methods rarely make use of semantic information. To make better use of semantic information, this thesis studies word similarity and semantic role labeling: (1) Most current word similarity methods rely on HowNet, but HowNet does not cover all the information needed and performs poorly in specific domains. To compensate for these gaps, this thesis mines semantic information from corpora and online knowledge bases, with the aim of computing word similarity more accurately. (2) Features directly determine the performance of a machine learning model, so finding valuable features in the syntax tree, such as the common parent node and node paths, is crucial to improving the accuracy of semantic role labeling. In addition to the existing features proposed by other researchers, this thesis puts forward several new, more valuable features. (3) Finally, by mining the structural information of semantic roles, we propose a sentence similarity algorithm based on semantic role labeling, and we design experiments to evaluate its performance. This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants 71231002 and 61202247. |