In the past, most traditional text similarity methods computed similarity from the surface form of the text. They ignored the semantic information that words carry in different contexts, so the resulting measures lacked important semantic information. Later methods drew on external knowledge dictionaries or external corpora and applied deep learning, which partly solved this problem. However, these methods rarely consider the part-of-speech or positional information of words within a sentence or text, and they depend on external resources that must be built in advance and are highly complex. Moreover, most earlier methods were validated on English datasets but not on Chinese ones. Because English and Chinese differ in grammar and sentence structure, an algorithm that performs well on English data may not perform as well on Chinese data.

To address these problems, this paper makes the following contributions.

First, we propose a keyword similarity calculation method that improves YAKE and integrates deep learning. Keyword extraction is incorporated into text similarity calculation so that the important information in a text is extracted in advance. Adding a word span feature makes YAKE better suited to keyword extraction in Chinese. Two sets of experiments comparing the proposed method with traditional algorithms demonstrate its feasibility and effectiveness.

Second, we propose a text similarity calculation method that integrates multi-level features. Weights are assigned at each level of the document. At the word level, the weight of each word is computed from syntactic dependency analysis, semantic role labeling, and its keyword score, so that the weight carries both semantic and positional information. The weights of sentences and paragraphs are then computed from their position, logical structure, and keyword overlap. Finally, the document's vector representation is built level by level from these weights, and the similarity between documents is computed from the resulting vectors.
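The level-by-level weighting can be illustrated with a minimal sketch. It assumes that the word, sentence, and paragraph weights have already been computed (the dependency-, semantic-role-, and keyword-based scoring is not reproduced here), that word vectors come from some pretrained embedding table, that each level's vector is the weighted average of the vectors one level below, and that documents are compared by cosine similarity. The data layout, the function names (`weighted_mean`, `sentence_vector`, `document_vector`, `cosine`), and the choice of weighted averaging and cosine are hypothetical illustrations, not the paper's actual definitions.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical data layout (an assumption, not the paper's format):
#   document = list of (paragraph_weight, sentences)
#   sentence = (sentence_weight, words)
#   word     = (token, word_weight)
Word = Tuple[str, float]
Sentence = Tuple[float, List[Word]]
Paragraph = Tuple[float, List[Sentence]]
Document = List[Paragraph]
Vector = List[float]


def weighted_mean(vectors: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Weighted average of equal-length vectors; weights need not sum to 1."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(dim)]


def sentence_vector(words: List[Word], embeddings: Dict[str, Vector]) -> Vector:
    # Word level: each word vector is scaled by its precomputed word weight.
    # Every token is assumed to have an entry in `embeddings`.
    return weighted_mean([embeddings[tok] for tok, _ in words],
                         [w for _, w in words])


def document_vector(doc: Document, embeddings: Dict[str, Vector]) -> Vector:
    para_vecs, para_weights = [], []
    for para_weight, sentences in doc:
        # Sentence level: weighted sentence vectors form the paragraph vector.
        sent_vecs = [sentence_vector(words, embeddings) for _, words in sentences]
        sent_weights = [sw for sw, _ in sentences]
        para_vecs.append(weighted_mean(sent_vecs, sent_weights))
        para_weights.append(para_weight)
    # Paragraph level: weighted paragraph vectors form the document vector.
    return weighted_mean(para_vecs, para_weights)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    # Toy 2-d embeddings and weights, made up for illustration only.
    emb = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "pet": [0.5, 0.5]}
    doc_a: Document = [(1.0, [(1.0, [("cat", 2.0), ("pet", 1.0)])])]
    doc_b: Document = [(1.0, [(1.0, [("dog", 2.0), ("pet", 1.0)])])]
    sim = cosine(document_vector(doc_a, emb), document_vector(doc_b, emb))
    print(round(sim, 3))  # prints 0.385
```

Because every level is a weighted average, a high-weight paragraph or sentence pulls the document vector toward its content, which is one simple way to let position and keyword overlap influence the final similarity score.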