| With the rapid development of Internet technology and the continued growth of information infrastructure in China, the number of Chinese Internet users keeps increasing and a large amount of Chinese short-text data has appeared online. As a basic task in natural language processing, sentence similarity calculation plays an important role in information retrieval, text classification, machine translation, intelligent customer-service question answering systems, and other applications, so it has broad prospects and considerable research value. This paper studies the techniques and network models relevant to Chinese sentence similarity calculation and deep learning, and completes the following work. First, we construct a rich Chinese sentence dataset and preprocess it extensively, including retaining some stop words and performing word segmentation, part-of-speech tagging, named entity recognition, and dependency parsing. Second, building on classical neural network models, we propose an improved sentence similarity model for Chinese that combines a CNN with a tensor layer and adopts dynamic k-max pooling. As a result, the model extracts features and the interaction information between the two sentences more effectively, which improves its performance. Third, deep neural networks are effective for sentence similarity calculation but often require substantial training data to reach their full performance, whereas open-source Chinese datasets are scarce and sentence-pair similarity is expensive to label. To address this problem, we extend the sentence similarity model and design and implement the Deep Assistance Neural Network (DANN) model, which uses a large amount of unlabeled data to assist in training the model parameters. The AdaDelta algorithm is used to optimize the stochastic gradient descent (SGD) method during training, which improves the quality of model training. Finally, we set up several comparative experiments to verify the performance of the models and the feasibility of the proposed strategy. The experimental results show that, compared with MV-LSTM, the best-performing of the baseline models, the proposed sentence similarity model performs better on Chinese sentence similarity calculation and improves the F1 value by 0.024. Moreover, optimization with AdaDelta improves the training quality of the DANN model, and using a large amount of unlabeled data to assist parameter training effectively improves performance on small-scale annotated datasets. Compared with the sentence similarity model, the DANN model improves the F1 value by 0.023, and its F1 value improves further as the amount of unlabeled data increases. |
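The abstract names dynamic k-max pooling but does not spell out how it works. As a rough illustration only, the sketch below keeps the k largest activations of a feature row in their original order and chooses k from the layer depth and sentence length. The rule for k follows the commonly cited formulation of Kalchbrenner et al. (2014), which is an assumption here; the paper's actual variant, the function names, and the sample values are all hypothetical.

```python
import math


def k_max_pooling(values, k):
    """Keep the k largest values of a sequence, preserving their original order."""
    if k >= len(values):
        return list(values)
    # Indices of the k largest values; ties are broken by earlier position.
    top = sorted(range(len(values)), key=lambda i: (-values[i], i))[:k]
    return [values[i] for i in sorted(top)]


def dynamic_k(layer, total_layers, sentence_len, k_top):
    """Pooling size for a given convolutional layer.

    Assumed rule (Kalchbrenner et al., 2014), not confirmed by the abstract:
    k = max(k_top, ceil((L - l) / L * s)).
    """
    return max(k_top, math.ceil((total_layers - layer) / total_layers * sentence_len))


# Hypothetical feature row produced by one convolution filter over a 6-token sentence.
feature_row = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8]
k = dynamic_k(layer=1, total_layers=2, sentence_len=len(feature_row), k_top=2)
pooled = k_max_pooling(feature_row, k)
print(k, pooled)
```

Because k shrinks with depth and grows with sentence length, lower layers retain more positions for long sentences while the top layer always emits a fixed-size vector (`k_top`), which is what allows sentences of different lengths to be compared.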