| Zhihu is the most popular knowledge-based question-and-answer community on the Chinese Internet, with about 70 million users sharing or seeking knowledge on the site. Its basic function is to let users post questions and let other users answer them. The user who posts a question assigns it a few labels, and users who want to answer find the question through those labels and respond to it. At present, Zhihu's topic labels are annotated manually by users, which leads to a poor user experience. Specifically, because user-annotated labels may be inaccurate, the site cannot recommend appropriate answers to users in a timely and effective way. Furthermore, given Zhihu's massive volume of text data, manual annotation requires a huge amount of human labor. Designing a high-performance, high-precision automatic multi-label labeling system is therefore important for improving the Zhihu user experience and reducing operating costs. This paper designs an automatic multi-label labeling model based on deep learning. The main work includes the following: (1) This paper designs and implements a Python web crawler to collect a large amount of data from Zhihu and preprocesses the acquired data, including data cleaning, text segmentation, and word vector training with Word2Vec. (2) This paper implements multi-label text classification models based on deep learning, namely models based on CNN, LSTM, and a CNN-LSTM hybrid. The optimal parameter settings for these models were explored through a large number of experiments. The classification accuracies of the CNN, LSTM, and CNN-LSTM models were 96.39%, 96.45%, and 96.99%, respectively. The hybrid CNN-LSTM model reduces the classification error rate of CNN and LSTM by 16.62% and 15.2%, respectively. |
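
The error-rate reductions quoted in the abstract can be reproduced from the reported accuracies if they are read as relative reductions in error rate, where error rate is taken to be 100% minus accuracy. That reading is an assumption, since the abstract does not state the formula, and the function name `relative_error_reduction` is hypothetical. A minimal sketch of the arithmetic:

```python
def relative_error_reduction(baseline_acc: float, improved_acc: float) -> float:
    """Percentage by which the error rate (100 - accuracy) shrinks relative to the baseline.

    Assumption: error rate = 100% - accuracy; the abstract does not give the formula.
    """
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return (baseline_err - improved_err) / baseline_err * 100.0


# Accuracies (in percent) as reported in the abstract.
cnn_acc, lstm_acc, hybrid_acc = 96.39, 96.45, 96.99

vs_cnn = relative_error_reduction(cnn_acc, hybrid_acc)
vs_lstm = relative_error_reduction(lstm_acc, hybrid_acc)

print(f"Error reduction vs CNN:  {vs_cnn:.2f}%")
print(f"Error reduction vs LSTM: {vs_lstm:.2f}%")
```

Under this assumption the CNN error rate is 3.61% and the hybrid error rate is 3.01%, so the reduction is 0.60 / 3.61, or about 16.62%. For LSTM it is 0.54 / 3.55, or about 15.21%, which is consistent with the reported 15.2%.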