In recent years, with the growing popularity of the Internet, search engines, Internet chat, e-mail, forum reviews, e-commerce shopping, audio and video websites, micro-blogs, mobile short messaging, and online documents have deeply influenced our lives and are sources of Internet short text. At present, Internet companies are the main force in short text mining: they actively develop short text classification technology to uncover the potential value of short text data. Research on this topic therefore has both research significance and application value. Deep learning has been widely used in image recognition and speech recognition, and practical results show that deep learning models can solve such problems more effectively. To further improve short text classification, this paper focuses on a convolutional neural network model for short text and its classification technology. Various text clustering and classification methods are reviewed, and the characteristics of short text and the problems to be studied are analyzed. Covering the steps of data preprocessing, Chinese word segmentation, feature extraction, and clustering, a Skip-Gram vector generation method for short text is proposed. To capture the semantic relations between words and their expressive power in a continuous low-dimensional space, the Skip-Gram neural network model is used to train word embeddings, and the word embeddings of a sentence are combined into a two-dimensional feature matrix that represents the distributional characteristics of the sample. Based on deep learning, a semi-supervised text clustering algorithm built on ShortTextCNN is proposed: the convolutional neural network generates the vector representation of each short text, and the K-means algorithm is then applied to improve the clustering result. Experimental analysis on Chinese data sets demonstrates the effectiveness of the ShortTextCNN+K-means algorithm for short text clustering.
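The pipeline described above (word embeddings stacked into a two-dimensional sentence matrix, a sentence-level vector, then K-means) can be illustrated with a minimal sketch. It is not the paper's implementation. The embedding table `TOY_EMBEDDINGS` is a hypothetical stand-in for trained Skip-Gram vectors, `mean_pool` is a simple substitute for the ShortTextCNN encoder, and all function names and parameters are assumptions made for illustration only.

```python
import math

# Hypothetical 2-dimensional word vectors. In the paper these would come from
# a trained Skip-Gram model, which is not reproduced here.
TOY_EMBEDDINGS = {
    "cheap": [1.0, 0.0],
    "price": [0.9, 0.1],
    "discount": [0.8, 0.2],
    "movie": [0.0, 1.0],
    "film": [0.1, 0.9],
    "actor": [0.2, 0.8],
}
DIM = 2


def sentence_matrix(tokens, embeddings, max_len):
    """Stack word vectors row by row into a max_len x DIM matrix, zero-padded."""
    rows = [list(embeddings.get(tok, [0.0] * DIM)) for tok in tokens[:max_len]]
    rows += [[0.0] * DIM for _ in range(max_len - len(rows))]
    return rows


def mean_pool(matrix, n_tokens):
    """Assumed stand-in for the CNN encoder: average the non-padding rows."""
    n = max(min(n_tokens, len(matrix)), 1)
    return [sum(row[d] for row in matrix[:n]) / n for d in range(DIM)]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm, seeded deterministically with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_short_texts(texts, k=2, max_len=4):
    """Turn each pre-segmented short text into a vector and cluster the vectors."""
    vectors = []
    for tokens in texts:
        matrix = sentence_matrix(tokens, TOY_EMBEDDINGS, max_len)
        vectors.append(mean_pool(matrix, len(tokens)))
    return kmeans(vectors, k)


if __name__ == "__main__":
    texts = [["cheap", "price"], ["movie", "film"],
             ["discount", "price"], ["film", "actor"]]
    print(cluster_short_texts(texts, k=2))
```

In this sketch the texts are assumed to be already segmented into words. Replacing `mean_pool` with a trained convolutional encoder would bring it closer to the approach the abstract describes.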