Research And Application Of Chinese Short Text Classification Algorithm Based On Deep Learning

Posted on: 2022-10-15
Degree: Master
Type: Thesis
Country: China
Candidate: X Yang
Full Text: PDF
GTID: 2518306332487874
Subject: Management Science and Engineering
Abstract/Summary:
The wide application of Internet technology has steadily increased the number of Chinese Internet users and the Internet penetration rate, creating a large amount of online text data whose volume is growing exponentially. Finding useful information in such large text data is important for the management and services of many industries, such as medical consulting, hotels, theaters, and enterprise management. This gives rise to the important subject of automatic text classification. Its key steps are the vectorized representation of text and the training of classification models. The former can be divided into space-based representations and distributed representations. Traditional methods such as one-hot, TF-IDF and the vector space model (VSM) may lead to dimension explosion and sparsity. In addition, they cannot clearly express the meaning of words, which degrades classification performance. A variety of improved algorithms based on word2vec, including skip-gram and CBOW, are now in use. As for model training, traditional manual classification and word-frequency categorization have increasingly given way to more efficient machine learning methods, which make it feasible to categorize large-scale text datasets. Since deep learning was carried over from image recognition to natural language processing tasks, its strong learning ability has greatly improved classification accuracy. In this thesis, an integrated neural network model (ARCNN) is proposed on the basis of distributed vectorized representation and the convolutional neural network (CNN). The specific research is as follows.

(1) Based on the skip-gram algorithm, negative sampling is used when training the distributed representation to improve the convergence rate of the model. The ARCNN model is then designed on the basis of CNN. Before the convolutional layer, contextual information is extracted in the forward and reverse directions by a bidirectional long short-term memory network (BiLSTM), which captures long-term dependencies. After the convolutional layer extracts the feature vectors, the max pooling in the pooling layer is replaced by an attention mechanism, which assigns weights to these vectors to highlight key information. Finally, the ARCNN model and three control groups are compared on a two-label sentiment text dataset and a multi-label news text dataset. The comparison involves three kinds of models: the traditional machine learning model, the simple neural network model, and the ARCNN model. The experiments show that the neural network models outperform the machine learning model. Meanwhile, the ARCNN model integrates the strengths of the three models and achieves better categorization on both datasets.

(2) Semantic relations are added to the ARCNN to design a two-channel integrated neural network model (SAARCNN). The ARCNN model is used for the general-domain channel. In the specific-domain channel, the top N1 composite features and the top N2 sub-features are selected by binary operation and the χ²-based feature selection approach χ²*def(t_ij) to output the feature set. Based on the vectorized representation in the specific domain, TextCNN is selected as the classification model. The two channels are then concatenated in the fully connected layer. Finally, the SAARCNN model is validated on medical-domain datasets, and the results show that the F1 value increases for each category of the domain-specific text set.

(3) The SAARCNN model is applied to a text categorization system for online hospital consulting, so that questions posed by patients on a hospital website can be classified automatically, which is of practical significance. The thesis describes the system's two major processes, model training and automatic classification, and details the principles and implementation of the five modules that these processes require: the text collection module, the text preprocessing module, the feature selection and vectorized representation module, the model training module, and the automatic classification service module. The application effect of the system is also verified.
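
As an illustration of the pooling step in item (1), the following minimal Python sketch contrasts max pooling with attention-based pooling over the feature vectors produced by a convolutional layer. The abstract does not specify how the attention weights are computed, so the dot-product scoring against a context vector (`query`), the function names, and the toy numbers are all assumptions made for this example, not the thesis's actual implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def max_pool(features):
    """Max-over-time pooling: keep the largest value of each channel."""
    return [max(channel) for channel in zip(*features)]


def attention_pool(features, query):
    """Attention pooling: a weighted sum of the position vectors.

    features: list of T feature vectors, each of length C
    query:    hypothetical context vector of length C (learned in a real
              model; fixed here purely for illustration)
    """
    # Score each position by its dot product with the context vector.
    scores = [sum(f * q for f, q in zip(vec, query)) for vec in features]
    # Normalise the scores into weights that sum to 1.
    weights = softmax(scores)
    # Combine the position vectors using those weights.
    n_channels = len(features[0])
    pooled = [
        sum(w * vec[c] for w, vec in zip(weights, features))
        for c in range(n_channels)
    ]
    return pooled, weights


# Toy convolution output: 3 positions, 2 channels (made-up values).
features = [[0.1, 0.2], [0.9, 0.4], [0.3, 0.8]]
query = [1.0, 0.0]  # assumed context vector that attends to channel 0

pooled, weights = attention_pool(features, query)
print("max pooling:      ", max_pool(features))
print("attention weights:", weights)
print("attention pooling:", pooled)
```

Max pooling keeps only the largest activation per channel (here `[0.9, 0.8]`), whereas attention pooling keeps a contribution from every position and gives the largest weight to the position that best matches the context vector.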
Keywords/Search Tags:Chinese Short Text Classification, Vectorized Representation, Convolutional Neural Network, Feature Selection, Attention Mechanism