Research On Deep Learning Text Classification Method Based On HowNet

Posted on: 2022-12-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Nie
Full Text: PDF
GTID: 2518306773981419
Subject: Automation Technology
Abstract/Summary:
As science and technology advance, ever more text data is produced, and processing it has become a pressing need. Text classification technology has developed rapidly in recent years, and methods continue to evolve in response to the growth of news data, public opinion data, and other text data. However, different kinds of text data differ greatly in structure, and massive text collections suffer from missing content and incomplete semantics. Text data is characterized by unclear semantic expression, high dimensionality, and sparse content, and traditional classification methods often ignore semantic accuracy. Different kinds of text therefore call for different classification methods, which is why text classification remains one of the active research topics in natural language processing. To address the shortcomings of current methods, this thesis proposes a deep learning text classification method based on HowNet (DL-TC-HN). First, semantic features are extracted with a bidirectional LSTM neural network with an attention mechanism. Texts with sparse feature words are then sent to a knowledge base for expansion, and the expanded features are spliced using the HowNet semantic similarity calculation method. Finally, the result is combined with a topic model and classified by a classifier. The main research work of this thesis is as follows:

(1) To address the high dimensionality and heavy computation of text data, this thesis uses a HowNet-based semantic similarity algorithm to calculate the similarity of feature word vectors. The text is preprocessed with the BERT model, and the calculation is carried out in the vector dimension. Considering both the spatial structure and the semantic structure of the feature vectors increases the accuracy of the similarity calculation, and data that does not meet the spatial-structure threshold is eliminated during the calculation, which reduces running time and improves computational efficiency. On the Stanford natural language inference corpus, the HowNet-based semantic similarity algorithm is compared with several classical algorithms in terms of efficiency and accuracy, and the results support the effectiveness of the method.

(2) Because traditional text classification does not consider semantic influence, this thesis uses a bidirectional LSTM model with attention to extract text features at the semantic level. The parameters of each network layer are obtained through training, yielding more accurate semantic feature word vectors. To handle sparse text data and incomplete feature words, the CN-DBpedia knowledge base is used: relationships between entities are obtained from the knowledge base triples to expand the feature relationships, and a threshold determines which entity relationships qualify, thereby expanding the semantics. The results are then sent to a classifier with the BTM topic model for text classification. This process helps avoid deviation in the calculation and errors caused by an incomplete model structure, making the final classification more accurate. On four data sets, the proposed deep learning text classification method is compared with several classical algorithms in terms of efficiency and accuracy, and the results support its effectiveness.
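
The threshold-based expansion of sparse feature words described above can be illustrated with a minimal sketch. It is not the thesis's implementation. Plain cosine similarity stands in for the HowNet-based similarity measure, and the function names, the toy vectors, the toy triples, and the threshold value of 0.8 are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_features(feature_words, triples, vectors, threshold=0.8):
    """Add knowledge-base entities that are similar enough to a feature word.

    feature_words: words extracted from a short, sparse text
    triples: (head, relation, tail) tuples, as in a knowledge base
    vectors: dict mapping each word to its embedding vector
    threshold: minimum similarity for a tail entity to be kept (assumed value)
    """
    expanded = list(feature_words)
    for head, _relation, tail in triples:
        # Only follow triples that start from one of the text's own feature words.
        if head not in feature_words or tail in expanded:
            continue
        # Skip entities that have no vector; they cannot be scored.
        if head not in vectors or tail not in vectors:
            continue
        # Cosine similarity is a stand-in for the HowNet-based measure.
        if cosine_similarity(vectors[head], vectors[tail]) >= threshold:
            expanded.append(tail)
    return expanded


# Toy data, invented for this example only.
vectors = {
    "apple": [1.0, 0.0, 0.2],
    "iphone": [0.9, 0.1, 0.3],
    "fruit": [0.0, 1.0, 0.1],
}
triples = [
    ("apple", "product", "iphone"),
    ("apple", "category", "fruit"),
]

result = expand_features(["apple"], triples, vectors, threshold=0.8)
print(result)
```

With these toy vectors, only the entity whose vector lies close to the feature word passes the threshold, so low-similarity candidates are dropped before any further computation, which is the efficiency argument the abstract makes.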
Keywords/Search Tags: text classification, deep learning, topic model, HowNet