Short Text Classification Based On The Model Of Knowledge Graph And Word Combination

Posted on: 2022-11-10 | Degree: Master | Type: Thesis
Country: China | Candidate: Y J Liu
GTID: 2518306752954309 | Subject: Master of Engineering
Abstract/Summary:
With the rapid growth of e-commerce and instant-messaging platforms in the big-data era, the volume of short text on the web increases every day, so extracting valuable knowledge from this enormous body of short text has become essential. Short text classification is a fundamental task in Natural Language Processing (NLP): it learns deep semantic representations from short text data and applies them to text classification. Such texts are usually brief, carry little useful information, and are heavily colloquial, which makes them highly irregular. To address the sparse and non-standard nature of short texts, this thesis approaches short text classification from two directions: extending the length of short texts and mining their features in depth. The main research work is as follows.

(1) Extending short text with a knowledge graph. To address the short length, limited useful information, and sparse features of short text, this thesis uses a knowledge graph to extend it. After the short text data are cleaned, the TextRank algorithm is applied to obtain a keyword set for each short text. Each word in the keyword set is queried in the knowledge graph in turn, entity disambiguation is performed on the query results, and the final result is taken as the extension text. Finally, the word set of the original text is spliced with the extension text to produce the knowledge-graph-extended text. To verify the method, the extended text is embedded with the Word2Vec tool and classified with the common text classification model TextCNN. The experimental results show that extending short texts with the knowledge graph improves classification accuracy.

(2) Using the BERT model to mitigate word polysemy. Word2Vec represents each word as a static vector of fixed dimension, so it cannot dynamically express the different meanings a word takes in different contexts. This thesis uses the BERT model to mitigate this problem and, on top of BERT, applies CNN, RNN, and RCNN to extract deeper semantic features from the short text. The experimental results show that the BERT model effectively mitigates polysemy and improves short text classification accuracy.

(3) Proposing Char-Word-RMCNN, a deep learning classification model based on the combination of characters and words and a multi-head attention mechanism. Because the BERT model builds its vector representations with a single character as the unit, its semantic expressiveness is limited. The proposed model represents the short text with both the Word2Vec tool and the BERT model, combining the strengths of the two to extract the joint semantics of the short text. A deep learning network with a multi-head attention mechanism then aggregates the semantic features extracted by the two word-vector models to obtain deep features of the text. The experimental results show that the Char-Word-RMCNN model substantially improves short text classification accuracy.

In summary, this thesis extends short text with a knowledge graph, uses the BERT model to mitigate word polysemy, and, to address the limited semantic expressiveness of BERT's single-character representations, proposes the Char-Word-RMCNN model to extract deep features of short text. Compared with classifying the Weibo and Toutiao datasets directly with TextCNN, expanding both datasets with the knowledge graph and classifying them with the proposed character-word combination and multi-head attention model improves accuracy by 0.2227 and 0.1422 and F1 score by 0.2213 and 0.1445 on the two datasets, respectively.
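The abstract names TextRank as the keyword-extraction step but gives none of its settings. The following is a minimal, standard-library-only sketch of TextRank (PageRank run over a word co-occurrence graph), assuming the text has already been cleaned and segmented into tokens. The window size, the damping factor 0.85, the iteration count, and the function name `textrank_keywords` are illustrative assumptions, not values taken from the thesis.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=3, damping=0.85, iterations=50, top_k=5):
    """Return up to top_k tokens ranked by TextRank (PageRank on co-occurrence)."""
    # Link every pair of distinct tokens that appear within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Repeat the PageRank update a fixed number of times; each neighbour
    # shares its current score evenly across its own edges.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


tokens = ["graph", "text", "graph", "short", "graph", "model"]
print(textrank_keywords(tokens, top_k=2))
```

In a pipeline like the one described in (1), `tokens` would be the cleaned, segmented words of one post, and the returned list would serve as the keyword set that is looked up in the knowledge graph.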
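The query, disambiguate, and splice steps of (1) can be sketched as follows. The knowledge graph here is a hypothetical in-memory dictionary that maps a mention to candidate entities. Disambiguation uses a simple word-overlap heuristic chosen only for illustration, because the abstract does not say which knowledge graph or disambiguation method the thesis uses. `expand_with_kg` and `toy_kg` are hypothetical names, and the English toy data stand in for the Chinese Weibo and Toutiao text.

```python
def expand_with_kg(words, keywords, knowledge_graph):
    """Splice the original words with knowledge-graph text for each keyword."""
    context = set(words)
    extension = []
    for keyword in keywords:
        candidates = knowledge_graph.get(keyword, [])
        if not candidates:
            continue  # keyword has no entry in the graph; nothing to add
        # Entity disambiguation (illustrative heuristic): keep the candidate whose
        # description shares the most words with the original text.
        best = max(candidates, key=lambda c: len(context & set(c["description"])))
        extension.extend(best["description"])
    # Original word sequence first, then the extension text.
    return words + extension


# Hypothetical toy knowledge graph: mention -> candidate entities.
toy_kg = {
    "apple": [
        {"entity": "Apple Inc.", "description": ["technology", "company", "smartphone"]},
        {"entity": "apple (fruit)", "description": ["fruit", "tree", "sweet"]},
    ],
}
words = ["apple", "launches", "new", "smartphone"]
print(expand_with_kg(words, ["apple", "launches"], toy_kg))
# ['apple', 'launches', 'new', 'smartphone', 'technology', 'company', 'smartphone']
```

The overlap heuristic picks "Apple Inc." here because "smartphone" appears in both the post and that entity's description; a real system would more likely score candidates with embeddings or graph features.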
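Step (3) aggregates character-level (BERT) and word-level (Word2Vec) representations with multi-head attention. The plain-Python sketch below illustrates only that aggregation idea and rests on several assumptions. The two vector sequences are concatenated along the sequence axis and passed through multi-head scaled dot-product self-attention, and the result is then mean-pooled into a single text vector. Both streams are assumed to have already been projected to the same dimension. Randomly initialised matrices stand in for learned projections, the usual output projection is omitted, and the rest of the thesis's network is not modelled. All function names are hypothetical.

```python
import math
import random


def matmul(rows, weights):
    """Multiply an (n x d_in) matrix by a (d_in x d_out) matrix (lists of rows)."""
    return [
        [sum(r[k] * weights[k][j] for k in range(len(weights))) for j in range(len(weights[0]))]
        for r in rows
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(x, heads):
    """Scaled dot-product self-attention per head; head outputs are concatenated.

    `heads` is a list of (W_q, W_k, W_v) projection matrices, one tuple per head.
    The usual output projection W_o is omitted to keep the sketch short.
    """
    per_head = []
    for w_q, w_k, w_v in heads:
        q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
        scale = math.sqrt(len(w_k[0]))
        out = []
        for qi in q:
            weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
            out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
        per_head.append(out)
    # Concatenate the heads' outputs position by position.
    return [[val for head in per_head for val in head[i]] for i in range(len(x))]


def fuse_char_word(char_vecs, word_vecs, num_heads=2, d_head=4, seed=0):
    """Attend jointly over character and word vectors, then mean-pool to one vector."""
    rng = random.Random(seed)
    d_model = len(char_vecs[0])  # assumes both streams share this dimension

    def random_matrix():
        # Stand-in for learned weights: uniform random values (assumption).
        return [[rng.uniform(-0.5, 0.5) for _ in range(d_head)] for _ in range(d_model)]

    heads = [(random_matrix(), random_matrix(), random_matrix()) for _ in range(num_heads)]
    sequence = char_vecs + word_vecs  # joint sequence: characters, then words
    attended = multi_head_self_attention(sequence, heads)
    return [sum(row[c] for row in attended) / len(attended) for c in range(num_heads * d_head)]


char_vecs = [[0.1, 0.3, -0.2], [0.0, 0.5, 0.4]]  # toy stand-ins for BERT character vectors
word_vecs = [[0.2, -0.1, 0.6]]                    # toy stand-ins for Word2Vec word vectors
print(len(fuse_char_word(char_vecs, word_vecs)))  # 8 = num_heads * d_head
```

Concatenating the two streams lets every character attend to every word and vice versa; cross-attention from one stream to the other would be an equally plausible design, and the abstract does not say which one the thesis adopts.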
Keywords/Search Tags: Short Text Classification, Text Extension, Knowledge Graph, Multi-Head Attention Mechanism, BERT Model