
Research On App Classification Based On Word Embedding And Topic Model

Posted on: 2021-04-22  Degree: Master  Type: Thesis
Country: China  Candidate: Y M Han  Full Text: PDF
GTID: 2518306311496084  Subject: Applied Statistics
Abstract/Summary:
With the rapid development of the Internet, the network has become an important part of people's daily lives. With the emergence of smart devices such as smartphones, mobile terminals have become an important interface to the Internet, and more and more users now choose to go online through them. Through the mobile Internet, people can get the information they want at any time, free from the limitations of time and place. If the mobile phone is the access carrier of the mobile network, then the app is an important medium for obtaining information, and applications have become an important carrier of people's online behavior. Through various applications, people can browse news, read books, chat, and use other social functions. As more and more applications emerge, app classification has become an important problem.

This thesis proposes a framework for the app classification task. The proposed classification system is based on the description text of each app, so the task can be treated as text classification. In past research, deep learning methods have performed well in text classification. Deep learning applies multiple nonlinear transformations to the original input, so a model with a sufficiently complex structure can capture latent information about words. The most widely used deep learning method in text classification is the convolutional neural network (CNN), whose structure has made it an important contributor to the task. App descriptions contain different types of text of varying length, for which short text classification methods are more suitable. Short text contains little information and its features are sparse. Therefore, this thesis proposes a method that combines word vectors with a topic model, capturing word-level information on the one hand and the global semantic information of the text on the other. Combining the two kinds of information yields more semantic features of the text. To expand the features of the short text and capture as many of its semantic features as possible, we also propose a method based on calculating word weights. Finally, introducing the TF-IWF weight further improves the feature representation of the text. Using all of the text features obtained from the topic model and the word embeddings, the approach achieves good classification performance.
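The abstract gives no formulas, so the following is only a minimal sketch of how TF-IWF weighting and the combination of word-level and topic-level features might look. It assumes one common formulation of TF-IWF (term frequency within the document, multiplied by the log of total corpus tokens over the corpus count of the word). The function names `tf_iwf` and `text_features`, the toy embeddings, and the topic distribution are all hypothetical, and the weighted average stands in for whatever the thesis actually feeds to its CNN.

```python
import math
from collections import Counter


def tf_iwf(docs):
    """Return one {word: weight} dict per tokenized document.

    Assumed formulation (not taken from the thesis):
      tf(w, d) = count of w in d / number of tokens in d
      iwf(w)   = log(total tokens in corpus / corpus count of w)
    """
    corpus_counts = Counter(w for doc in docs for w in doc)
    total_tokens = sum(corpus_counts.values())
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            w: (c / len(doc)) * math.log(total_tokens / corpus_counts[w])
            for w, c in counts.items()
        })
    return weights


def text_features(word_weights, embeddings, topic_dist):
    """Concatenate a weighted average of word vectors with a topic distribution.

    word_weights: {word: weight} for one document (e.g. from tf_iwf)
    embeddings:   {word: list of floats}, a hypothetical pretrained lookup
    topic_dist:   list of floats, e.g. the document's topic-model output
    """
    dim = len(next(iter(embeddings.values())))
    pooled = [0.0] * dim
    norm = 0.0
    for word, weight in word_weights.items():
        if word in embeddings:  # skip out-of-vocabulary words
            norm += weight
            for i, value in enumerate(embeddings[word]):
                pooled[i] += weight * value
    if norm > 0:
        pooled = [value / norm for value in pooled]
    return pooled + list(topic_dist)


# Toy usage with made-up app descriptions, embeddings, and topic distribution.
docs = [["chat", "friends", "chat"], ["news", "read", "news", "daily"]]
weights = tf_iwf(docs)
toy_embeddings = {"chat": [1.0, 0.0], "friends": [0.0, 1.0]}
features = text_features(weights[0], toy_embeddings, [0.9, 0.1])
```

The resulting vector carries word-level information in its first part and document-level topic information in its second, which mirrors the two kinds of information the abstract describes combining.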
Keywords/Search Tags:Deep Learning, CNN, Word embedding, Topic model