
Research And Application Of Multi-scene Short Text Classification Based On Deep Learning

Posted on: 2019-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: S L Yuan
GTID: 2348330569495543
Subject: Engineering
Abstract/Summary:
With the rapid development of Internet technology, we have entered the era of big data, and data mining and artificial intelligence have become major areas of focus. Massive amounts of data are generated in science, the economy, and social life. Text data records human behavior and psychological information; mined carefully and used scientifically, it has high social value. The primary task in text data mining and processing is text classification. After years of research, text classification algorithms have matured and are widely used across industries, for example in article classification, spam recognition, and sentiment classification. However, constructing a text classifier quickly and efficiently for different scenarios remains a major challenge. This thesis uses deep learning to address short text classification under multiple scenarios. The main work is as follows:

(1) Traditional text classification techniques are studied first. Their main drawback is that the classification result depends too heavily on tedious up-front feature selection, so the work turns to deep learning. Word embedding models and several neural network models are then studied in detail, including recurrent neural networks, convolutional neural networks, and attention models, and these techniques are combined to build a text classifier.
(2) Data collected from the Internet is cleaned, processed, and annotated to build multi-scenario datasets that differ in field, classification difficulty, and data volume. On these datasets, the relationship between word embedding models (NNLM, CBOW, Skip-gram, etc.) and text classification models (RNN-Text, CNNText, RNN-Attention, etc.) is analyzed in detail, and the influence of word embedding on the classification model is evaluated by the performance gain rate PR.
(3) For the different scenarios, training tricks and parameter optimization methods that improve classification accuracy are summarized, and the performance and scope of application of the different classification models are analyzed in detail.
(4) To address unbalanced datasets in different scenarios, a parameter k is introduced to optimize the loss function of the classification model. The results show that the method effectively improves accuracy. Finally, the relationship between the value of the hyperparameter k and accuracy is analyzed.
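
The abstract does not state how the parameter k in item (4) enters the loss function. As one possible reading, the sketch below assumes k is an exponent on inverse class frequency, so that each class c gets the weight w_c = (n_max / n_c)^k in a weighted cross-entropy. The function names and the weighting formula are illustrative assumptions, not the method of the thesis.

```python
import math
from collections import Counter


def class_weights(labels, k):
    """Assumed weighting (not from the thesis): w_c = (n_max / n_c) ** k.

    k = 0 gives every class weight 1 (plain cross-entropy);
    larger k gives rare classes a larger weight.
    """
    counts = Counter(labels)
    n_max = max(counts.values())
    return {c: (n_max / n) ** k for c, n in counts.items()}


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(logits_batch, labels, weights):
    """Weighted mean of per-example negative log-likelihoods.

    labels are integer class indices; weights maps class index to weight.
    """
    total_loss = 0.0
    total_weight = 0.0
    for logits, y in zip(logits_batch, labels):
        p = softmax(logits)[y]
        w = weights[y]
        total_loss += -w * math.log(p)
        total_weight += w
    return total_loss / total_weight


# Toy batch: three majority-class examples and one minority-class example,
# all of which the model scores in favour of class 0.
labels = [0, 0, 0, 1]
logits_batch = [[2.0, 0.0]] * 4
loss_plain = weighted_cross_entropy(logits_batch, labels, class_weights(labels, 0))
loss_k1 = weighted_cross_entropy(logits_batch, labels, class_weights(labels, 1))
```

Under this assumption, raising k makes the misclassified minority example count for more, so `loss_k1` is larger than `loss_plain` on the toy batch; sweeping k and recording validation accuracy would be one way to study the relationship between k and accuracy that item (4) mentions.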
Keywords/Search Tags:Multi-scene, Deep Learning, Short Text Classification, Attention Model, Unbalanced Data Set Optimization