Research on a Text Content Screening Method Based on the Vector Space Model

Posted on: 2019-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: L G Cai
Full Text: PDF
GTID: 2348330563954071
Subject: Control Science and Engineering
Abstract/Summary:
With the explosive growth of Internet data, deep learning is becoming part of everyday life, and we increasingly rely on the Internet for consumption. How to filter worthless data out of the Internet and extract valuable information has therefore become a problem that needs to be solved. In this context, this thesis studies text classification for the online reviews commonly encountered in daily life. The aim is to screen out meaningless comments made by Internet users so that more useful information is easier to access. The work focuses on the following aspects.

First, for the shallow neural network model, this thesis builds a text filtering model based on text vectors and a BP neural network. The model vectorizes both the word-frequency characteristics and the semantic characteristics of the text, so that the text vector carries more information and the accuracy of the text classification model improves. An improved method for constructing the composite text vector is presented that takes both kinds of characteristics into account. Cross-experiments show that the method improves the accuracy of the text classification model while keeping the dimension of the text vector as low as possible.

Second, building on the shallow text classification model and the specific research content of this thesis, the concept of "text value degree" is proposed and the text vector is extended based on it. In view of the particularities of Chinese text, the concept of "text structure encoding" is also proposed for text similarity calculation. Text similarity is first calculated by combining the text structure encoding with simple word frequency. The text value degree is then calculated from the text similarity and the sentiment tendency of the text. Finally, the text vector is extended based on the text value degree, and texts are filtered using the extended text vector. The text value degree is shown to improve the accuracy of the text classification model without affecting the efficiency of the model.

Third, for the deep neural network model, this thesis builds a text filtering model based on word vectors and Long Short-Term Memory (LSTM), and improves the LSTM using DAN and CNN respectively. The main improvement of the LSTM & DAN model is to retain the original word vector information and combine it with the Dropout method, which improves the accuracy of the text classification model without increasing the structural complexity of the LSTM model. The LSTM & CNN model draws on the strength of the convolutional neural network in mining deep information from text to improve the LSTM model, and experiments show that this improvement is significant. Compared with the shallow neural network, the accuracy is considerably higher.
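
The first aspect of the abstract, a text vector that carries both word-frequency and semantic information, can be illustrated with a minimal sketch. The abstract does not give the thesis's actual construction, so everything here is an assumption: the names `tf_vector`, `mean_embedding`, and `combined_vector` are hypothetical, the embeddings are toy values rather than trained word vectors, and plain concatenation stands in for whatever combination the thesis uses.

```python
from collections import Counter

# Hypothetical toy word embeddings (assumption); a real system would
# load trained word vectors instead of these hand-written values.
TOY_EMBEDDINGS = {
    "good": [0.9, 0.1],
    "phone": [0.2, 0.8],
    "bad": [-0.9, 0.1],
}


def tf_vector(tokens, vocab):
    """Relative term frequency of each vocabulary word in the text."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty texts
    return [counts[w] / total for w in vocab]


def mean_embedding(tokens, embeddings, dim):
    """Average the embeddings of the tokens that have one."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def combined_vector(tokens, vocab, embeddings, dim):
    """Concatenate the word-frequency part and the semantic part."""
    return tf_vector(tokens, vocab) + mean_embedding(tokens, embeddings, dim)


vocab = sorted(TOY_EMBEDDINGS)  # ["bad", "good", "phone"]
vec = combined_vector(["good", "phone", "good"], vocab, TOY_EMBEDDINGS, 2)
print(len(vec), vec)
```

The resulting vector has one entry per vocabulary word followed by one entry per embedding dimension, and could be fed to a classifier such as the BP neural network the abstract mentions.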
Keywords/Search Tags: text classification, neural networks, text value degree, LSTM model