A Research On Text Content Screening Method Based On Vector Space Model

Posted on:2019-03-30

Degree:Master

Type:Thesis

Country:China

Candidate:L G Cai

GTID:2348330563954071

Subject:Control Science and Engineering

Abstract/Summary:

With the explosive growth of Internet data, deep learning is becoming part of everyday life, and we increasingly rely on the Internet for consumption. How to filter out worthless data and extract valuable information has gradually become a problem that needs to be solved. In this context, this thesis studies text classification for the online reviews we encounter in daily life, with the aim of screening out meaningless comments posted by Internet users and making useful information easier to access. The work focuses on the following three aspects.

First, for the shallow neural network model, this thesis builds a text filtering model based on text vectors and a BP neural network. The model vectorizes both the word-frequency and the semantic characteristics of a text, so that the text vector carries more information and the text classification model becomes more accurate. To this end, a method for constructing a synthesized text vector is improved so that it takes both kinds of characteristics into account. Cross-experiments show that the method improves the accuracy of the text classification model while keeping the dimension of the text vector as low as possible.

Second, building on the shallow text classification model and the specific research content of this thesis, the concept of "text value degree" is proposed, and the text vector is extended based on it. In addition, to account for the particularities of Chinese text, the concept of "text structure encoding" is proposed for text similarity calculation. Text similarity is first calculated by combining the text structure encoding with simple word frequency; the text value degree is then calculated from the text similarity and the sentiment tendency of the text; finally, the text vector is extended based on the text value degree, and texts are filtered using the extended vector. It is shown that the text value degree improves the accuracy of the text classification model without affecting the efficiency of the model.

Third, for the deep neural network model, this thesis builds a text filtering model based on word vectors and Long Short-Term Memory (LSTM), and improves the LSTM using DAN and CNN respectively. The LSTM & DAN model retains the original word vector information and combines it with the Dropout method, improving the accuracy of the text classification model without increasing the structural complexity of the LSTM model. The LSTM & CNN model draws on the strength of convolutional neural networks in mining the deep information of a text to improve the LSTM model, and experiments show that this improvement is significant. Compared with the shallow neural network, the accuracy is improved considerably.
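
The first contribution, a text vector that carries both word-frequency and semantic information, can be illustrated with a minimal Python sketch. The abstract does not state how the two kinds of features are combined, so the concatenation used here is an assumption, as are the function names, the toy vocabulary, and the two-dimensional embeddings; it is not the thesis's actual construction.

```python
from collections import Counter


def tf_vector(tokens, vocabulary):
    """Relative frequency of each vocabulary word in the token list."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # guard against empty texts
    return [counts[word] / total for word in vocabulary]


def mean_embedding(tokens, embeddings, dim):
    """Average the embeddings of known tokens; all zeros if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def combined_vector(tokens, vocabulary, embeddings, dim):
    """Assumed combination: word-frequency part followed by semantic part."""
    return tf_vector(tokens, vocabulary) + mean_embedding(tokens, embeddings, dim)


# Hypothetical toy data, for illustration only.
vocabulary = ["good", "bad", "phone"]
embeddings = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "phone": [0.0, 1.0]}
tokens = ["good", "phone", "good", "screen"]

vec = combined_vector(tokens, vocabulary, embeddings, dim=2)
print(vec)
```

The resulting vector has one component per vocabulary word plus one per embedding dimension, and could serve as the input layer of a small feed-forward (BP) classifier. Keeping the vocabulary small is one way to hold the overall dimension down, which is the trade-off the abstract mentions.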

Keywords/Search Tags:

text classification, neural networks, text value degree, LSTM model

Related items

1	Research On Text Classification Method Based On Bidirectional LSTM
2	Research On Short Text Classification Based On Deep Neural Network
3	Research On Chinese Text Classification Based On Hybrid Neural Network Model
4	Research On Chinese News Text Classification Based On Nested LSTM
5	Research And Implementation Of Chinese Long Text Classification Algorithm Based On Deep Learning
6	Research On Text Classification Based On Deep Learning
7	Research On Text Classification Based On Attention Bi-LSTM
8	Research On Automatic Text Classification Based On Machine Learning
9	The Research And Application Of Neural Network In Short Text Classification
10	Research And Application Of Text Classification Method Based On Price Complaint Data