
Research On Short Text Classification Methods Based On Feature Fusion And BiLSTM

Posted on: 2020-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: W H Li
Full Text: PDF
GTID: 2428330590456744
Subject: Software engineering
Abstract/Summary:
In the information age, short texts appear in many forms, including mobile phone messages, spam, recommendation-system messages, and product reviews on shopping platforms. Extracting the required information promptly and accurately from this ocean of messages is a major challenge. Fast, flexible, efficient, and low-cost text information extraction is therefore a hot research topic in natural language processing: it improves the quality and speed with which users access effective information, and it meets the needs of different departments for news classification, sentiment classification, and public opinion analysis. Current short text classification methods fall mainly into machine learning and deep learning approaches.

First, a feature selection function FS that fuses multiple factors is constructed. Compared with the traditional TF-IDF feature selection function, FS is verified to integrate the semantics of the features, remove a large number of redundant features, and raise the weights of features with strong discriminative power. Short text classification experiments on the Chinese corpus of Sogou Lab verify the effectiveness of the method.

Second, deep learning models are easily disturbed by input perturbations, which degrades classification performance. A multi-level method combining attention and adversarial training is therefore proposed on top of a bidirectional Long Short-Term Memory (BiLSTM) network. The input layer contains both the word embeddings and adversarially perturbed word embeddings; the perturbation makes small changes to the model input, which increases the parameter updates during training. The BiLSTM layer extracts semantic information at different distances in the context, and the attention layer re-weights the BiLSTM-encoded representations to strengthen the sequence learning task; finally, a softmax layer minimizes the classification loss and assigns labels to the short texts. In experiments on the DBpedia dataset, compared with refined models (Attention-LSTM, Attention-BiLSTM, and CNN-LSTM), the multi-level deep learning model shows better classification performance and stronger stability and generalization ability: the prediction accuracy reaches 97%, and the loss function value stabilizes at around 0.5%.
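The FS function itself is not specified in this abstract, but the TF-IDF baseline it is compared against can be sketched in a few lines of Python. This is a minimal illustration only; the tokenized toy corpus below is a hypothetical stand-in for the Sogou Lab data:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for each token of each tokenized document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / number of documents containing t)
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(token for doc in docs for token in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({t: (c / total) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return weights

# Toy corpus of tokenized "short texts" (hypothetical example data).
docs = [["cheap", "phone", "deal"],
        ["phone", "review"],
        ["election", "news"]]
w = tf_idf(docs)
```

A token that occurs in every document gets an IDF of zero, which is one reason the abstract argues for a feature selection function that also accounts for a feature's discriminative power rather than frequency alone.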
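The adversarial-training and attention steps described above can be illustrated with a small NumPy sketch. This is not the thesis model: it only shows, under assumed shapes, (a) an L2-scaled perturbation of the word embeddings derived from a loss gradient, and (b) softmax attention pooling over BiLSTM-style hidden states. The gradient, hidden states, and the attention query vector are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

def adversarial_perturbation(grad, epsilon=0.02):
    """L2-normalized perturbation of the embedding matrix:
    r_adv = epsilon * g / ||g||, added to the clean embeddings."""
    return epsilon * grad / (np.linalg.norm(grad) + 1e-12)

def attention_pool(hidden, w):
    """Softmax attention over time steps.
    hidden: (T, H) encoder outputs; w: (H,) attention query vector.
    Returns a single (H,) context vector."""
    scores = hidden @ w                      # (T,) alignment scores
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                     # attention weights, sum to 1
    return alpha @ hidden                    # weighted sum of states

# Assumed shapes: 5 time steps, embedding/hidden size 8.
emb = rng.normal(size=(5, 8))       # clean word embeddings
grad = rng.normal(size=(5, 8))      # stand-in for dLoss/dEmbedding
emb_adv = emb + adversarial_perturbation(grad)   # perturbed input channel

hidden = rng.normal(size=(5, 8))    # stand-in for BiLSTM outputs
context = attention_pool(hidden, rng.normal(size=8))
```

In the described model, both `emb` and `emb_adv` would be fed through the BiLSTM during training, and the pooled `context` vector would go to the softmax output layer.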
Keywords/Search Tags: short text classification, feature selection, Word2vec, adversarial training, LSTM, attention