
Research On Short Text Classification Methods Based On Feature Fusion And BiLSTM

Posted on: 2020-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: W H Li
Full Text: PDF
GTID: 2428330590456744
Subject: Software engineering
Abstract/Summary:
In the information age, short texts appear in many forms, including mobile phone messages, spam, recommendation-system messages, and product reviews on shopping platforms. Extracting the required information promptly and accurately from this ocean of messages is a major challenge. Fast, flexible, efficient, and low-cost text information extraction is therefore a hot research topic in natural language processing: it improves the quality and speed with which users access effective information, and it meets the needs of different departments for news classification, sentiment classification, and public opinion analysis. Current short text classification methods fall mainly into machine learning and deep learning approaches.

First, a feature selection function FS that fuses multiple factors is constructed. Compared with the traditional TF-IDF feature selection function, FS is verified to integrate the semantics of the features, remove a large number of redundant features, and raise the weights of features with strong discriminative power. Short text classification experiments on the Chinese corpus of Sogou Lab verify the effectiveness of the method.

Second, deep learning models are easily disturbed by input perturbations, which degrades classification performance. A multi-level method combining attention and adversarial training is therefore proposed on top of a bidirectional Long Short-Term Memory (BiLSTM) network. The input layer contains both the word embeddings and adversarially perturbed word embeddings; the perturbation makes small changes to the model input, which increases the parameter updates during training. The BiLSTM layer extracts semantic information at different distances in the context, and the attention layer re-weights the BiLSTM-encoded representations to strengthen the sequence learning task; finally, a softmax layer minimizes the classification loss and assigns labels to the short texts. In experiments on the DBpedia dataset, compared with refined models (Attention-LSTM, Attention-BiLSTM, and CNN-LSTM), the multi-level deep learning model shows better classification performance and stronger stability and generalization ability: the prediction accuracy reaches 97%, and the loss function value stabilizes at around 0.5%.
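The FS function itself is not specified in this abstract, but the TF-IDF baseline it is compared against can be sketched in a few lines of Python. This is a minimal illustration only; the tokenized toy corpus below is a hypothetical stand-in for the Sogou Lab data:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for each token of each tokenized document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / number of documents containing t)
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(token for doc in docs for token in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({t: (c / total) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return weights

# Toy corpus of tokenized "short texts" (hypothetical example data).
docs = [["cheap", "phone", "deal"],
        ["phone", "review"],
        ["election", "news"]]
w = tf_idf(docs)
```

A token that occurs in every document gets an IDF of zero, which is one reason the abstract argues for a feature selection function that also accounts for a feature's discriminative power rather than frequency alone.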
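The adversarial-training and attention steps described above can be illustrated with a small NumPy sketch. This is not the thesis model: it only shows, under assumed shapes, (a) an L2-scaled perturbation of the word embeddings derived from a loss gradient, and (b) softmax attention pooling over BiLSTM-style hidden states. The gradient, hidden states, and the attention query vector are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

def adversarial_perturbation(grad, epsilon=0.02):
    """L2-normalized perturbation of the embedding matrix:
    r_adv = epsilon * g / ||g||, added to the clean embeddings."""
    return epsilon * grad / (np.linalg.norm(grad) + 1e-12)

def attention_pool(hidden, w):
    """Softmax attention over time steps.
    hidden: (T, H) encoder outputs; w: (H,) attention query vector.
    Returns a single (H,) context vector."""
    scores = hidden @ w                      # (T,) alignment scores
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                     # attention weights, sum to 1
    return alpha @ hidden                    # weighted sum of states

# Assumed shapes: 5 time steps, embedding/hidden size 8.
emb = rng.normal(size=(5, 8))       # clean word embeddings
grad = rng.normal(size=(5, 8))      # stand-in for dLoss/dEmbedding
emb_adv = emb + adversarial_perturbation(grad)   # perturbed input channel

hidden = rng.normal(size=(5, 8))    # stand-in for BiLSTM outputs
context = attention_pool(hidden, rng.normal(size=8))
```

In the described model, both `emb` and `emb_adv` would be fed through the BiLSTM during training, and the pooled `context` vector would go to the softmax output layer.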
Keywords/Search Tags: short text classification, feature selection, Word2vec, adversarial training, LSTM, attention