| In recent years, with the rapid development of big data technology, many companies have turned their attention to the research and application of data science and use big data analysis to compile statistics on and analyze their business. Among the technologies related to big data analysis, sentiment analysis of text is a very popular research direction. At present, text sentiment analysis is mainly based on deep learning methods. Recently, a large number of scholars have applied the attention mechanism in the field of NLP. The BERT model, which is based on the attention mechanism and was proposed by Google in 2018, set new records on a variety of NLP tasks. BERT is not only an excellent model by itself but can also be combined with other models to achieve better results. Before BERT was proposed, most deep-learning-based sentiment analysis used Word2Vec. Although Word2Vec extracts text features more effectively than traditional machine learning methods, it still has limitations. After BERT was proposed, research on sentiment analysis gradually shifted from Word2Vec to BERT. The hybrid models built on BERT in earlier work improve considerably on previous models, but there is still room for improvement in sentiment analysis tasks. First, after BERT was proposed, some studies optimized it, and improved models such as RoBERTa and ALBERT were proposed to further improve performance. Second, the construction of the neural network for the downstream task also has a significant impact on accuracy, so improving this part of the network can also improve accuracy. When constructing a downstream neural network, the optimal network structure must be determined through experiments. This thesis conducts comparative experiments on an e-commerce data set to maximize the classification performance of the neural network model constructed in this thesis on that data set. The
ALBERT-LSTM-ATT-RCNN hybrid model proposed in this thesis follows the typical structure of a text classification model. First, the text is converted into feature vectors, which are then classified by a fully connected layer. Converting text into feature vectors is the core of the hybrid model. This step first converts each word in the text into a word vector. This thesis uses the recent ALBERT model in place of the Word2Vec model. The downstream neural network of the hybrid model is based on the BiLSTM model and integrates an RCNN layer and an attention layer to further extract sequence features, so that the hybrid model achieves the best classification performance. This thesis also designs an emoji processing solution based on the ALBERT-BiLSTM-ATT-RCNN hybrid model: the comment text is handled by the pre-trained language model, while emoji features are learned from random initialization. The solution adds emojis to training without changing the structure of the original hybrid model, and it allows the sentiment of comment text containing emojis to be analyzed more effectively. In more detail, the ability of the model that converts each word into a vector to extract text features has a very large impact on the result of the entire hybrid model. Therefore, the hybrid model proposed in this thesis uses the recent ALBERT model in place of the Word2Vec model; ALBERT is an improved version of the BERT model. The downstream neural network is designed to suit the characteristics of the text in the sentiment analysis task. The downstream neural network of this
hybrid model is based on the BiLSTM model, which is the model most commonly used for text classification tasks, and integrates an RCNN layer and an attention layer to further extract sequence features, so that the hybrid model achieves the best classification performance. This thesis uses product reviews crawled from the Jingdong and Tmall platforms as the data set. The raw review data are added to the data set after preprocessing and manual annotation, and the proposed hybrid model is evaluated on this data set. First, the hyperparameter settings of the hybrid model are determined through comparative experiments with different settings. The hybrid model is then compared on the same data set with the most commonly used models of recent years, namely Word2Vec-SVM, Word2Vec-RCNN, and Word2Vec-BiLSTM-ATT, and with the single models that make up the hybrid model, namely ALBERT-RCNN and ALBERT-BiLSTM-ATT. The experimental results show that the hybrid model has the highest accuracy, reaching 89.89%. Compared with the models using Word2Vec, accuracy improves by more than 5%; compared with the ALBERT model, the hybrid model improves accuracy by 2.30%, which verifies that adding a neural network downstream of the ALBERT model to further extract features yields higher accuracy. Compared with ALBERT-RCNN and ALBERT-BiLSTM-ATT, which are components of the hybrid model, the hybrid model improves accuracy by 1.56% and 1.43%, respectively, verifying that combining the RCNN layer and the attention layer brings a further improvement. A further experiment verifies the emoji processing solution. The experimental results show that including emojis in training increases accuracy by 1.9%, which indicates that the solution is effective. Since the BERT pre-trained model only handles words, for network text that contains emojis, such as e-commerce reviews, adding emoji processing on the basis of the
original hybrid model can improve the accuracy of sentiment analysis to a certain extent. |
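
The emoji processing idea described in the abstract (pre-trained representations for the comment text, randomly initialized and learnable vectors for emojis) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the thesis: the character-level tokenization, the `is_emoji` code-point ranges, the embedding size `EMBED_DIM`, and `pretrained_vector`, which is only a stand-in for a real pre-trained encoder such as ALBERT. The sketch shows the lookup step only, not training.

```python
import random
from typing import Dict, List

EMBED_DIM = 8  # hypothetical size; a real setup would match the encoder's hidden size


def is_emoji(ch: str) -> bool:
    """Rough emoji test over a few common Unicode blocks (an assumption, not exhaustive)."""
    cp = ord(ch)
    return 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF


def pretrained_vector(token: str) -> List[float]:
    """Hypothetical stand-in for a pre-trained language model's token embedding."""
    rng = random.Random(token)  # seeded by the token, so the vector is fixed per token
    return [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]


class EmojiEmbedding:
    """Embedding table for emojis whose rows start from random initialization."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        self.dim = dim
        self.rng = random.Random(seed)
        self.table: Dict[str, List[float]] = {}

    def lookup(self, emoji: str) -> List[float]:
        # Create a small random vector the first time an emoji is seen;
        # in a real model these rows would then be updated during training.
        if emoji not in self.table:
            self.table[emoji] = [self.rng.gauss(0.0, 0.02) for _ in range(self.dim)]
        return self.table[emoji]


def embed_comment(text: str, emoji_table: EmojiEmbedding) -> List[List[float]]:
    """Map each non-space character to a vector: emojis via the random table,
    everything else via the (stand-in) pre-trained model."""
    vectors = []
    for ch in text:
        if ch.isspace():
            continue
        if is_emoji(ch):
            vectors.append(emoji_table.lookup(ch))
        else:
            vectors.append(pretrained_vector(ch))
    return vectors


if __name__ == "__main__":
    table = EmojiEmbedding(EMBED_DIM)
    sequence = embed_comment("\u597d\u8bc4 \U0001F600", table)
    print(len(sequence), "vectors of size", EMBED_DIM)
```

Because both kinds of vectors share one dimension, the resulting sequence can be passed to the same downstream layers, which is one way to read the abstract's statement that the solution does not change the structure of the original hybrid model.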