Research On E-Commerce Commodity Title Category Classification Algorithm Based On Natural Language Processing Technology

Posted on:2023-06-22

Degree:Master

Type:Thesis

Country:China

Candidate:J Y Yan

Full Text:PDF

GTID:2568306851490904

Subject:Electronic information

Abstract/Summary:

Live-stream e-commerce is an industry that has emerged in China in recent years. It has strongly promoted the development of the national economy, brought great convenience to consumers, and given businesses a more favorable sales platform. The industry has grown very rapidly, however, and in pursuit of large profits the platforms have neglected a weakness in their data processing: the lack of automation. Data in this field is still processed manually, which greatly reduces efficiency. This thesis therefore studies the category classification of commodity titles generated by the live-stream e-commerce industry. The contents are as follows:

(1) Web crawler technology was used to capture product information from the live-stream channels of the top ten anchors on Taobao and TikTok in 2021, yielding 31,237 records in total. The data set covers four categories of product titles: 10,035 "clothing" items, 8,954 "beauty" items, 6,845 "life" items and 5,403 "food" items. The data were randomly divided into three data sets, I, II and III, containing 10,000, 10,000 and 11,237 records respectively, with training/test proportions of 60%/40%, 70%/30% and 80%/20% respectively.

(2) The data are preprocessed with the Jieba tool, that is, denoised and segmented into words. Because of the characteristics of the data itself, stop-word removal is not needed. TF-IDF and word2vec word vector models are then each used to extract text features, and their outputs are fed into the machine learning models and the LSTM self-attention hybrid model for classification.

(3) Four machine learning models (decision tree, random forest, naive Bayes and XGBoost) are used to classify the commodities. The classification accuracy on data set II is higher than on the other two data sets, and XGBoost performs best, with an accuracy of 90.89%.

(4) An LSTM self-attention hybrid model is constructed. It consists of three layers: a self-attention weighting layer, a long short-term memory (LSTM) network classification layer and a softmax normalization layer. The self-attention weighting layer assigns attention weights to the word vectors produced by the word2vec model and passes them to the LSTM classification layer; the softmax layer then normalizes the output to give the final category of the commodity title. Compared with the machine learning models, the LSTM self-attention hybrid model classifies better, reaching an accuracy of 92.09% on data set II.
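
As a rough illustration of the first and last layers of the hybrid model described in (4), the sketch below applies a self-attention weighting to a few toy word vectors and then normalizes some class scores with softmax. It is not the thesis's implementation: the abstract does not say how the attention weights are computed, so scaled dot-product attention without learned projections is assumed here, the LSTM layer is omitted, and the vectors, the scores and the function names (`softmax`, `dot`, `self_attention`) are made up for the example.

```python
import math


def softmax(scores):
    """Normalize a list of real scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def self_attention(vectors):
    """Re-weight each word vector by its similarity to every word in the title.

    Assumption: scaled dot-product attention in which queries, keys and
    values are all the input vectors themselves (no learned projections).
    """
    dim = len(vectors[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in vectors:
        weights = softmax([dot(query, key) / scale for key in vectors])
        # Each output vector is a weighted sum of all word vectors.
        mixed = [
            sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
    return outputs


# Toy 2-dimensional "word vectors" for a three-word title (made-up values;
# in the thesis these would come from the word2vec model).
title_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(title_vectors)

# Hypothetical class scores standing in for the LSTM layer's output.
categories = ["clothing", "beauty", "life", "food"]
logits = [2.0, 0.5, 0.1, -1.0]
probs = softmax(logits)
predicted = categories[probs.index(max(probs))]
```

In a full model the attended vectors would be passed through the LSTM layer, whose output would replace the hand-written `logits`; the category with the highest softmax probability is taken as the predicted class.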

Keywords/Search Tags:

Natural Language Processing, Short text classification, Machine learning, TF-IDF word vector, Word2vec word vector, LSTM self-attention model

Related items

1	Improvement And Application Of Text Classification Based On RNN
2	Natural Language Processing-A Study Of Vectorization Of Chinese Words And Short Texts
3	Research On Short Text Classification Based On Deep Neural Network
4	Research On Chinese Short Text Classification Based On Word Embedding
5	Research On Short Text Classification Based On Deep Learning And BTM Model
6	Research On Text Classification Based On Word Vector And Deep Learning
7	Study On Chinese Named Entity Recognition
8	Research On Text Classification Method Based On Bidirectional LSTM
9	The Research On Measuring Text Similarity Based On Word Vector Enhanced Tree Kernel Model
10	Research On Text Classification Based On Natural Language Processing And Machine Learning