| Changing consumer attitudes and rising living standards have raised expectations for automobile quality. At the same time, in the era of Internet big data, more and more users read other people’s reviews to support their purchase decisions, while manufacturers and media use the Internet to learn about user needs and to promote their products. To make good use of user reviews, let the value of these data benefit stakeholders in the automotive industry, and foster a healthy automotive ecosystem, I identify the topics and sentiment of user review texts in the automotive industry, construct an LDA topic model, and study the product factors that affect consumer satisfaction. This thesis completes the following work. First, I introduce the background and significance of the research question, review prior research on text classification and sentiment analysis in the automotive industry, and describe the methods used in this thesis: three text feature representation methods (TF, TF-IDF, and Word2vec), LDA topic analysis, a sentiment dictionary, and four machine learning classification methods (logistic regression, SVM, naive Bayes, and XGBoost). Second, focusing on the automotive industry, I carry out a descriptive analysis of the review data, describe the distribution of samples and of words within them, preprocess the text (data cleaning, word segmentation, stop-word removal, part-of-speech tagging, and feature representation), and interpret the high-frequency vocabulary. Finally, I use the classification models to identify topics and sentiment polarity separately. A comparison of accuracy, F1 score, and log loss shows that XGBoost outperforms the other models, and that logistic regression, SVM, and naive Bayes perform more stably under the TF and TF-IDF representations. I then use the LDA topic model to summarize keywords and obtain two categories of keywords, one related to automobile equipment and one to user experience. By comparing positive and negative keywords, I offer suggestions for product development and product marketing. |