
Research on Comment Classification Based on Multi-dimensional Features

Posted on: 2018-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Bao
GTID: 2348330512483411
Subject: Computer Science and Technology
Abstract/Summary:
With the growth of e-commerce transactions, interaction data between users and products is expanding rapidly. Users and businesses increasingly rely on comments to obtain product feedback and make decisions. Traditional comment classification algorithms based on the bag-of-words model or TF-IDF features do not take into account the semantics, grammar, and word order of comments, and they also ignore the characteristics of the user and of the comment itself, which leads to limited accuracy and poor scalability. This thesis therefore investigates text representation, text classification, and comment classification techniques, and proposes and implements a novel comment classification model, MDF-CC, based on multi-dimensional features. Experiments on a comment data set collected from JD.com compare MDF-CC with a classification model based on TF-IDF and with a fasttext classification model based on text features. The experimental results demonstrate the accuracy and rationality of the proposed classification model based on multi-dimensional features. The main contributions of this thesis are as follows:
1) Study the main methods of text representation and the algorithms of text classification, and analyze the advantages and disadvantages of each technique. Comment classification features are extracted with the traditional TF-IDF statistical method, and random forest and SVM models are implemented and applied to the comment data set. The F1 values of the two models are about 79% and 80%, respectively.
2) Implement a fast comment classification model, F-CC, based on fasttext. To take into account word semantics, grammar, and word order information, the F-CC model is built with a word vector network training method. The F1 value of F-CC is about 88%, which is better than that of the traditional TF-IDF feature classification method.
3) Propose a novel comment classification model, MDF-CC, based on multi-dimensional features. A fasttext probability model is established on the textual features. The relationship between non-textual features and comment polarity is visualized, and a random forest probability model is established on the non-textual features. Finally, the two probability models are linearly merged to obtain a classification model that uses both textual and non-textual features. Experimental results show that the F1 value of the multi-dimensional classification model is about 90%, which verifies the accuracy and scalability of the proposed MDF-CC.
4) Propose a comment aspect extraction method based on word vector similarity matching. Through comment aspect extraction and average similarity matching, each comment is assigned to one of the aspects logistics, service, price, or quality, and the favorable rate and negative feedback rate of products are then obtained.
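The linear merging step in contribution 3) can be illustrated with a minimal Python sketch. The abstract does not state the merge weight or how it is chosen, so the weight `alpha`, the function names, and the example probabilities below are all assumptions made for illustration and are not the thesis's actual implementation.

```python
from typing import List, Sequence


def fuse_probabilities(
    p_text: Sequence[float],
    p_meta: Sequence[float],
    alpha: float = 0.5,
) -> List[float]:
    """Linearly merge two class-probability vectors.

    p_text: probabilities from a text-based model (e.g. a fasttext classifier).
    p_meta: probabilities from a model on non-textual features
            (e.g. a random forest).
    alpha:  weight on the text model; a hypothetical value that would
            normally be tuned on validation data.
    """
    if len(p_text) != len(p_meta):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Convex combination: the result is still a probability distribution
    # when both inputs are.
    return [alpha * t + (1.0 - alpha) * m for t, m in zip(p_text, p_meta)]


def predict_label(fused: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label with the highest fused probability."""
    best = max(range(len(fused)), key=lambda i: fused[i])
    return labels[best]


labels = ["negative", "positive"]
p_text = [0.40, 0.60]  # made-up output of the text model
p_meta = [0.70, 0.30]  # made-up output of the non-textual model
fused = fuse_probabilities(p_text, p_meta, alpha=0.5)
print(predict_label(fused, labels))
```

In this made-up example the text model leans positive while the non-textual model leans negative more strongly, so the fused prediction follows the non-textual model. A larger `alpha` would shift the decision toward the text model.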
Keywords/Search Tags: comment classification, fasttext, SVM, text features, multi-dimensional features