Short Text Clustering Based On Ensemble Learning

Posted on: 2020-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: X J Wang
GTID: 2439330590982855
Subject: Applied Statistics
Abstract/Summary:
With the rapid development of the Internet, online shopping has become increasingly popular and is now a mainstream part of everyday shopping. At the same time, online shopping platforms generate huge volumes of product review text. Product reviews contain a great deal of latent information about products and customer satisfaction, from which enterprises can extract the main characteristics of those products. An enterprise can also identify the important characteristics of different types of users, offer different preferential policies to different users, and improve the design of the laptop and its core competitiveness in pursuit of higher profits. Mining review text effectively is therefore very important to the enterprise.

Because review text is unlabeled and short, the traditional text mining approach of applying a single clustering method to it often gives unsatisfactory results. In classification problems in data mining, ensemble learning improves performance by combining multiple individual classifiers. This paper therefore takes the review text of a Lenovo laptop as an example and applies the idea of ensemble learning to cluster analysis in order to improve the clustering of short text.

First, Python is used to crawl 3,840 user reviews of one laptop from the Lenovo official flagship store on Tmall. The reviews are then preprocessed: invalid entries are deleted, the text is segmented into words, stop words are filtered out, and the text is converted to a numerical representation. Because the high dimensionality of short text leads to the curse of dimensionality, features are then extracted from the data, and a dimension reduction algorithm, chosen by comparing candidate methods, is applied to the review text features. Then, based on the idea of ensemble learning, three clustering algorithms (k-means clustering, synthetic clustering, and BIRCH) are integrated to build the final clustering model.

Finally, the 1,765 Lenovo laptop users with valid reviews are divided into two categories. Users in category 0 pay more attention to the appearance of the laptop and are labeled appearance users, while users in category 1 pay more attention to its performance and configuration and are labeled performance users. Both types of users also pay close attention to product quality and service. The review text and the proportion of users in each category are then visualized, and further characteristics of the two user types are mined and analyzed. Based on the text clustering results, suggestions and strategies are put forward in two areas: product marketing strategy and product update design.
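
The abstract does not say how the outputs of the three base clusterers are combined. As an illustration only, the sketch below uses one common consensus scheme, a co-association majority vote: two reviews end up in the same final cluster when more than half of the base clusterings group them together. The function name `consensus_labels` and the toy label lists are assumptions made for this example; they are not taken from the thesis.

```python
from itertools import combinations


def consensus_labels(labelings):
    """Combine several clusterings of the same items by co-association voting.

    labelings: list of label lists, one per base clusterer, all the same length.
    Two items are linked when a strict majority of the base clusterings put
    them in the same cluster; final clusters are the connected components of
    those links. Label names need not match across clusterers.
    """
    n = len(labelings[0])
    m = len(labelings)
    parent = list(range(n))  # union-find forest over the items

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        votes = sum(1 for labels in labelings if labels[i] == labels[j])
        if 2 * votes > m:  # strict majority of base clusterings agree
            parent[find(j)] = find(i)

    # Renumber components as 0, 1, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Hypothetical outputs of three base clusterers on six reviews.
kmeans_labels = [0, 0, 0, 1, 1, 1]
second_labels = [1, 1, 0, 0, 0, 0]  # different label names, one disagreement
birch_labels = [0, 0, 0, 1, 1, 1]

final = consensus_labels([kmeans_labels, second_labels, birch_labels])
print(final)
```

Voting on pairs rather than on raw labels avoids having to match cluster numbers across algorithms, since each algorithm numbers its clusters arbitrarily.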
Keywords/Search Tags:Feature selection, Dimension reduction, Text clustering, Ensemble learning