
Research on Solutions for Customer Segmentation Based on a Text Clustering Algorithm

Posted on: 2015-10-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Hu
GTID: 2298330422488494
Subject: Computer application technology
Abstract/Summary:
With the steady development of business markets, customers are becoming an enterprise's most valuable asset. Using customer segmentation techniques to analyze customer preferences helps enterprises adjust and develop appropriate marketing strategies. With the recent rapid development of e-commerce, online shopping has become increasingly popular. When shopping online, customers often judge the quality and characteristics of goods from the seller's description and from other customers' comments. Customer comments are feedback from a large number of buyers and give a more realistic picture of customer preferences for goods and services. Customer reviews are mostly short texts and are easy to obtain. This thesis takes short text data as its research object and focuses on dimensionality reduction methods for text data and on customer segmentation based on text clustering. The research consists mainly of the following two points:

(1) An improved text dimensionality reduction method based on information gain is proposed for screening customer review datasets. The traditional information gain method considers only the global importance of feature words and ignores their local importance. This thesis improves it by introducing the idea of TF-IDF. Further, to evaluate the usability of the original customer comment datasets, the improved method screens them efficiently through text clustering.

(2) A semantics-based PCA text clustering algorithm is proposed to perform customer segmentation through text clustering. The traditional PCA dimensionality reduction method does not explicitly use the latent semantic relations among feature words. With the help of Synonyms Cilin, synonyms are combined with specific words, and PCA is then applied on the basis of that combination. The new feature space after dimensionality reduction represents the original space better and describes the relations between feature words in the original space better. Cluster analysis can then examine the attributes that customers care about, revealing the distribution of the customer population and the different preferences behind different customer behaviors.
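
As an illustration of point (1), the following is a minimal sketch of how a global information gain score might be combined with a local TF-IDF weight to rank feature words. It is not the formula used in the thesis, which the abstract does not give. The function names (`information_gain`, `mean_tfidf`, `select_features`), the use of class labels, the toy reviews, and the choice to multiply the two scores are all assumptions made for this example.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def information_gain(docs, labels, term):
    """Global importance: entropy reduction from knowing whether `term` occurs."""
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(docs)
    h_cond = sum(len(part) / n * entropy(list(Counter(part).values()))
                 for part in (with_t, without_t) if part)
    return entropy(list(Counter(labels).values())) - h_cond

def mean_tfidf(docs, term):
    """Local importance: TF-IDF of `term`, averaged over documents containing it."""
    df = sum(1 for d in docs if term in d)
    if df == 0:
        return 0.0
    idf = math.log(len(docs) / df) + 1.0  # +1 keeps very common terms non-zero
    tfs = [d.count(term) / len(d) for d in docs if term in d]
    return idf * sum(tfs) / len(tfs)

def select_features(docs, labels, k):
    """Keep the k terms with the highest IG x mean TF-IDF score (assumed combination)."""
    vocab = sorted({t for d in docs for t in d})
    scores = {t: information_gain(docs, labels, t) * mean_tfidf(docs, t)
              for t in vocab}
    return sorted(vocab, key=lambda t: -scores[t])[:k]

# Hypothetical tokenized reviews with hypothetical sentiment labels.
docs = [["fast", "delivery", "good"], ["slow", "delivery", "bad"],
        ["good", "quality", "fast"], ["bad", "quality", "slow"]]
labels = ["pos", "neg", "pos", "neg"]
selected = select_features(docs, labels, 4)
```

In this toy data, "delivery" and "quality" appear equally often in both classes, so their information gain is zero and they drop out of the selected features regardless of their TF-IDF weight.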
Keywords/Search Tags: customer segmentation, text clustering, feature reduction, information gain, principal component analysis