With the steady development of business markets, customer resources are becoming the most valuable assets of an enterprise. Using customer segmentation techniques to analyze customer preferences helps enterprises adjust and develop appropriate marketing strategies. With the recent rapid development of e-commerce, online shopping has become increasingly popular. When shopping online, customers often judge the quality and characteristics of goods from the seller's description and from relevant customer comments. Customer comments are feedback from a large number of buyers and reflect customer preferences for goods and services more realistically. Customer reviews are mostly short texts and are easy to obtain. This paper takes short text data as its research object and focuses on text dimensionality reduction methods and on customer segmentation based on text clustering. The research consists mainly of the following two points:

(1) An improved text dimensionality reduction method based on information gain is proposed for screening customer review datasets. The traditional information gain method considers only the global importance of feature words and ignores their local importance. This paper introduces the idea of TF-IDF to improve the method in this respect. Further, to evaluate the usability of the original customer comment datasets, the improved method is combined with text clustering to screen those datasets efficiently.

(2) A semantics-based PCA text clustering algorithm is proposed to achieve customer segmentation through text clustering. The traditional PCA dimensionality reduction method does not explicitly use the potential semantic relations between feature words. With the help of Synonyms Cilin, PCA is applied again on the basis of combining synonyms with specific words. The new feature space obtained after dimensionality reduction better represents the original space and better describes the relations between feature words in that space. Cluster analysis can then be used to analyze the attributes that customers care about, revealing the distribution of the customer population and the different preferences behind different customer behaviors.
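As an illustration of point (1), the following is a minimal sketch of one possible way to combine a global information gain score with a local TF-IDF weight. The abstract does not give the actual formula, so the product `IG(t) * TF-IDF(t, d)`, the smoothed IDF, the class labels, the toy reviews, and all function names here are assumptions made for illustration only.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(term, docs, labels):
    """Global score: IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t)."""
    classes = sorted(set(labels))
    with_t = Counter(l for d, l in zip(docs, labels) if term in d)
    without_t = Counter(l for d, l in zip(docs, labels) if term not in d)
    n = len(docs)
    n_t = sum(with_t.values())
    h_c = entropy([labels.count(c) for c in classes])
    h_t = entropy([with_t[c] for c in classes])
    h_not = entropy([without_t[c] for c in classes])
    return h_c - (n_t / n) * h_t - ((n - n_t) / n) * h_not


def tfidf(term, doc, docs):
    """Local score: term frequency in one document times a smoothed IDF
    (the smoothing is an assumption, not taken from the paper)."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in docs if term in d)
    idf = math.log((1 + len(docs)) / (1 + df)) + 1.0
    return tf * idf


def combined_weight(term, doc, docs, labels):
    """Hypothetical combination: global IG scaled by local TF-IDF."""
    return information_gain(term, docs, labels) * tfidf(term, doc, docs)


# Toy tokenized reviews with assumed sentiment labels (illustrative data only).
docs = [
    ["fast", "delivery", "good"],
    ["slow", "delivery", "bad"],
    ["good", "quality", "fast"],
    ["bad", "quality", "slow"],
]
labels = ["pos", "neg", "pos", "neg"]

weights = {t: combined_weight(t, docs[0], docs, labels) for t in docs[0]}
```

In this toy data, a word such as "delivery" appears equally often in both classes, so its information gain is zero and it would be dropped regardless of how frequent it is in a single review, while a word such as "good" keeps a positive weight that varies per document.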