
Research on Store Portraits Based on User Choice Behavior and Online Reviews

Posted on: 2017-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: H Z Xie
Full Text: PDF
GTID: 2308330503985511
Subject: Probability theory and mathematical statistics
Abstract/Summary:
With the rapid development of Internet-based O2O e-commerce, the volume of data that O2O companies can collect has grown explosively. To mine such data efficiently, this paper focuses on sentiment classification and topic clustering of online review text and, on that basis, combines the results with user choice behavior data to propose a complete approach to constructing and applying store portraits. The main work of this paper includes:

Design and development of a crawler system to collect the experimental data. Following the logical design of the open-source crawler Scrapy and embedding GhostDriver, the PhantomJS-based driver for the browser testing framework Selenium, a web crawler system that can crawl dynamically rendered web content was designed and developed: nlp-dynamic-spider. This system was used to crawl shop, user review, and user data for clothing-industry stores in first-tier cities from the review site Dazhong Dianping ("public comment"), which serve as the experimental data of this paper.

Improving word segmentation accuracy by adding new words and domain terms to the lexicon. Review text contains many popular Internet words and domain-specific terms, which makes word segmentation inaccurate. A new-word identification algorithm for large-scale corpora is introduced that scores candidate words by solidity, boundary freedom, occurrence count, and document frequency. It effectively identifies popular Internet words, and clothing-industry terms are then added on top to further improve segmentation accuracy.

Three text representation algorithms based on the shallow neural network model Word2Vec. The traditional BOOL, TF, and TF-IDF text representations are combined with Word2Vec word vectors by a linear weighted sum, giving the three text DR algorithms introduced in this paper: BOOL-W2V, TF-W2V, and TF-IDF-W2V. Comparative sentiment classification experiments with all six representations are carried out on four different review data sets.
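The linear weighted sum behind BOOL-W2V, TF-W2V, and TF-IDF-W2V can be illustrated with a minimal sketch. The abstract does not give the exact weighting formulas, so everything here is an assumption: TF is taken as relative frequency, IDF uses a smoothed logarithm, out-of-vocabulary tokens are skipped, and the two-dimensional `vectors` table is a hypothetical stand-in for trained Word2Vec embeddings.

```python
import math
from collections import Counter

def term_weights(doc, corpus, scheme):
    """Return {token: weight} for one tokenized document (assumed formulas)."""
    counts = Counter(doc)
    n_docs = len(corpus)
    weights = {}
    for tok, c in counts.items():
        if scheme == "bool":
            w = 1.0                      # presence only
        elif scheme == "tf":
            w = c / len(doc)             # relative term frequency
        elif scheme == "tfidf":
            df = sum(1 for d in corpus if tok in d)
            idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF
            w = (c / len(doc)) * idf
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        weights[tok] = w
    return weights

def doc_vector(doc, corpus, vectors, scheme="tf"):
    """Weighted sum of word vectors; tokens without a vector are skipped."""
    dim = len(next(iter(vectors.values())))
    out = [0.0] * dim
    for tok, w in term_weights(doc, corpus, scheme).items():
        vec = vectors.get(tok)
        if vec is None:
            continue
        for i, x in enumerate(vec):
            out[i] += w * x
    return out

# Hypothetical toy embeddings and corpus, for illustration only.
vectors = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "shirt": [0.0, 1.0]}
corpus = [["good", "good", "shirt"], ["bad", "shirt"]]

bool_vec = doc_vector(corpus[0], corpus, vectors, "bool")
tf_vec = doc_vector(corpus[0], corpus, vectors, "tf")
tfidf_vec = doc_vector(corpus[0], corpus, vectors, "tfidf")
```

The resulting fixed-length vectors can be passed to any standard classifier; which classifier the thesis used is not stated in the abstract.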
The experimental results show that when the corpus contains many documents but only a small, label-imbalanced sample is annotated, the TF-W2V text representation performs better than the other five representations.

Construction of a text mining component on the Spark big data platform. Distributed parallel processing is the current trend for speeding up the processing of massive volumes of online review text. Building on the interfaces of Spark, the most popular big data processing platform, this paper designs a series of Chinese text processing algorithms, including new-word discovery, the various Chinese text representations, and text feature extraction, and packages them as a mining component: nlp-spark.

Store portrait based on user choice behavior and online reviews. This paper uses user choice behavior data together with data mined from online reviews to build the complete indicator system of the store portrait, and offers suggestions on modeling the portrait indicators and applying them in practice.
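The new-word identification step named in the abstract scores candidates by solidity and boundary freedom. The sketch below shows one common reading of those two statistics, which may differ from the thesis: solidity as the ratio of a candidate's probability to the product of its parts' probabilities (minimum over all splits), and boundary freedom as the smaller of the left and right neighbor entropies. The occurrence-count and document-frequency filters are plain thresholds and are left out. All function names and the toy strings are hypothetical.

```python
import math
from collections import Counter, defaultdict

def candidate_stats(texts, max_len=2):
    """Count character n-grams (1..max_len) and their left/right neighbors."""
    counts = Counter()
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    total = 0  # total number of characters, used to turn counts into probabilities
    for text in texts:
        total += len(text)
        for n in range(1, max_len + 1):
            for i in range(len(text) - n + 1):
                gram = text[i:i + n]
                counts[gram] += 1
                if i > 0:
                    left[gram][text[i - 1]] += 1
                if i + n < len(text):
                    right[gram][text[i + n]] += 1
    return counts, left, right, total

def entropy(counter):
    """Shannon entropy (natural log) of a neighbor distribution; 0.0 if empty."""
    n = sum(counter.values())
    if n == 0:
        return 0.0
    return -sum(c / n * math.log(c / n) for c in counter.values())

def solidity(gram, counts, total):
    """Minimum over binary splits of p(gram) / (p(head) * p(tail))."""
    p = counts[gram] / total
    return min(
        p / ((counts[gram[:k]] / total) * (counts[gram[k:]] / total))
        for k in range(1, len(gram))
    )

def boundary_freedom(gram, left, right):
    """Smaller of the left-neighbor and right-neighbor entropies."""
    return min(entropy(left[gram]), entropy(right[gram]))

# Toy strings: "ab" appears in varied contexts, "xa" in only one.
texts = ["xaby", "zabw", "ab"]
counts, left, right, total = candidate_stats(texts)
ab_solidity = solidity("ab", counts, total)
ab_freedom = boundary_freedom("ab", left, right)
xa_freedom = boundary_freedom("xa", left, right)
```

A candidate that is both cohesive (high solidity) and free at its boundaries (high entropy on both sides) is a plausible new word. Here "ab" has varied neighbors on both sides, while "xa" never has a left neighbor and scores zero freedom.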
Keywords/Search Tags: store portrait, web crawler, sentiment classification, topic clustering, Spark