Research On The Technology Of E-commerce Product Quality Risk Assessment Based On Data Mining

Posted on:2017-01-01

Degree:Master

Type:Thesis

Country:China

Candidate:M Zou

Full Text:PDF

GTID:2308330482980659

Subject:Computer technology

Abstract/Summary:

With the rapid development of network and information technology, the Internet has become an inescapable part of everyday life and has changed the way people live. However, because e-commerce is virtual and cross-regional, some products purchased by consumers may carry quality risks. To address this problem, and building on previous studies, this thesis proposes a risk evaluation model based on user reviews from e-commerce platforms. The model identifies risks, grades their severity, and releases corresponding early-warning information so that regulators can respond quickly. The main contents are as follows:

First, the data processing facilities of the R language are used to handle null values and outliers, and the ansj jar running on Spark is used to segment Chinese text and remove stop words, yielding the preprocessed training data. Next, features for e-commerce product quality risk evaluation are selected with a Random Forest: noise is added to each attribute of the out-of-bag samples, and the attributes are ranked in importance by the resulting change in error. This step is implemented on the Spark computing framework.

For the risk assessment itself, an improved Naive Bayes algorithm, parallelized on Spark, is proposed to build the risk model. Naive Bayes assumes that features are independent of one another, but in practice they are closely related. The improved algorithm therefore uses the correlation between each feature and the label, computed with MinHash, to weight the conditional probabilities in Naive Bayes. The parallel version of the algorithm is implemented on Spark.

Experiments are carried out on a large-scale Spark cluster. On UCI data sets, the improved Bayesian algorithm on Spark outperforms both standard Naive Bayes and its own serial implementation in accuracy, recall, and running time. As the volume of experimental data grows, the serial algorithm becomes less efficient, whereas efficiency in the distributed Spark environment improves significantly, so the Spark-based parallel algorithm scales better and is better suited to large-scale data. The experiments also show that, when applied to user comments from an e-commerce platform, the model accurately identifies product quality risks and issues risk early warnings. A new model of risk supervision is thus put forward.
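The abstract does not give the weighting formula, so the following is only a minimal single-machine sketch of one way the idea could work, not the thesis's Spark implementation. It assumes binary features (a term is present in a review or not), represents each feature and each label as the set of review indices where it occurs, estimates their Jaccard similarity with MinHash, and uses the largest similarity over the classes as an exponent on that feature's conditional probability. The function names, the choice of the maximum over classes, the Laplace smoothing, and the toy data are all hypothetical.

```python
import math
import random

PRIME = 2_147_483_647  # modulus for the hash family h(x) = (a*x + b) mod PRIME


def make_hashes(k, seed=0):
    """Draw k random (a, b) pairs; a != 0 keeps each hash a bijection."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def signature(items, hashes):
    """MinHash signature: the minimum hash value of the set under each hash."""
    return [min((a * x + b) % PRIME for x in items) for a, b in hashes]


def minhash_similarity(set_a, set_b, hashes):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    if not set_a or not set_b:
        return 0.0
    sig_a, sig_b = signature(set_a, hashes), signature(set_b, hashes)
    return sum(u == v for u, v in zip(sig_a, sig_b)) / len(hashes)


def train(docs, labels, n_features, hashes):
    """docs: list of sets of feature indices; labels: one class per doc."""
    classes = sorted(set(labels))
    by_class = {c: {i for i, y in enumerate(labels) if y == c} for c in classes}
    by_feature = [{i for i, d in enumerate(docs) if j in d}
                  for j in range(n_features)]
    prior = {c: len(by_class[c]) / len(docs) for c in classes}
    # Laplace-smoothed P(feature j present | class c)
    cond = {c: [(len(by_feature[j] & by_class[c]) + 1) / (len(by_class[c]) + 2)
                for j in range(n_features)]
            for c in classes}
    # Assumption: weight = strongest feature/label similarity over all classes
    weights = [max(minhash_similarity(by_feature[j], by_class[c], hashes)
                   for c in classes)
               for j in range(n_features)]
    return prior, cond, weights


def predict(doc, prior, cond, weights):
    """Return the class with the highest weighted log-posterior score."""
    def score(c):
        total = math.log(prior[c])
        for j, w in enumerate(weights):
            p = cond[c][j] if j in doc else 1.0 - cond[c][j]
            total += w * math.log(p)  # weight acts as an exponent on P(x_j | c)
        return total
    return max(prior, key=score)


# Toy data: feature 0 marks risky reviews, feature 1 marks normal ones,
# feature 2 is spread evenly over both classes.
docs = [{0, 2}, {0}, {0, 2}, {0}, {1, 2}, {1}, {1, 2}, {1}]
labels = ["risk"] * 4 + ["ok"] * 4
hashes = make_hashes(k=64)
prior, cond, weights = train(docs, labels, n_features=3, hashes=hashes)
prediction = predict({0}, prior, cond, weights)
```

In the toy data, features 0 and 1 occur in exactly the reviews of one class, so their sets match a label set and their weight is 1. Feature 2 occurs equally often in both classes, so it gets a smaller weight and its conditional probabilities cannot tip the decision either way.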

Keywords/Search Tags:

Data Mining, Naive Bayesian, Random Forest, MinHash, Spark

Related items

1	Research On The Approach Of Classification In Data Mining Based On Naive Bayesian
2	Research On Random Forest Classification Algorithm Based On Spark Distributed Platform
3	Research On Parallelization And Optimization Of Random Forest Classification Algorithm Based On Spark
4	Energy Consumption Evolutionary Optimization At GCC Compile Time Based On Bayesian Network And Random Forest
5	Ecological Scientific Investigation Data System With Anti-crawler Mechanism
6	Optimization Of Distributed Random Forest Algorithm Based On Hierarchical Subspace
7	Research On Parallel Text Categorization Of Random Forest
8	The Research And Implementation Of Bayesian Classification Algorithm In The Text Based On Spark Platform
9	The Design And Implementation Of Power Grid Data Mining Platform Subsystem
10	Research On Spam Filtering Technology Based On Bayesian Classification