Scrapy-based Crawling And Characteristics Analysis Of An E-commerce Network

Posted on: 2013-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
GTID: 2248330371459451
Subject: Communication and Information System
Abstract/Summary:
The widespread use of the Internet provides a favorable environment for e-commerce. Studies of e-commerce network characteristics have largely focused on Taobao, covering topics such as its credit rating system, marketing strategies, and seller characteristics. The purpose of these studies is to analyze online marketing transactions in e-commerce. In this thesis, we study the e-commerce network from the perspective of graph theory. First, we crawl the Taobao share-platform using the Scrapy crawling framework. Then, based on the collected dataset, we study the topological characteristics of the Taobao share-platform and the behavior of its users. Our contributions lie in three aspects:

First, we study sampling methods for bipartite graphs. We analyze the effectiveness of extending the Metropolis-Hastings Random Walk (MHRW) algorithm to bipartite graphs and modify the sampling procedure to improve its stability. We then compare our MHRW sampling algorithm with the Random Walk (RW) on generated bipartite graphs as well as on real two-mode network graphs. Simulations show that MHRW outperforms RW on bipartite graphs.

Second, we crawl the Taobao share-platform using the Scrapy framework. After an in-depth analysis of the format of Taobao's web pages, and combining two sampling methods, BFS and MHRW, we ran the crawler on five PCs for 30 days. We also describe the major problems encountered during crawling and the solutions we finally adopted. In addition, we crawled data for one type of seller in order to analyze the relationships between sellers and buyers.

Third, we analyze the characteristics of user behavior on the Taobao share-platform based on the collected dataset, aiming to find the relationships between sellers and buyers connected by items on the share-platform. Surprisingly, we find that for some buyers the share-platform is a tool for advertising items on behalf of sellers with high credit scores, while other buyers use it only to help them support the platform.
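The MHRW-versus-RW comparison in the first contribution can be illustrated with a minimal Python sketch. It assumes the textbook MHRW (propose a uniformly random neighbour w of the current node v, accept with probability min(1, deg(v)/deg(w)), otherwise stay at v); it does not reproduce the thesis's own stability modification, which the abstract does not describe. The helper names (`build_bipartite`, `random_walk`, `mhrw`, `mean_degree`) and the toy buyer-item graph are hypothetical illustrations, not the thesis's code or Taobao data. Because a simple RW visits nodes in proportion to their degree, its naive average degree is biased toward hubs, while MHRW samples nodes approximately uniformly.

```python
import random
from collections import defaultdict

def build_bipartite(edges):
    """Undirected adjacency lists for a bipartite graph given (left, right) edge pairs.

    Nodes are tagged ("L", id) or ("R", id) so the two sides cannot collide.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[("L", u)].append(("R", v))
        adj[("R", v)].append(("L", u))
    return dict(adj)

def random_walk(adj, start, steps, rng):
    """Simple random walk (RW): long-run visit frequency is proportional to degree."""
    node, samples = start, []
    for _ in range(steps):
        node = rng.choice(adj[node])
        samples.append(node)
    return samples

def mhrw(adj, start, steps, rng):
    """Textbook Metropolis-Hastings random walk targeting the uniform distribution over nodes."""
    node, samples = start, []
    for _ in range(steps):
        candidate = rng.choice(adj[node])
        # Accept with probability min(1, deg(node) / deg(candidate)); otherwise stay put.
        # Rejections act as self-loops; when neighbouring degrees differ they also make
        # the chain aperiodic, unlike the period-2 simple RW on a bipartite graph.
        if rng.random() < len(adj[node]) / len(adj[candidate]):
            node = candidate
        samples.append(node)
    return samples

def mean_degree(adj, samples):
    """Naive (unweighted) average degree over the sampled nodes."""
    return sum(len(adj[n]) for n in samples) / len(samples)

# Hypothetical toy "buyer-item" graph (not Taobao data): buyer 0 shares items 0..19,
# and each buyer i in 1..19 shares only item i. True mean degree = 2*39/40 = 1.95.
edges = [(0, i) for i in range(20)] + [(i, i) for i in range(1, 20)]
adj = build_bipartite(edges)
true_mean = sum(len(nbrs) for nbrs in adj.values()) / len(adj)

rng = random.Random(42)
start = ("L", 0)
rw_est = mean_degree(adj, random_walk(adj, start, 100_000, rng))
mh_est = mean_degree(adj, mhrw(adj, start, 100_000, rng))

print(f"true={true_mean:.2f}  RW={rw_est:.2f}  MHRW={mh_est:.2f}")
```

On this toy graph the RW estimate should settle near sum(d^2)/sum(d) = 496/78, about 6.36, rather than the true 1.95, while the MHRW estimate should land close to 1.95. The price of MHRW is that rejected moves repeat the current node, so it needs more steps per distinct node visited; an alternative is to keep the plain RW and reweight each sample by 1/deg.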
Keywords/Search Tags: e-commerce, Taobao, bipartite graph, sampling method, MHRW, Scrapy, user behavior