E-commerce Big Data Acquisition System Based On Business Plug-in

Posted on: 2020-08-16
Degree: Master
Type: Thesis
Country: China
Candidate: T Q Li
Full Text: PDF
GTID: 2428330596463699
Subject: Electronic and communication engineering
Abstract/Summary:
With the rapid development of Internet technology, e-commerce has grown considerably and online shopping has spread to all parts of the country. Many emerging e-commerce platforms have entered the market hoping to gain a share of it, which has drawn more and more people into the industry. E-commerce is a new economic form, and its development plays an important role in driving the wider economy. For cities where e-commerce makes up a large share of the economy, it is necessary to uncover the operating patterns behind the industry, extract useful information from the data, and make reasonable adjustments.

Because such analysis requires a large amount of e-commerce data, the thesis uses web crawler technology to build an e-commerce big data acquisition system that addresses the problem of data sources. The thesis first introduces the key technologies involved in the system, including the principle of web crawlers, string manipulation, URL deduplication, anti-crawler countermeasures, and database technology. On this basis, it carries out a requirements analysis and a feasibility analysis of the system and designs its overall framework and functional modules.

The main results of the thesis are as follows:

(1) For the crawler's access to the web server, the system uses the HTTP protocol together with an IP pool management class that keeps the IP addresses in use valid, so that pages can be crawled continuously. For page parsing, a parser that automatically recognizes the type of a web page is designed and implemented. Different parsing mechanisms are applied to different data transmission formats, and page information is extracted with the help of regular expressions. To address the low scalability of traditional data acquisition systems, a business plug-in method is proposed: the crawler logic for each e-commerce platform is compiled into a DLL file, so that plug-ins can be loaded dynamically and managed in a plug-and-play fashion.

(2) A store classification method is proposed to meet practical data needs. Keyword segmentation is used to extract product categories from product titles, and these categories are then weighted by the transaction amount of each product to correct the category assigned to the store.

Finally, the functions of the system were tested. The results show that the system runs stably and collects data efficiently, meeting the expected requirements.
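The store classification idea in result (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the keyword lexicon `CATEGORY_KEYWORDS`, the function names, and the use of whitespace splitting in place of a real word segmenter are all assumptions made for illustration. Each product title votes for the categories its keywords map to, and each vote is weighted by that product's transaction amount.

```python
from collections import defaultdict

# Hypothetical keyword-to-category lexicon; the thesis does not list its categories.
CATEGORY_KEYWORDS = {
    "phone": "electronics",
    "laptop": "electronics",
    "dress": "clothing",
    "shirt": "clothing",
}


def title_categories(title):
    """Return the set of categories whose keywords appear in a product title."""
    # Whitespace splitting stands in for a real keyword segmenter (assumption).
    tokens = title.lower().split()
    return {CATEGORY_KEYWORDS[t] for t in tokens if t in CATEGORY_KEYWORDS}


def classify_store(products):
    """Pick a store category from (title, transaction_amount) pairs.

    Every category found in a title is credited with that product's
    transaction amount; the category with the highest total wins.
    Returns None when no title matches the lexicon.
    """
    scores = defaultdict(float)
    for title, amount in products:
        for category in title_categories(title):
            scores[category] += amount
    if not scores:
        return None
    return max(scores, key=scores.get)


store = [
    ("Red summer dress", 100.0),
    ("Cotton shirt", 50.0),
    ("Gaming laptop", 900.0),
]
print(classify_store(store))
```

In this sketch a store that lists more clothing items than electronics is still labelled `electronics` when most of its revenue comes from laptops, which is the kind of correction the transaction-amount weighting is meant to provide.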
Keywords/Search Tags: e-commerce, web crawler, plug-in, e-commerce data collection