
Research On The Key Issues Of Efficient Processing Of Large-scale Tasks In E-commerce

Posted on: 2021-03-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Y Cheng
GTID: 1368330611471812
Subject: Computer system architecture
Abstract/Summary:
With the rapid development of the Internet, the mobile Internet, big data, and related technologies, and with the gradual implementation of China's "Internet +" strategy, e-commerce has become an important part of daily life and an important field for applying new technology. Studying the key technologies of e-commerce therefore has historical, practical, and social significance. E-commerce websites have massive numbers of users, massive data volumes, and complex application scenarios, and technological innovations built around these business characteristics keep emerging. Cloud computing, big data, and related technologies were among the first to be widely applied, improved, and extended in e-commerce. At the current stage of e-commerce technology development, how to handle the challenges posed by massive users, massive data, and complex application scenarios, and how to process large-scale data efficiently in system deployment, business processing, and data mining and analysis, remain important research issues.

This thesis focuses on the key issues of efficient processing of large-scale transaction data in e-commerce. First, two dimensionality reduction methods for data preprocessing are proposed to address the high-dimensional data problem in large-scale data processing. Second, a two-stage task deployment method based on reinforcement learning is proposed to address task deployment and scheduling in large-scale data processing. Finally, a structured data distribution method based on data correlation is proposed to address the data transmission between multiple data centers caused by complex e-commerce query applications. The main contributions of this thesis are as follows:

(1) A summary of methods for efficiently processing large-scale e-commerce transaction data, setting out the background, significance, and focus of the current research. The thesis introduces the significance of e-commerce to national life, its development trends, and its key technologies; analyzes the main framework and workflow of large-scale data processing in the big data setting, together with the important problems that arise throughout that workflow; and reviews and analyzes in detail the solutions to key data processing problems in e-commerce technology.

(2) Research on the preprocessing of high-dimensional e-commerce data. E-commerce data processing often faces large volumes of high-dimensional, low-density data. Traditional classification methods are often affected by these characteristics, which makes it difficult to mine and analyze internal relationships. In response, two preprocessing mechanisms for high-dimensional data are proposed. For unlabeled data, where clustering high-dimensional data with traditional methods is inaccurate, a data reduction and classification method that combines principal component analysis with clustering is proposed. This method establishes a main factor and correlation factor model, uses the correlation factor coefficients to construct a similarity distance between websites, and improves the rationality and interpretability of website evaluation by improving the DBSCAN clustering algorithm. For labeled data, traditional dimensionality reduction methods are inefficient and easily become trapped in local optima. The thesis therefore proposes a distributed particle swarm method based on rough sets, which combines particle swarm optimization with rough set theory. The particles search for the optimal feature subset synchronously, which improves execution efficiency and widens the search range, and a random factor is added to the evaluation function of the feature subset to reduce the uncertainty of the search. Experiments show that this method effectively improves the efficiency of feature selection on large-scale data.

(3) Research on task deployment for large-scale e-commerce data processing. The limitations of many traditional methods, together with changes in resource performance in heterogeneous environments, lead to relatively long system response times, high algorithm complexity, and wasted resources. To address this, the thesis proposes TOPE, a large-scale parallel task processing method based on reinforcement learning. The method treats the entire network as a multi-agent system, performs virtual node mapping through distributed multi-objective swarm intelligence, performs virtual link mapping through deep reinforcement learning and a Markov decision process, and thereby completes a two-stage optimization of task allocation in a fat-tree topology. Experimental results show that TOPE strikes a balance among load balancing, bandwidth overhead, and energy consumption, and effectively reduces the energy consumption of computing nodes and links.

(4) Research on data distribution for large-scale e-commerce data processing. In a cloud computing environment, data is distributed across multiple data centers. Massive numbers of random, search-like user queries cause frequent data transmission between data centers, and query processing efficiency struggles to meet application needs. In response, the thesis proposes a data distribution method based on file correlation. The entire cloud environment is treated as the Internet, and users' random, massive query behavior is treated as Internet search. Following the idea of Internet search, an index and the correlations between files are established, and an improved clustering algorithm is then used to redistribute the data. First, according to the data characteristics of query applications under cloud computing, each data table is mapped to a data feature vector model based on statistical data, and the correlation feature matrix of the data tables is constructed from the distances between feature vectors. The matrix is then clustered according to adjacent element values and the bond energy value, and the correlated data tables are finally clustered with the BEA clustering method so that the data can be distributed in the cloud environment. Experiments show that, with the correlation-based distribution strategy, related data can be reasonably allocated to the same data block, which avoids data transmission for join queries, especially during massive ad hoc queries, and significantly improves query processing efficiency.
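
The correlation-matrix and BEA step described for the data distribution method can be illustrated with a minimal sketch. It assumes that BEA refers to the classic bond energy algorithm, that similarity is derived from Euclidean distance as 1 / (1 + d), and that the matrix values are purely illustrative; the function names `affinity_matrix`, `bond`, `contribution`, and `bea_order` are hypothetical and not taken from the thesis, whose improved clustering variant is not specified in this abstract.

```python
import math


def affinity_matrix(vectors):
    """Build a symmetric similarity matrix from table feature vectors.

    Assumption: similarity = 1 / (1 + Euclidean distance). The abstract only
    says the matrix is built from feature-vector distances.
    """
    return [[1.0 / (1.0 + math.dist(u, v)) for v in vectors] for u in vectors]


def bond(aff, x, y):
    """Bond between columns x and y; None marks a boundary with bond 0."""
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in aff)


def contribution(aff, left, mid, right):
    """Net gain in bond energy from placing column mid between left and right."""
    return (2 * bond(aff, left, mid)
            + 2 * bond(aff, mid, right)
            - 2 * bond(aff, left, right))


def bea_order(aff):
    """Order columns greedily so that strongly related tables end up adjacent."""
    n = len(aff)
    order = list(range(min(n, 2)))  # start with the first two columns
    for k in range(2, n):
        best_pos, best_gain = 0, None
        # Try every insertion slot, including both ends of the current order.
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            gain = contribution(aff, left, k, right)
            if best_gain is None or gain > best_gain:
                best_pos, best_gain = pos, gain
        order.insert(best_pos, k)
    return order


# Illustrative (hypothetical) correlation matrix for four data tables.
AFFINITY = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]
```

With this illustrative matrix, `bea_order(AFFINITY)` returns `[0, 2, 1, 3]`: tables 0 and 2 are placed next to each other, as are tables 1 and 3. Splitting the ordering between the two groups would then suggest which tables to store in the same data block.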
Keywords/Search Tags:E-commerce, Big Data, Large-scale Tasks, Two-phase Optimization, Reinforcement Learning, Feature Selection