Research On The Optimization Of Task Assignment In Crowd Computing And Association Analysis For Big Data

Posted on:2018-03-23

Degree:Master

Type:Thesis

Country:China

Candidate:Q Wang

GTID:2348330515475555

Subject:Computer technology

Abstract/Summary:

With the arrival of the big data era, the scale of data has surged. Although big data brings a wealth of information and knowledge, it also poses serious challenges to traditional data processing technology because of the complexity of large-scale data, its rapid growth, its diversity, and its low value density. It is therefore necessary to find efficient data processing technologies for big data application environments. Big data processing techniques can be divided into two categories: crowd computing based on human-computer cooperation, and computer-based data processing algorithms. This thesis carries out research in both areas and obtains the following two results.

(1) For crowd computing based on human-computer cooperation, the dependence of big data tasks on complex cognitive inference is addressed mainly by optimizing the crowd computing method. To improve accuracy, a theme-aware task assignment optimization algorithm for crowd computing on big data is proposed. First, themes are extracted by a method that combines a theme model with a fuzzy k-means adaptation, and the correlation between tasks and users is computed from a task model and a user model. Second, the real themes and initial accuracy of new users are tested with historical tasks that have high-quality answers. Finally, the probability that a user can participate in a certain kind of task is calculated, a candidate sequence is predicted by Logistic Regression (LR), and appropriate workers are then assigned to the tasks. Simulation results show that the proposed algorithm achieves higher accuracy, is more cost-effective, and performs better in a big data environment.

(2) For computer-based data processing algorithms, an improved algorithm based on cloud computing is proposed to meet the data analysis efficiency requirements of big data. Traditional algorithms can no longer meet the efficiency demands of big data processing. Association analysis is currently one of the active research topics in data processing technology. In this thesis, the improvement of Apriori consists of two parts. First, an improved matrix-based Apriori algorithm (M_Apriori) is proposed. Its innovations are the structure of the matrix and the changed calculation steps. It stores and processes data in a matrix-based data structure, scans the database only once to reduce database I/O overhead, and constructs a frequency support matrix, using the logical "AND" operation to improve the core steps of the algorithm (self-join and pruning). The algorithm is then verified and analyzed theoretically. Second, a Spark-based parallelization of M_Apriori (SPM_Apriori) is proposed, which ports the algorithm to the Spark platform for parallel processing. It uses data parallelism and a local-instead-of-global strategy, and takes full advantage of Spark's in-memory computing and RDDs; on this basis, M_Apriori is designed in parallel and implemented on Spark. Experimental results show that the algorithm achieves a good improvement. It also enriches Spark MLlib.
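
The matrix-based idea described for M_Apriori (a single database scan, then support counting with the logical "AND" operation) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names `build_item_rows` and `frequent_itemsets`, the use of integer bitmasks as rows of a Boolean item-by-transaction matrix, and the absolute `min_support` count are all assumptions made for illustration.

```python
from itertools import combinations


def popcount(mask):
    """Number of set bits, i.e. number of transactions covered by a row."""
    return bin(mask).count("1")


def build_item_rows(transactions):
    """Scan the transactions once and build one Boolean row per item.

    Each row is stored as an integer bitmask (an assumption of this sketch):
    bit t is set when transaction t contains the item.
    """
    rows = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            rows[item] = rows.get(item, 0) | (1 << tid)
    return rows


def frequent_itemsets(transactions, min_support):
    """Return {itemset tuple: support count} for all frequent itemsets."""
    rows = build_item_rows(transactions)
    # Level 1: keep single items whose row has enough set bits.
    current = {(item,): mask for item, mask in rows.items()
               if popcount(mask) >= min_support}
    result = {}
    k = 1
    while current:
        result.update({s: popcount(m) for s, m in current.items()})
        next_level = {}
        for a, b in combinations(sorted(current), 2):
            # Self-join: merge two k-itemsets that share their first k-1 items.
            if a[:-1] != b[:-1]:
                continue
            candidate = a + (b[-1],)
            # Pruning: every k-subset of the candidate must be frequent.
            if any(sub not in current for sub in combinations(candidate, k)):
                continue
            # Logical AND of the two rows gives the candidate's row directly,
            # so no further database scan is needed.
            mask = current[a] & current[b]
            if popcount(mask) >= min_support:
                next_level[candidate] = mask
        current = next_level
        k += 1
    return result


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, support in sorted(frequent_itemsets(data, 3).items()):
        print(itemset, support)
```

In this sketch the database is read only inside `build_item_rows`; every later level works on the in-memory rows, which is the property the abstract credits with reducing I/O overhead. A Spark version would additionally partition the transactions and combine local results, which is not shown here.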

Keywords/Search Tags:

crowd computing, theme match, human-computer cooperation, Apriori, parallelization, Spark

Related items

1	Research On Optimization Of Association Rule Apriori Algorithm And Its Parallelization Based On Spark
2	The Design And Implementation Of Parallelization Of Canopy And FCM Clustering Algorithms On Spark Platform
3	The Parallelization And Optimization Of K-means Algorithm Based On Spark
4	Research And Application Of Parallelization Of Association Rule Mining Algorithm
5	Association Rule Algorithm Optimization And Parallelization Research Based On Spark
6	Research And Application Of Association Rule Algorithm Based On Spark Platform
7	Study On Intelligent Layout Design Of Human Computer Interaction Based On Distributed Knowledge Environment
8	The Research And Implementation Of Mining Large Data Based On Spark
9	Research And Implementation Of Classification Algorithm Parallelization Based On Spark
10	Research On Cluster Analysis Of Biomedical Patent Data In Yunnan Province Based On Spark Cloud Computing Architecture