Multi-core Based Parallel Similarity Join

Posted on: 2018-11-29 | Degree: Master | Type: Thesis
Country: China | Candidate: L J Feng | Full Text: PDF
GTID: 2358330515999247 | Subject: Computer technology
Abstract/Summary:
Similarity join is an operation that measures the similarity of data records with a given similarity function and finds all pairs of records whose similarity is not less than a given threshold. It is widely used in applications such as fuzzy keyword matching, document clustering, recommender systems, collaborative filtering, and data integration and cleaning. With the development of Internet and mobile applications, the amount of data is growing explosively, and analyzing such volumes of data requires substantial computing power. Similarity join has therefore become one of the most widely used operations in data processing. Similarity can be measured in several ways, including Jaccard similarity, cosine similarity, overlap similarity, Hamming distance, and edit distance. This thesis mainly adopts Jaccard similarity to quantify the similarity of data pairs. The processing capacity of a traditional single-core computer can hardly meet the computational requirements of massive data processing. Multi-core parallel programming, which exploits the multi-core processors now available in low-cost personal computers, has become a trend for improving computational efficiency and performance, and it is a promising approach to similarity joins over massive data. Based on the proposed data partitioning and task partitioning strategies, this thesis implements four similarity join algorithms to verify the performance and scalability of the proposed multi-core parallel similarity join method: (1) balanced data partitioning with a shared index, (2) equal-length data partitioning with a shared index, (3) balanced data partitioning with a detached index, and (4) equal-length data partitioning with a detached index. The experimental results demonstrate that the proposed method efficiently utilizes the parallel computing capacity of multi-core processors and significantly improves the efficiency of similarity join.
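To make the operation concrete, the following is a minimal sketch of a threshold-based Jaccard similarity join whose work is split across worker threads. It is an illustration only, not the thesis's algorithms: it uses brute-force pair comparison with no index, and the round-robin row assignment is a simple load-balancing choice assumed here. The names `jaccard`, `join_chunk`, and `parallel_similarity_join` are hypothetical. In CPython, threads do not run pure-Python code in parallel, so the sketch shows the partitioning structure rather than a real multi-core speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty sets are identical
    return len(a & b) / len(a | b)


def join_chunk(records, row_ids, threshold):
    """Compare each record in row_ids against every later record."""
    pairs = []
    for i in row_ids:
        for j in range(i + 1, len(records)):
            if jaccard(records[i], records[j]) >= threshold:
                pairs.append((i, j))
    return pairs


def parallel_similarity_join(records, threshold, workers=4):
    """Return all index pairs (i, j), i < j, with similarity >= threshold."""
    # Row i needs len(records) - 1 - i comparisons, so contiguous blocks
    # would be unbalanced; round-robin assignment spreads the work evenly.
    chunks = [range(w, len(records), workers) for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda ids: join_chunk(records, ids, threshold), chunks)
        return sorted(pair for part in parts for pair in part)


if __name__ == "__main__":
    data = [{1, 2, 3}, {2, 3, 4}, {1, 2, 3, 4}, {7, 8}]
    print(parallel_similarity_join(data, threshold=0.5, workers=2))
```

Each worker reads the same `records` list and writes only to its own result list, so no locking is needed. Practical similarity joins usually add a filtering index to avoid comparing every pair.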
Keywords/Search Tags:multi-core, multi-thread, parallel, similarity join