Constraint Connection Process Based On Vector Space

Posted on: 2018-06-29
Degree: Master
Type: Thesis
Country: China
Candidate: S W Zhang
Full Text: PDF
GTID: 2348330512487357
Subject: Computer software and theory
Abstract/Summary:
In recent years, with social and economic development and advances in information technology, new services such as the Internet of Things and cloud computing have emerged. Location-related services are multiplying, and spatial data is growing and accumulating at an unprecedented rate. How to process large-scale spatial location data sets and find the results users want most has become a hot spot in big data research. Traditional methods for processing, collating, and analyzing such data are inefficient: the algorithms are usually designed for a single computing node, struggle with data sets of millions or billions of records, and offer poor computing power and flexibility.

To address these problems, this thesis proposes a Constrained Connection Process based on Vector Space (CCPBV). It treats the task as a similarity self-join: it prunes data continuously during analysis, performs a distance-based self-join, and finds for the user all reachable paths in the result set. First, the algorithm uses a grid partitioning strategy to divide the vector space into cells. To improve efficiency, the algorithm is designed on the Map-Reduce framework. The Map phase works on the divided cells and applies four pruning stages (node-to-cell pruning, pruning by the node's constraint area, pruning of the unidirectional edge set, and pruning by node-to-node distance) to find all candidate nodes that satisfy the constraint. The Reduce phase takes the Map results, extends the paths that pass through the subset of candidate nodes, and finally finds the complete result paths that satisfy the constraint.

Second, because the computational complexity of CCPBV grows exponentially with dimension in a high-dimensional vector space, several preprocessing improvements are proposed: dimension reduction by direct projection, dimension reduction by bubble sorting, and attribute classification. The high-dimensional vector space is divided into several low-dimensional vector spaces, and the CCPBV algorithm is executed on each low-dimensional space. The results are then merged back into the original high-dimensional space, where the result set is pruned. While still meeting the basic requirements, this improves response speed and reduces the costs of data replication and computation.

The experimental results show that the CCPBV algorithm solves the location-based constraint join problem under the constraints proposed in this thesis while reducing data dimension, computational complexity, and system response time. It is efficient, complete, and accurate, and is an effective way to handle large-scale data sets in vector space.
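
The grid partitioning and Map/Reduce split described in the abstract can be illustrated with a small in-memory sketch. This is not the thesis's CCPBV implementation: it omits the four pruning stages and the path expansion, and it substitutes a common technique (replicating each point into its neighboring cells) for the cell-level filtering. All names here (`cell_of`, `map_phase`, `shuffle`, `reduce_phase`, `distance_self_join`, and the threshold `eps`) are hypothetical, and the "framework" is plain Python rather than a real Map-Reduce runtime.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def cell_of(point, eps):
    """Return the integer grid cell containing `point` (cell side = eps)."""
    return tuple(floor(x / eps) for x in point)


def map_phase(points, eps):
    """Emit each point to its home cell and to every adjacent cell.

    Two points within distance eps differ by at most one cell index per
    dimension, so each qualifying pair meets in at least one cell.
    """
    for idx, p in enumerate(points):
        home = cell_of(p, eps)
        for offset in product((-1, 0, 1), repeat=len(p)):
            target = tuple(h + o for h, o in zip(home, offset))
            yield target, (idx, p, target == home)


def shuffle(pairs):
    """Group emitted values by cell key (stand-in for the framework's shuffle)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, eps):
    """Within each cell, join points native to the cell against all members."""
    result = set()
    for members in groups.values():
        natives = [(i, p) for i, p, is_home in members if is_home]
        for i, p in natives:
            for j, q, _ in members:
                # i < j keeps each unordered pair once and skips self-pairs.
                if i < j and dist(p, q) <= eps:
                    result.add((i, j))
    return result


def distance_self_join(points, eps):
    """Return all index pairs (i, j), i < j, with Euclidean distance <= eps."""
    return reduce_phase(shuffle(map_phase(points, eps)), eps)
```

For example, `distance_self_join([(0.0, 0.0), (0.5, 0.5), (3.0, 3.0)], 1.0)` pairs only the first two points. The replication factor is 3^d for d dimensions, which is one concrete way to see why cost grows exponentially with dimension and why the abstract splits high-dimensional spaces into low-dimensional ones first.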
Keywords/Search Tags:big data, data analysis, self join, constraint join, Map-Reduc