Font Size: a A A

Study And Implementation Of Distributed Algorithms For Three-way Concept Lattices

Posted on:2021-10-07Degree:MasterType:Thesis
Country:ChinaCandidate:X Y ChenFull Text:PDF
GTID:2518306047984079Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Formal concept analysis is an effective tool for knowledge representation and knowledge discovery.It has been successfully applied in many fields,such as knowledge engineering,machine learning,information retrieval,data mining,software engineering and social network analysis.Three-way concept analysis is an extension of formal concept analysis.It combines the idea of "divide and conquer" in three-way decision,and provides more information than formal concept analysis.However,three-way concept analysis theory has been put forward in recent years,there are few effective three-way concept construction algorithms,especially in the case of large data scale.In this thesis,we design and implement the distributed construction algorithm of three-way concepts and three-way concept lattice,so that the three-way concept analysis theory can be well developed and applied in the field of big data.First of all,the thesis analyzes the basic idea of the classical concept lattice construction algorithm and the three-way concept lattice construction algorithm,and proposes the SI3 C algorithm based on the Spark distributed framework.Based on the advantages of In Close algorithm,SI3 C algorithm designs and implements three-way concept distributed building algorithms based on Spark.In order to adapt the algorithm to the Spark distributed cluster framework,SI3 C algorithm first transforms the depth-first search strategy of In Close algorithm into breadth-first.Starting from the second layer,three-way concepts of each layer are generated in an iterative way.The algorithm distributes all three-way concepts generated in the last iteration to each node of Spark cluster,and calculates all sub concepts of the three-way concepts in the layer in a task-parallelism manner until the smallest three-way concepts are generated.At the same time,the algorithm uses Spark's rich high-performance operators to realize the generation and pruning of three-way concepts,and improves the process of concept generation of In Close algorithm to reduce the computational complexity.The experiment results show that SI3 C algorithm can effectively calculate all three-way concepts of a formal context.Secondly,CV3 C,a three-way concept building algorithm based on context decomposition,is proposed.CV3 C algorithm first transforms the construction of the three-way concepts into the construction of the classical concepts through the apposition or subposition of the formal context.Secondly the algorithm decomposes the formal context through the cut points in the bipartite graph to obtain several sub-contexts,and then calls the classical concept batch algorithm on the Spark framework to calculate the formal concepts of all sub-contexts in the way of data-parallelism.All the formal concepts are reconstructed and transformed into three-way concepts.Then the CV3 C algorithm is optimized,and the CV3C+ algorithm is proposed to deal with the sub-contexts of non-uniform decomposition.When the dimension of the sub-contexts exceeds the threshold,the classic concept distributed construction algorithm is called to calculate the corresponding concepts of the sub-contexts in the way of task-parallelism to improve the parallelism of the algorithm.In order to verify the efficiency of the algorithm,this thesis selects multiple groups of formal contexts with cut points for experiments.The experimental results show that the algorithm based on the context decomposition has improved the computational efficiency compared with the algorithm that directly computes the three-way concepts through the subposition of formal contexts.The performance of the optimized CV3C+ algorithm is also better than that of the CV3 C algorithm.Finally,on the Spark Graph X component,we design three-way concept lattice Hasse diagram construction algorithm SI3 CL for the three-way concepts generated by the SI3 C algorithm which records the hierarchical relationship of concepts.The parent-child relationship between the three-way concepts is displayed graphically.In this thesis,the performance of the algorithm SI3 CL is analyzed through experiments on several random data sets,which show that SI3 CL algorithm can effectively calculate the parent-child relationship among the three-way concepts and generate the corresponding Hasse diagram.
Keywords/Search Tags:formal concept analysis, three-way concept analysis, Spark, three-way concept lattices
PDF Full Text Request
Related items