Font Size: a A A

Knowledge Discovery Methods Research Based On Formal Concept Analysis

Posted on:2006-11-13Degree:DoctorType:Dissertation
Country:ChinaCandidate:H QiFull Text:PDF
GTID:1118360155953723Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Formal Concept Analysis (FCA), which was elaborated by Professor Wille of German in the eighties of the twentieth century, reflects the comprehension for concept in philosophy. The Concept Lattice, which is the core data structure of FCA and is also called Galois Lattice, can describe the hierarchy relationship between concepts and has become an important method for the representation of knowledge. With the development of FCA, it has been used widely in the machine learning, data mining and knowledge discovery, information retrieval, software engineering and become one of the most efficient tools to deal with large scale of data. Under this context this paper makes the researches on the bellowing aspect: 1. Concept generation based on search space partition The main difficulty with concept lattice-based system comes from the lattice construction itself. The useful exhaustiveness of concept lattice, when representing the concepts hierarchy, gives an exponential increase of the lattice size according to the data. So the research on efficient algorithm for constructing of concept lattice is urgent. In the numerous algorithms for constructing concept lattice there is a family algorithms based on the computing of closure. NextClosure presented by Ganter is the famous one of them. Research shows that for the data and dense formal contexts the algorithms based on the computing of closure show a good performance and in other situation their behavior is not good. On the research of NextClosure algorithm and the factors that affect the algorithm we propose a new algorithm called SSPCG based on the closures search space partition. The algorithm divides the closures search space into several subspaces in accordance with criterions prescribed ahead and introduces an efficient scheme to recognize the valid ones, which bounds searching just in these valid subspaces. An intermediate structure is employed to judge the validity of a subspace and compute closures more efficiently. Since the partition of the search space is recursive and the searching in subspaces is independent a parallel version can be directly reached. The algorithm is experimental evaluated and compared with the NextClosure algorithm for random generated data as well as for real application data. The results show that our algorithm performs much better than the later. 2. User association mining based on concept lattice The problem of mining user association from rating table plays an essential role in rule-based recommender system. Traditional Apriori-based algorithms often generate a very large number of rules, most of which are not relevant to current recommendation. Recent research show that concept lattice is an efficient tool for association rules mining since the set of all closed frequent itemsets can be orders of magnitude smaller than the set of all frequent itemsets and any information don't be lost. Using the closure of the Galois connection we define two user association bases, the exact base (i.e. for all rules with a 100% confidence) and the approximate base (i.e. with confidence < 100%), from which all-valid user association rules with support and confidence can be deduced. These user association bases are characterized using frequent closed itemsets and their reductions within concept lattice. Algorithm for extracting these two bases is presented and experimental evaluated on MovieLens dataset of GroupLens Research Progject. The results show that the proposed user association bases can considerably reduce number of rules in user association and do not loss any information. 3. Research on classifier based on concept lattice Supervised classification is an essential task of machine learning and data mining. Supervised classification aims to find the rules that predict the class label of an object. Many classification algorithms have been developed and widely applied in practice. Among them there is a family of algorithms that are FCA-based. Recent research shows that there are still many approaches to explore in order to build efficientFCA-based classifiers. Under this context we make the research on the classifier based on FCA. We induce an incremental algorithm that is based on the construction of partial lattice to extract classification rules. This algorithm only creates a partial lattice that contains the nodes of current label and their ancestors. Then the classification rules are extracted on this partial lattice. We also proposed a heuristic algorithm to build an accurate classifier that can ensure to build the classifier using the highest precedence among the rules. Experiments taken on five datasets of UCI machine learning archive show that the minimum confidence has a significant impact on the performance, the minimum support has little impact on the performance, and our classifier is, in general, more accurate than that produced by the state-of-the-art classification system C4.5 and other FCA-based algorithms. 4. GDT and extended concept lattice model Most of data that is used by current data mining system is uncertain. The uncertainty is including inconsistencies and incompleteness. Extracting rules from these data is very hard. A Generalization Distribution Table (GDT) is a table in which the probabilistic relationships between concepts and instances are represented. By using a GDT as a hypothesis search space this paper proposes an algorithm to discovery rules from information system with inconsistencies and incompleteness, which can filter noisy data. The property of GDT make it can predict unseen instances. After analysis the relationship between GDT and concept lattice we introduce an extended concept lattice model combining GDT and concept lattice to deal with inconsistent and incomplete data. Algorithm used to mine rules according to this extended model is given also. 5. Prototype of data mining system based on concept lattice Europe committee developed a CRISP-DM model in order to make the data mining process standard and can be used in business. On the reference of the CRISP-DM we develop a prototype of data mining system based on FCA. This prototype include the construction of concept lattice and extended concept lattice, mining for association rules and classification rules based on FCA, recommender...
Keywords/Search Tags:Knowledge Discovery, Formal Concept Analysis, Concept Lattice, Search Space, Closure System, Parallel Algorithm, Association Rule, User Association Base, Recommender System, Classification Rule, Classifier, Generalization Distribution Table (GDT)
PDF Full Text Request
Related items