A Data Mining System For Frequent Graph Based On Eclipse Platform

Posted on: 2009-11-18

Degree: Master

Type: Thesis

Country: China

Candidate: S Liu

GTID: 2178360242980207

Subject: Computer application technology

Abstract/Summary:

Recently, semi-structured data mining has become a hot topic in the data mining field. Domestic research in this area has focused mostly on text mining, and research on graph mining is only at its beginning. Graphs can express more meaning than other data structures, so they are used widely in many scientific and commercial fields, such as chemical compound structures, bioinformatics, machine vision, video indexing, text retrieval, and web analysis.

Compared with other kinds of graph patterns, the frequent graph is a basic pattern that can be found in a graph set: a subgraph that occurs in at least a given minimum number (the minimum support) of the graphs in the set. Mining frequent graphs means finding the patterns we are interested in. Frequent graphs can be used to characterize a graph set, distinguish different groups of graphs, classify and cluster graphs, build graph indexes, and support similarity search in graph databases. Frequent graph mining is the foundation and cornerstone of graph mining.

This thesis designs and implements an integrated system for frequent graph mining applications based on the Eclipse platform. The system integrates several classic algorithms and a number of useful tools, and it can serve as an open, portable, and extensible platform for research on basic data mining algorithms.

The thesis first briefly introduces background knowledge in three areas, data mining, frequent graph mining, and the Eclipse platform, to prepare for the discussion that follows.

The second chapter studies the basic theory and common methods of frequent graph mining and presents four classic algorithms: gSpan, MoFa, Gaston, and FFSM.

The third chapter turns to system design. It begins with a more detailed introduction to Eclipse RCP (Rich Client Platform) and EMF (Eclipse Modeling Framework).
It describes the architectures of Eclipse and Eclipse RCP in detail and summarizes the advantages of developing on this framework from three perspectives: for enterprises, for developers, and for end users. It then completes the design of the overall system framework using a model-driven development approach. The discussion in this part centers on the design of BIGraphModel.

BIGraphModel is the uniform format for the input, processing, and output of the frequent graph mining system. Input in many other formats can be converted into BIGraphModel by the data parsing plugin, so the resulting model can be exchanged among different tools.

BIGraphModel is a machine-readable model that gives users an object-oriented way to manipulate data. It is organized into three levels and contains four classes in total: Root, Graph, Node, and Edge. The whole discussion in this part revolves around one question: how to improve the reusability and extensibility of the system.

The detailed design and implementation phase follows the overall design phase. This part first discusses the design of the basic platform, RCP, and then gives the detailed design of every plugin in the system except AutoBIPluginGen: the basic system platform, the runtime plugin, the data parsing plugin, all algorithm plugins, and the visualization plugin. The Rich Client Platform (RCP) is used to build the basic platform of the system. RCP offers many useful features, such as Eclipse preferences, the update manager, and the help system. Its open architecture and open source code allow all members of the development team to maintain and update it together. The design makes full use of the plugin mechanism, so designers and users can treat the whole system as a collection of plugins. In this way the system becomes an open platform in which every component is a plugin.

The runtime plugin is required for the system to run.
It centralizes all resources and models of the system and reads and writes the preference parameters, which makes it the kernel of the system. Every plugin other than the RCP platform depends on it.

The data parsing plugin parses input files and generates BIGraphModel. It isolates the algorithm plugins from the input files, which may come in many different formats. The data parsing plugin depends only on the runtime plugin and is independent of all other plugins. In other words, other plugins can call the data parsing plugin directly to parse an input file, and the data parsing plugin also provides a UI for carrying out the conversion.

Each algorithm is encapsulated in a separate plugin, which avoids coupling between algorithms. A newly added algorithm plugin only needs to depend on the data parsing plugin and the runtime plugin.

Besides these plugins, the system provides two kinds of visualization: one for algorithm results and one for model files. Visualization is the most direct way to present the results of the algorithms.

One of the system's best design features is AutoBIPluginGen, a framework for automatically generating algorithm plugins. It gives developers a convenient way to obtain plugins for their algorithms without any knowledge of plugins and without writing any UI code. Users only have to customize the UI and add their algorithm code, which saves them much of the time otherwise spent on plugin and SWT/JFace programming.

AutoBIPluginGen is independent of the other plugins in the system and can run on its own in the Eclipse environment or on any Eclipse-based platform. It uses templates to generate algorithm plugins automatically. The templates hide many tedious steps from users, such as implementing the extension points, generating classes, and customizing parameters, so users only need to ask themselves: "What kind of UI do I need?"
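The template step can be illustrated with a small placeholder-substitution sketch. Everything in it is an assumption for illustration, not AutoBIPluginGen's actual code: the `${key}` placeholder syntax, the hypothetical abstract base class `AbstractAlgorithmPage`, the hypothetical extension point ID `example.algorithms`, and the `<algorithm>` element. It only shows how a template can fill in a plugin's information items and emit a class that inherits from an abstract UI class.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of template-based plugin generation; not AutoBIPluginGen's code.
final class PluginTemplate {
    // Assumed placeholder syntax: ${key}, where key is a word (letters, digits, underscore).
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    // Template for a UI class that inherits from a hypothetical abstract base class.
    static final String UI_CLASS_TEMPLATE = String.join("\n",
            "package ${packageName};",
            "",
            "public class ${className} extends AbstractAlgorithmPage {",
            "    @Override",
            "    protected String algorithmName() {",
            "        return \"${algorithmName}\";",
            "    }",
            "}",
            "");

    // Template for a plugin.xml fragment contributing to a hypothetical extension point.
    static final String EXTENSION_TEMPLATE = String.join("\n",
            "<extension point=\"${extensionPointId}\">",
            "   <algorithm name=\"${algorithmName}\" class=\"${packageName}.${className}\"/>",
            "</extension>",
            "");

    // Replaces every ${key} with its value and fails if a placeholder has no value.
    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = values.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for placeholder: " + m.group(1));
            }
            // quoteReplacement keeps '$' or '\' in values from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The "information items" a user would otherwise have to type by hand.
        Map<String, String> values = new LinkedHashMap<>();
        values.put("packageName", "example.gspan.ui");
        values.put("className", "GSpanPage");
        values.put("algorithmName", "gSpan");
        values.put("extensionPointId", "example.algorithms");
        System.out.print(fill(UI_CLASS_TEMPLATE, values));
        System.out.print(fill(EXTENSION_TEMPLATE, values));
    }
}
```

A real generator would also write these strings into a new plugin project; that part is omitted here, since only the substitution idea is being illustrated.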
This design gives users a genuine experience of automatic plugin generation.

The core of AutoBIPluginGen consists of templates, custom extension points, and an abstract class for UI generation. When a user generates a plugin from a template, the template automatically implements the custom extension points, fills in each information item, and generates a class that inherits from the abstract class. The user then obtains a UI plugin for the algorithm.

Using AutoBIPluginGen takes three steps: first, customize the UI; second, integrate the algorithm; finally, generate the integrated plugin. The UI can be customized in two ways, with or without templates. Both achieve the same result, and the system recommends using templates. The algorithm integration step includes a special design: the user's algorithm runs in the background while a progress bar shows its progress. When an algorithm runs for a long time, this keeps the UI from becoming unresponsive and so improves the user experience.

Finally, the author summarizes the work as a whole and outlines plans for future work.
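BIGraphModel's three levels and four classes (Root holds the graph set; each Graph holds its Nodes and Edges) can be made concrete with a plain-Java sketch, together with a minimal parsing step of the kind the data parsing plugin performs. The thesis designs the system with model-driven development on EMF; this sketch deliberately uses ordinary Java classes instead, and its field names are assumptions. The input format is also an assumption: it is the line-based format used by common gSpan implementations (`t # id` starts a graph, `v id label` adds a vertex, `e from to label` adds an edge), and the text does not say which formats the data parsing plugin actually accepts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Level 1: the root holds the whole graph set.
final class Root {
    final List<Graph> graphs = new ArrayList<>();
}

// Level 2: one graph with its vertices and edges.
final class Graph {
    final int id;
    final List<Node> nodes = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();

    Graph(int id) { this.id = id; }

    // Linear lookup by vertex ID; fine for a sketch, a map would scale better.
    Node node(int nodeId) {
        for (Node n : nodes) {
            if (n.id == nodeId) return n;
        }
        throw new IllegalArgumentException("Graph " + id + " has no vertex " + nodeId);
    }
}

// Level 3: labeled vertices ...
final class Node {
    final int id;
    final String label;

    Node(int id, String label) { this.id = id; this.label = label; }
}

// ... and labeled edges that reference their end vertices.
final class Edge {
    final Node source;
    final Node target;
    final String label;

    Edge(Node source, Node target, String label) {
        this.source = source;
        this.target = target;
        this.label = label;
    }
}

// Hypothetical parser that turns gSpan-style text lines into the model.
final class GSpanFormatParser {
    static Root parse(List<String> lines) {
        Root root = new Root();
        Graph current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            String[] f = line.split("\\s+");
            switch (f[0]) {
                case "t": // t # <graphId>
                    require(f, 3, line);
                    current = new Graph(Integer.parseInt(f[2]));
                    root.graphs.add(current);
                    break;
                case "v": // v <nodeId> <label>
                    require(f, 3, line);
                    inGraph(current, line).nodes.add(new Node(Integer.parseInt(f[1]), f[2]));
                    break;
                case "e": // e <fromId> <toId> <label>
                    require(f, 4, line);
                    Graph g = inGraph(current, line);
                    g.edges.add(new Edge(g.node(Integer.parseInt(f[1])),
                                         g.node(Integer.parseInt(f[2])), f[3]));
                    break;
                default:
                    throw new IllegalArgumentException("Unknown record: " + line);
            }
        }
        return root;
    }

    private static void require(String[] fields, int count, String line) {
        if (fields.length < count) throw new IllegalArgumentException("Too few fields: " + line);
    }

    private static Graph inGraph(Graph current, String line) {
        if (current == null) throw new IllegalArgumentException("Record before any 't' line: " + line);
        return current;
    }

    public static void main(String[] args) {
        Root root = parse(Arrays.asList("t # 0", "v 0 C", "v 1 O", "e 0 1 2", "t # 1", "v 0 N"));
        for (Graph g : root.graphs) {
            System.out.println("graph " + g.id + ": " + g.nodes.size() + " vertices, "
                    + g.edges.size() + " edges");
        }
    }
}
```

Because algorithm code sees only `Root`, `Graph`, `Node`, and `Edge`, supporting another input format would mean adding another parser, not touching the algorithms; this mirrors the isolation the data parsing plugin is meant to provide.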
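The background execution with progress reporting described in the algorithm integration step can be sketched in plain Java. In an Eclipse RCP application the platform's own facility for this is `org.eclipse.core.runtime.jobs.Job` with an `IProgressMonitor`, and SWT widgets such as a progress bar must be updated on the UI thread (for example via `Display.asyncExec`); the text does not say which mechanism the system uses. To stay self-contained, this sketch uses a single-thread `ExecutorService` and a hypothetical `ProgressTask` interface, with a stand-in summation loop in place of a real mining algorithm.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

// Hypothetical sketch: runs a long algorithm off the caller's thread and reports progress.
final class BackgroundRunner {
    // A long-running task that reports its progress (0-100) through a callback.
    interface ProgressTask<T> {
        T run(IntConsumer progress) throws Exception;
    }

    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Returns immediately, so the calling (e.g. UI) thread is never blocked by the algorithm.
    // Note: onProgress is invoked on the worker thread, not on the caller's thread.
    <T> Future<T> submit(ProgressTask<T> task, IntConsumer onProgress) {
        return worker.submit(() -> task.run(onProgress));
    }

    void shutdown() {
        worker.shutdown();
    }

    public static void main(String[] args) throws Exception {
        BackgroundRunner runner = new BackgroundRunner();
        AtomicInteger lastPercent = new AtomicInteger();
        try {
            // Stand-in "algorithm": sums 1..1000 in ten chunks, reporting after each chunk.
            Future<Long> result = runner.submit(progress -> {
                long sum = 0;
                for (int chunk = 0; chunk < 10; chunk++) {
                    for (int i = chunk * 100 + 1; i <= (chunk + 1) * 100; i++) {
                        sum += i;
                    }
                    progress.accept((chunk + 1) * 10);
                }
                return sum;
            }, lastPercent::set);
            // get() blocks here only for the demo; a UI would react to completion instead.
            System.out.println("result=" + result.get() + ", progress=" + lastPercent.get() + "%");
        } finally {
            runner.shutdown();
        }
    }
}
```

Because `submit` returns at once, a UI event handler can start the algorithm and return immediately; in an SWT application the progress callback would forward each percentage to the progress bar on the UI thread.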

Keywords/Search Tags:

Frequent

Related items

1	The Techniques Research On Frequent Pattern Mining
2	Research On Mining Frequent Itemsets Algorithm Based On Bittable
3	Research On Mining Algorithms Of Maximal Frequent Item Sets
4	Research On Top-K Frequent Itemsets Datamining Algorithm
5	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application In Simulation System
6	Key Techniques Of Map Database Frequent Pattern Mining
7	Study On Frequent Subtree Mining And Its Application In XML Mining
8	An Algorithm And Context Analysis Of Mining Frequent Closed Itemsets
9	The Research And Realization Of Mining Frequent Patterns On Business Data Streams
10	Research On Algorithm For Mining Frequent Itemsets Of Uncertain Data