Font Size: a A A

Research On Substructure Discovery Based On Hybrid Evolutionary Algorithms

Posted on:2009-09-06Degree:DoctorType:Dissertation
Country:ChinaCandidate:X G ChangFull Text:PDF
GTID:1118360272485558Subject:Management Science and Engineering
Abstract/Summary:PDF Full Text Request
Relationships are ubiquitous. Recently, structured data mining which learns various relationships between entities has been brought into focus and become a main branch of data mining and machine learning. structured representations can give a more informative and essential view of the problem at hand, but often lead to large and complex hypothesis spaces, and hence provide a new challenge over the domains of data mining and machine learning. In this paper we have used the evolutionary algorithm which is good at solving many complex optimization problems to the substructure discovery, one of the central tasks of structured data mining, and achieved better experimental results. The main contents are as follows.1. An algorithm HEASD is proposed based on the theory of hybrid evolutionary algorithms to discover substructures from graphical databases. In HEASD, we give new graph-based individual representation and genetic operators which are integrated with the hill-climbing. The experimental results show its effectiveness. In addition, a new way of substructure extending, single-label extending, is presented and justified through theoretical proof and experiments.2. Being a bottleneck problem of graphical data mining, subgraph isomorphism is considered as the root of all complexities. It makes the evolutionary searching a one-way search , which is actually an incomplete search. To deal with the incompleteness, we propose HEASDBT which can focus the search on some regions of hypothesis space with backtracking strategy. The experimental results show its effectiveness.3. To overcome the problem that qualities of solutions are often reduced by the phenomenon of losing instances, we propose two algorithms, HEASDFI and HEASDCI. While the former uses a prevention strategy to avoid losing instances, the latter employs a curing strategy that can get back the lost instances. Their effectiveness is illustrated in experiments. At the same time, in HEASDCI, a new genetic operator, called individuals cooperation operator, which enables the searching of different individuals that represent the same substructure in a cooperative way, is proposed to enhance evolutionary searching. Additionally, although the time complexity of graph isomorphism is not of the NPC like subgraph isomorphism, there is not an algorithm of the polynomial time complexity yet. So we present an approximate algorithm which has polynomial time complexity to do graph matching which is frequently used by individual cooperation operator.4. We apply the proposed algorithms to the academic discipline building and the study of regional economy. In the former case, we collect relevant information of the discipline of information and computing science in many colleges or universities in China and represent them in graphical forms. Then the proposed algorithms are used to extract the typical pattern on these data as one of the guidelines for the building of the discipline. In the latter case, we represent the economic development data of 35 big or medium cities of our country and their adjacency relationships as a graph, and then extract some economic development patterns satisfying some pre-specified conditions with our algorithms.
Keywords/Search Tags:graph-based data mining, substructure discovery, evolutionary computation, hybrid evolutionary algorithms, individual cooperation, backtracking search
PDF Full Text Request
Related items