Improvement Of Genetic Clustering Algorithm And Its Application In Gene Expression Data Analysis

Posted on: 2020-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: F B Song
GTID: 2428330575454497
Subject: Software engineering
Abstract/Summary:
In recent years, the emergence and application of cDNA microarray technology has made it possible to measure gene expression data in large quantities, providing enough sample data to analyze diseases at the molecular level. How to mine useful information from massive gene expression data has become a hot research topic. Clustering, an important family of algorithms in data mining, is an effective means of analyzing gene expression data: researchers often use clustering to find similar genes and then use known gene expression data to analyze the significant characteristics of samples. Among the many clustering algorithms, k-means is one of the most commonly used, including in gene expression data analysis. However, k-means is sensitive to the choice of initial center points and tends to converge to local optima. The genetic algorithm, a common method for solving multi-objective optimization problems, can effectively improve the clustering quality of k-means.

This thesis studies and implements genetic clustering (the genetic k-means algorithm and the genetic k-means++ algorithm). The genetic clustering algorithm is then modified to improve its convergence speed and population diversity, the effectiveness of the improvements is verified by experiments, and the improved algorithm is applied to gene expression data analysis. The specific improvements are as follows:

(1) Selection of the initial population. In the genetic clustering algorithm, an individual represents one assignment of cluster center points, so the initial population plays the role of the initial center point set in k-means clustering. Gene expression data contain a large number of unrelated genes, which account for a large proportion of the data and carry little information. To avoid the influence of these unrelated genes, this thesis adopts a partitioning method based on the most volatile attributes (MVA). The more a gene's expression fluctuates across samples, the richer the information that gene contains. The M attributes with the largest fluctuation are divided into k intervals by hierarchical clustering, and a data point is then randomly selected from each interval as an initial cluster center. Experiments show that genetic clustering based on MVA partitioning clusters better than the traditional method of choosing initial center points.

(2) Selection strategy based on degree of difference. The traditional selection strategy relies entirely on individual fitness, so it can easily select two similar individuals for crossover. The offspring are then likely to resemble their parents, and the crossover operation loses its purpose. To mitigate this, the thesis proposes a selection strategy based on the Degree of Difference (DD): once one crossover individual has been chosen, the difference from that individual is used as a weight on fitness when choosing the other, which helps avoid premature convergence. Experiments show that the DD-based selection strategy effectively improves the accuracy of the genetic clustering algorithm.

(3) Dual elite population evolutionary framework. Drawing on the idea of the hierarchical genetic algorithm, the thesis divides the evolving population into two populations, A and B. Both populations are led by elite individuals to ensure global convergence, and the two populations do not interfere with each other while they evolve. Population A uses adaptive crossover and mutation together with conservation evolution to accelerate evolution, while population B expands population diversity by introducing a certain number of random individuals. After populations A and B have each evolved independently for a certain number of generations, an inter-population exchange is carried out, so that the two populations co-evolve and complement each other to complete the evolution.
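
Improvements (1) and (2) can be illustrated with a small Python sketch. It is not the thesis's implementation, and every function name in it is hypothetical. It assumes that "fluctuation" means per-attribute variance, that the hierarchical step is a naive centroid-linkage agglomerative clustering on the M selected attributes, that the degree of difference between two individuals is the sum of squared distances from each center of one to the nearest center of the other, and that the selection weight is fitness multiplied by that difference (with non-negative fitness values).

```python
import random
from statistics import pvariance


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def most_volatile_attributes(data, m):
    """Indices of the m columns with the largest variance (assumed fluctuation measure)."""
    variances = [pvariance([row[j] for row in data]) for j in range(len(data[0]))]
    return sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)[:m]


def agglomerate(points, k):
    """Naive centroid-linkage agglomerative clustering; returns k lists of row indices."""
    groups = [[i] for i in range(len(points))]

    def centroid(group):
        return [sum(points[i][d] for i in group) / len(group)
                for d in range(len(points[0]))]

    while len(groups) > k:
        # Find and merge the two groups whose centroids are closest.
        _, a, b = min((sq_dist(centroid(groups[a]), centroid(groups[b])), a, b)
                      for a in range(len(groups))
                      for b in range(a + 1, len(groups)))
        groups[a] += groups[b]
        del groups[b]
    return groups


def mva_initial_individual(data, k, m, rng=random):
    """One individual: k centers, one random row drawn from each MVA-based group."""
    cols = most_volatile_attributes(data, m)
    projected = [[row[j] for j in cols] for row in data]
    return [list(data[rng.choice(group)]) for group in agglomerate(projected, k)]


def difference(ind_a, ind_b):
    """Assumed degree of difference: each center of ind_a to its nearest center in ind_b."""
    return sum(min(sq_dist(ca, cb) for cb in ind_b) for ca in ind_a)


def select_mate(first, population, fitness, rng=random):
    """Pick the second parent with probability proportional to fitness * difference."""
    weights = [f * difference(first, ind) for f, ind in zip(fitness, population)]
    if sum(weights) == 0:
        # Every candidate is identical to `first`; fall back to a uniform choice.
        return rng.choice(population)
    return rng.choices(population, weights=weights, k=1)[0]
```

Calling `mva_initial_individual` repeatedly would yield an initial population, and `select_mate` would replace a purely fitness-proportional choice of the second parent. Under this weighting, a candidate identical to the first parent gets weight zero and is never chosen while any differing candidate is available.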
Keywords/Search Tags: gene expression data, genetic algorithm, elite strategy, k-means