Gene chip technology has driven the rapid development of bioinformatics. A single gene chip experiment can produce thousands of gene expression measurements, which contain rich information that can help explain the phenomena of life. The study of gene expression data has become an important and fundamental problem in modern life science. As a popular method in data mining and pattern recognition, cluster analysis is also widely applied to gene expression data.

Gene expression data usually take the form of a matrix with high dimension and low sample size. Traditional clustering methods handle one dimension of the matrix at a time, either the rows (genes) or the columns (samples), and therefore capture only global information. High-dimensional data, however, contain a great deal of local information, which can only be found by clustering both dimensions of the matrix simultaneously. Biclustering arose to meet this need.

The contributions of this thesis are as follows:

(1) Several clustering algorithms were applied to a variety of real gene expression data sets. The experimental results show that no single clustering algorithm is suitable for all of the data sets. Therefore, the clustering method that is relatively more appropriate should be chosen according to the data at hand.

(2) Following a study of four biclustering algorithms, an improved biclustering algorithm is proposed that combines a traditional clustering method with sparse singular value decomposition (SSVD). The improved method obtains better results than the other biclustering methods.

(3) Five indices are introduced to evaluate the biclustering algorithms: the stability of the algorithm, the significance of the gene clusters, the correctness of the sample clusters, the fitness to a specific model, and the cohesion of the bicluster solution.
Among the four biclustering algorithms, the improved SSVD performs best on the significance, correctness, and cohesion indices, and also performs well on stability and fitness.