
Large-Scale Gene Expression Data Management And Datamining

Posted on: 2005-10-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Wang
Full Text: PDF
GTID: 1100360185956832
Subject: Bioinformatics
Abstract/Summary:
Microarray technology is one of the most exciting scientific advances since the 1990s, making it possible to study the expression of thousands of genes simultaneously in a single experiment. Managing and analyzing the huge amounts of data produced by microarray experiments is becoming one of the major bottlenecks in applying this high-throughput technology. Many software tools have been developed to support microarray data analysis by means of clustering and statistical analysis. However, these tools have limitations. First, they are inconvenient and inefficient: analysts have to switch between many software systems that require different data formats, and exchanging data between them also raises concerns about data security. Second, it is still hard for researchers to infer the functions of genes and their relationships from the results in the context of existing knowledge. Because of the technology's promise for both scientific research and clinical applications, many public or proprietary microarray gene expression information systems (e.g. SMD, ArrayExpress) have been established by Western universities, research organizations and software corporations. Therefore, developing a microarray data management and data analysis system with independently owned intellectual property rights is urgent and crucial for China to keep up with the rest of the world in this promising field.

In this dissertation, we describe a home-built microarray gene expression data management and data mining platform, ArrayLims. The ArrayLims infrastructure consists of a data management system, ArrayStore, and a data analysis system, ArrayMiner. ArrayStore can monitor and manage all kinds of data generated by microarray experiments, and it supports online data submission and query. Selected data can be exported to ArrayMiner for further analysis. ArrayMiner supports clustering, Gene Ontology annotation and pathway analysis. The results are returned to the user via a web interface equipped with visualization capability.

Compared with existing systems, ArrayLims has the following features:

1. It complies with the international community standard MIAME (Minimum Information About a Microarray Experiment), which makes data exchange with other labs and public repositories possible.
2. ArrayStore is a general database designed to hold data from all microarray platforms. With its flexibility, ArrayStore can be used as a public or laboratory gene expression data repository.
3. ArrayMiner not only empowers researchers in simple data analysis such as clustering, but also enhances their ability to infer biological knowledge by means of gene function annotation and metabolic pathway analysis.
4. Based on the J2EE architecture and the MVC and DAO design patterns, ArrayLims is an open, extensible, and scalable system.

In conclusion, ArrayLims has strengths in both data conformity and the integration of analysis tools. It balances conforming to community data standards with meeting local lab requirements. It supports the organization, management, analysis and data mining of microarray expression data according to different researchers' requirements. The data analysis system combines general gene expression profile analysis with biochemical pathway analysis and provides direct, thorough and intuitive decision support for researchers. In particular, the explorer of relationships between gene functions and gene expression profiles provides an important guide to gene function annotation in microarray gene expression experiments.

The system still falls short in data query capability, large-volume data loading, and linking out to public databases. Further work is needed for ArrayLims to play an important role in the National Data Sharing Platform. It is also desirable to extend ArrayLims to support microarray data analysis in the context of gene regulatory networks.
Keywords/Search Tags: microarray, gene expression, database, data mining