High-throughput Genome Sequencing Image Processing And Data Analysis

Posted on: 2011-04-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B G Ye
GTID: 1118360308463893
Subject: Biomedical engineering
Abstract/Summary:
In China, high-throughput genome sequencing research is only now getting under way, and it is both vitally important and urgent. Foreign corporations are currently exploiting their advantages in sequencing technology and equipment to file genome patents preemptively, given the uniqueness of the genome, in a bid to seize the lead in the future global genome industry. As the saying goes, 'sharpen the knife before cutting the wood': without modern genome sequencing technology there can be no modern biotechnology. Tomorrow's biomedicine, bioenergy, personalized medicine, and related fields will be built on modern genome sequencing. This is especially true of personalized medicine, which is characterized by genome-based diagnosis and therapy.
In high-throughput genome sequencing, the raw image consists of fluorescent spots carrying base information, and the bases of the genome are obtained through image processing and data analysis. This dissertation has two main parts: image processing for high-throughput genome sequencing and the associated data analysis. The image processing covers denoising and sharpening the sequencing images, segmenting the fluorescent spots, and building a fluorescence-intensity data file carrying base information together with its noise data file. The data analysis covers decoupling the fluorescence-intensity signals, phasing correction, base calling, and base-quality evaluation. The main contributions of the study are as follows.
1) Using the wavelet transform, an image-denoising algorithm is proposed based on a threshold derived from the correlation of wavelet coefficients.
The denoising algorithm rests on the observation that wavelet coefficients of the signal are strongly correlated, whereas those of the noise are only weakly correlated or uncorrelated. It constructs a correlation function of the wavelet coefficients and derives from it a correlation threshold, which is then used to denoise the image.
2) Building on research into image entropy and level-set segmentation, a Chan-Vese (C-V) model segmentation method incorporating image entropy is proposed. The algorithm extends the C-V level-set segmentation model by introducing image entropy. The entropy captures the statistical characteristics of the segmented regions and provides a direction for searching for the target. This improves the noise robustness and adaptivity of the C-V model, making the segmentation more accurate and more efficient.
3) A base fluorescence-signal decoupling algorithm based on correlation analysis is proposed. Working from the fluorescence-intensity data, the algorithm applies correlation analysis to construct the cross-talk matrix. The construction draws not only on the one-dimensional time-series signal but also on the spatial signal. The matrix then requires further correction, with the matrix factor obtained from a one-sample Kolmogorov-Smirnov test.
4) A base phasing-correction algorithm based on regression analysis and a Markov process is proposed. In the synthesis reaction of high-throughput genome sequencing, a fragment being sequenced may run ahead of (pre-phasing) or lag behind (phasing) the current cycle. Even then, its fluorescence intensity is still the highest, that is, it has the highest intensity within the same cycle. Based on this fact, the correction algorithm uses regression analysis and a Markov process to find the phasing-correction probability matrix.
5) A base-calling algorithm based on maximum a posteriori (MAP) estimation is proposed.
Base calling determines the most reliable base from the processed fluorescence-intensity signal and assembles the genome fragment in synthesis order. The base-calling algorithm in this dissertation is based on maximum a posteriori estimation. It consists mainly of a quadrature (numerical integration) over a three-dimensional Gaussian probability hypersphere, obtained by reducing the dimension by one.
6) Based on research into the noise, a base-quality evaluation method is proposed. This method assesses the base-calling result. Drawing on the noise analysis, it uses the Monte Carlo method to obtain the probability of bases with a low signal-to-noise ratio, and it gives a definition of base quality.
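Item 1) above rests on the correlation of wavelet coefficients. The minimal sketch below makes several assumptions that are not necessarily the dissertation's exact correlation function or threshold rule:

- The input is a one-dimensional intensity profile, for example one image row.
- The transform is an undecimated (a trous) Haar transform.
- "Correlation" is measured as the product of coefficients at the same position in adjacent scales.
- The threshold is a multiple `k` of the median absolute product.

All function names and `k` are hypothetical.

```python
import random
from statistics import median


def atrous_haar(signal, levels):
    """Undecimated (a trous) Haar decomposition with periodic boundaries.

    Returns the detail bands d_1..d_J and the final approximation c_J,
    such that signal == c_J + sum of all d_j (up to rounding).
    """
    n = len(signal)
    c = list(signal)
    details = []
    for j in range(levels):
        step = 2 ** j
        c_next = [(c[i] + c[(i + step) % n]) / 2 for i in range(n)]
        details.append([a - b for a, b in zip(c, c_next)])
        c = c_next
    return details, c


def correlation_denoise(signal, levels=3, k=3.0):
    """Keep a detail coefficient only where its product with the next-coarser
    band at the same position (inter-scale correlation) exceeds a threshold."""
    details, approx = atrous_haar(signal, levels)
    kept = []
    for j in range(levels - 1):
        corr = [a * b for a, b in zip(details[j], details[j + 1])]
        # Robust noise-level estimate: most positions hold noise only.
        thr = k * median(abs(v) for v in corr)
        kept.append([d if abs(v) > thr else 0.0 for d, v in zip(details[j], corr)])
    kept.append(details[-1])  # coarsest band has no coarser partner; keep it
    return [approx[i] + sum(band[i] for band in kept) for i in range(len(signal))]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
clean = [0.0] * 128 + [5.0] * 128          # a 1-D intensity profile with an edge
noisy = [s + rng.gauss(0.0, 0.3) for s in clean]
denoised = correlation_denoise(noisy)
print(f"MSE noisy={mse(noisy, clean):.4f}  denoised={mse(denoised, clean):.4f}")
```

Edges produce large, same-position coefficients across scales, so their products survive the threshold, while isolated noise coefficients are mostly zeroed. For a 2-D image, the same rule would be applied per row or with a 2-D transform.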
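Item 2) combines the Chan-Vese (C-V) fitting energy with image entropy, but the abstract does not say exactly how. The sketch below assumes one simple combination:

- Kapur's maximum-entropy threshold supplies the initial partition, which serves as the "search direction".
- The C-V piecewise-constant energy is then minimised by discrete pixel relabelling rather than by evolving a level-set PDE.
- A neighbour-disagreement penalty `mu` stands in for the curve-length term.

Function names, the test image, and parameter values are illustrative.

```python
import math
import random


def kapur_threshold(pixels, levels=256):
    """Kapur's maximum-entropy threshold: maximise H(background) + H(foreground)."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    best_t, best_h, cum = 0, -math.inf, 0
    for t in range(levels - 1):
        cum += hist[t]
        if cum == 0 or cum == total:
            continue  # one class would be empty
        rest = total - cum
        hb = -sum(h / cum * math.log(h / cum) for h in hist[: t + 1] if h)
        hf = -sum(h / rest * math.log(h / rest) for h in hist[t + 1 :] if h)
        if hb + hf > best_h:
            best_t, best_h = t, hb + hf
    return best_t


def chan_vese_discrete(img, init_mask, mu=100.0, max_iter=50):
    """Minimise the two-phase C-V energy sum_in (I-c1)^2 + sum_out (I-c2)^2
    plus mu * (4-neighbour label disagreements) by greedy pixel relabelling."""
    h, w = len(img), len(img[0])
    mask = [row[:] for row in init_mask]
    for _ in range(max_iter):
        inside = [img[r][c] for r in range(h) for c in range(w) if mask[r][c]]
        outside = [img[r][c] for r in range(h) for c in range(w) if not mask[r][c]]
        if not inside or not outside:
            break  # degenerate partition: region means undefined
        c1, c2 = sum(inside) / len(inside), sum(outside) / len(outside)
        changed = False
        for r in range(h):
            for c in range(w):
                nbrs = [mask[rr][cc] for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                        if 0 <= rr < h and 0 <= cc < w]
                n_in = sum(nbrs)
                e_in = (img[r][c] - c1) ** 2 + mu * (len(nbrs) - n_in)
                e_out = (img[r][c] - c2) ** 2 + mu * n_in
                new = e_in < e_out
                if new != mask[r][c]:
                    mask[r][c] = new
                    changed = True
        if not changed:
            break
    return mask


rng = random.Random(0)
img = [[rng.randint(0, 20) for _ in range(12)] for _ in range(12)]  # dim background
spot = {(r, c) for r in range(4, 7) for c in range(5, 8)}           # 3x3 fluorescent spot
for r in range(4, 7):
    for c in range(5, 8):
        img[r][c] = rng.randint(180, 220)
t = kapur_threshold([v for row in img for v in row])
mask = chan_vese_discrete(img, [[v > t for v in row] for row in img])
print("threshold", t, "spot pixels found", sum(map(sum, mask)))
```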
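Item 3) builds a cross-talk matrix from the intensity data. The minimal sketch below assumes a linear mixing model, y = M x, over the four channels A, C, G, T. Each column of M is estimated by regression slopes on reads dominated by one channel, and the signals are then decoupled by solving M x = y. The dissertation's spatial-signal analysis and its Kolmogorov-Smirnov correction of the matrix are not reproduced here. The data are simulated, and all names are hypothetical.

```python
import random


def estimate_crosstalk(reads):
    """Column i of M holds least-squares slopes (through the origin) of every
    channel against channel i, fitted on reads whose brightest channel is i."""
    m = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        group = [y for y in reads if max(range(4), key=y.__getitem__) == i]
        if not group:
            raise ValueError(f"no reads dominated by channel {i}")
        denom = sum(y[i] * y[i] for y in group)
        for j in range(4):
            m[j][i] = sum(y[j] * y[i] for y in group) / denom
    return m


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


rng = random.Random(1)
true_m = [[1.0, 0.0, 0.0, 0.0],   # row j = observed channel, column i = base
          [0.3, 1.0, 0.0, 0.0],   # A leaks 30% into the C channel
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.25, 1.0]]  # G leaks 25% into the T channel
truth, reads = [], []
for _ in range(400):
    base = rng.randrange(4)
    x = [0.0] * 4
    x[base] = rng.uniform(800, 1200)
    y = [sum(true_m[j][i] * x[i] for i in range(4)) + rng.gauss(0, 10) for j in range(4)]
    truth.append(base)
    reads.append(y)

m_est = estimate_crosstalk(reads)
decoupled = [solve(m_est, y) for y in reads]
print(round(m_est[1][0], 3), round(m_est[3][2], 3))
```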
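Item 4) seeks a phasing probability matrix using a Markov process and regression. The sketch below assumes a common simplified model, not necessarily the dissertation's. In each cycle a molecule lags with probability p, advances one base with probability 1 - p - q, or runs ahead by two with probability q.

The resulting matrix P mixes per-position signals into per-cycle observations. The rates (p, q) are fitted by a least-squares grid search on a control template whose signal is known. The correction then solves P s = y. The names, grid, and rates are hypothetical.

```python
def phasing_matrix(n, p, q):
    """P[t][k] = probability that, after cycle t+1, a molecule has incorporated
    exactly k+1 bases and therefore reports template position k+1.
    Markov step per cycle: +0 with prob p, +1 with prob 1-p-q, +2 with prob q.
    Mass beyond position n is discarded."""
    dist = [1.0] + [0.0] * n          # index = bases incorporated so far
    rows = []
    for _ in range(n):
        new = [0.0] * (n + 1)
        for k, mass in enumerate(dist):
            new[k] += p * mass                      # lagging (phasing)
            if k + 1 <= n:
                new[k + 1] += (1 - p - q) * mass    # normal incorporation
            if k + 2 <= n:
                new[k + 2] += q * mass              # running ahead (pre-phasing)
        dist = new
        rows.append(dist[1:])
    return rows


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def correct(y, p, q):
    """Undo phasing for one channel: solve P s = y for the per-position signal s."""
    return solve(phasing_matrix(len(y), p, q), y)


def estimate_phasing(y, s_known, grid=range(11)):
    """Least-squares grid search for (p, q) in {0, 0.01, ..., 0.10} on a control template."""
    n = len(y)
    best = None
    for i in grid:
        for j in grid:
            p, q = i / 100, j / 100
            pm = phasing_matrix(n, p, q)
            resid = sum((sum(pm[t][k] * s_known[k] for k in range(n)) - y[t]) ** 2
                        for t in range(n))
            if best is None or resid < best[0]:
                best = (resid, p, q)
    return best[1], best[2]


s_true = [1.0, 0.0, 0.0, 1.0, 0.0, 1.0]   # channel-A indicator per template position
pm = phasing_matrix(6, 0.05, 0.02)
y = [sum(pm[t][k] * s_true[k] for k in range(6)) for t in range(6)]  # smeared signal
p_hat, q_hat = estimate_phasing(y, s_true)
s_hat = correct(y, p_hat, q_hat)
print(p_hat, q_hat, [round(v, 6) for v in s_hat])
```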
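Item 5) calls bases by maximum a posteriori estimation. The dissertation integrates over a three-dimensional Gaussian hypersphere. The sketch below shows only the basic MAP rule, under a hypothetical model in which each base has a Gaussian class-conditional density with diagonal covariance over the four decoupled channels. The call is the base maximising log prior plus log likelihood, and the normalised posteriors are returned with it.

```python
import math

BASES = "ACGT"


def log_gauss_diag(y, mean, var):
    """Log density of y under a Gaussian with diagonal covariance diag(var)."""
    return sum(-0.5 * math.log(2 * math.pi * v) - (yi - m) ** 2 / (2 * v)
               for yi, m, v in zip(y, mean, var))


def map_call(y, means, var, priors):
    """Return (called base, posterior over ACGT) for one decoupled intensity vector."""
    log_post = [math.log(priors[b]) + log_gauss_diag(y, means[b], var) for b in range(4)]
    top = max(log_post)                       # subtract max for numerical stability
    w = [math.exp(lp - top) for lp in log_post]
    z = sum(w)
    post = [wi / z for wi in w]
    return BASES[max(range(4), key=post.__getitem__)], post


# Hypothetical class model: base b produces amplitude 1000 in channel b only.
means = [[1000.0 if c == b else 0.0 for c in range(4)] for b in range(4)]
var = [100.0 ** 2] * 4
uniform = [0.25] * 4

base, post = map_call([950.0, 30.0, -20.0, 10.0], means, var, uniform)
print(base, [round(p, 4) for p in post])
```

With equal priors, this reduces to choosing the nearest class mean in the variance-weighted sense. Unequal priors shift ambiguous calls toward the more frequent base.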
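Item 6) uses Monte Carlo simulation to estimate how likely a low signal-to-noise base is to be miscalled. The minimal sketch below assumes Gaussian noise with a per-channel standard deviation `sigma`, which could, for example, be estimated from the noise data file. It expresses quality on the Phred scale, Q = -10 log10(p_err). The dissertation gives its own definition of base quality, which may differ, and the function names are hypothetical.

```python
import math
import random


def mc_error_probability(signal, sigma, trials=20000, seed=0):
    """Monte Carlo estimate of the probability that the argmax call changes
    when independent Gaussian noise N(0, sigma^2) is added to each channel."""
    rng = random.Random(seed)
    n = len(signal)
    called = max(range(n), key=signal.__getitem__)
    errors = 0
    for _ in range(trials):
        noisy = [s + rng.gauss(0.0, sigma) for s in signal]
        if max(range(n), key=noisy.__getitem__) != called:
            errors += 1
    # Add-one smoothing keeps the estimate strictly positive, so Q stays finite.
    return (errors + 1) / (trials + 1)


def phred(p_err):
    """Phred-scaled quality: Q = -10 * log10(p_err)."""
    return -10.0 * math.log10(p_err)


bright = [1000.0, 100.0, 80.0, 50.0]   # high SNR: A clearly dominant
dim = [300.0, 250.0, 80.0, 50.0]       # low SNR: A barely ahead of C
q_bright = phred(mc_error_probability(bright, sigma=50.0))
q_dim = phred(mc_error_probability(dim, sigma=50.0))
print(round(q_bright, 1), round(q_dim, 1))
```

Because the estimate comes from simulated replicates, its resolution is limited by `trials`. A base that never flips in 20000 trials is capped at about Q43 here.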
Keywords/Search Tags: high-throughput genome sequencing, wavelet analysis, level set, signal decoupling, phasing emendation, base calling