Research And Implementation Of RCNA Identification Based On K-means Clustering

Posted on:2016-09-08

Degree:Master

Type:Thesis

Country:China

Candidate:X J Zhao

Full Text:PDF

GTID:2348330488474523

Subject:Computer application technology

Abstract/Summary:

Gene copy number is the number of copies of a particular gene, or of the DNA sequence in a given region, in an organism's genome. Copy number variation is a form of structural variation in which, compared with the reference genome, DNA segments ranging from 1 kb to 1 Mb are deleted or gained. Copy number aberrations (CNAs) are a ubiquitous kind of structural variation in the genome; they include deletions, insertions, inversions, and rearrangements of gene copies, and they are more complex than point mutations. Studying CNAs gives a new view of the structure of the genome, of the genetic differences between individuals, and of the genetic factors behind disease. A recurrent CNA (RCNA) is a contiguous CNA segment that appears in the same chromosomal region across multiple samples, and RCNAs are associated with many diseases. Identifying RCNAs can therefore provide important insight into the molecular mechanisms of disease genes.

This thesis aims to mine disease-associated RCNA regions from high-throughput biological data and to evaluate the regions that are found, providing a basis for further study of pathogenic RCNA regions.

Analysis of RCNA regions shows that the genes within them exhibit clustering behavior. Based on this property, we propose an RCNA identification algorithm based on k-means clustering. In the clustering analysis, the RCNA region is treated as one class and the remaining data as the other. Because the raw data are noisy, we first apply Wiener filtering to remove the noise and then analyze the filtered data. The analysis starts at the first column: a window of the specified width is selected and k-means clustering is applied to the data inside it. The start of the window is then moved forward by one column, and the data in the new window are analyzed in the same way. To make the results more accurate, k-means clustering is run several times on each selected window, and the clustering result with the minimum distance from each sample point to its cluster center is retained. Analyzing the cluster centers of these minimum-distance results makes it possible to identify the RCNA regions present in the data effectively.

All experiments were performed on simulated data sets, and they verify the feasibility of the algorithm. A comparison with two existing RCNA identification algorithms shows that the proposed algorithm performs better at identifying RCNAs.
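
The pipeline described in the abstract (Wiener filtering, a window sliding one column at a time, and repeated two-class k-means per window) might be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not say what the sample points are or how a window is finally scored, so the sketch assumes rows are samples and columns are probes, filters each row separately, clusters the row slices inside each window, and reports a hypothetical score, the gap between the two cluster centers. The names `wiener_1d`, `two_means`, and `scan_windows` and all default parameters are invented for this example.

```python
import random
from statistics import fmean


def wiener_1d(signal, width=5):
    """Simple local adaptive Wiener filter for one row of probe values.

    Assumption: noise power is estimated as the mean of the local variances.
    """
    half = width // 2
    means, variances = [], []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        m = fmean(window)
        means.append(m)
        variances.append(fmean([(x - m) ** 2 for x in window]))
    noise = fmean(variances)
    out = []
    for x, m, v in zip(signal, means, variances):
        denom = max(v, noise)
        gain = max(v - noise, 0.0) / denom if denom > 0 else 0.0
        out.append(m + gain * (x - m))
    return out


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, rng, iters=20):
    """One k-means run with k = 2; returns (total squared distance, centers)."""
    centers = rng.sample(points, 2)
    for _ in range(iters):
        groups = [[], []]
        for p in points:
            nearest = 1 if sq_dist(p, centers[1]) < sq_dist(p, centers[0]) else 0
            groups[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            [fmean(col) for col in zip(*g)] if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    total = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return total, centers


def scan_windows(data, width, restarts=5, seed=0):
    """Slide a window over the columns; keep the best of several k-means runs.

    Returns one (start_column, total_distance, center_gap) tuple per window.
    center_gap is a hypothetical score chosen for this sketch.
    """
    rng = random.Random(seed)
    filtered = [wiener_1d(row) for row in data]
    results = []
    for start in range(len(filtered[0]) - width + 1):
        points = [row[start:start + width] for row in filtered]
        runs = [two_means(points, rng) for _ in range(restarts)]
        total, centers = min(runs, key=lambda run: run[0])
        gap = abs(fmean(centers[0]) - fmean(centers[1]))
        results.append((start, total, gap))
    return results


if __name__ == "__main__":
    # Simulated data: 6 samples x 20 probes, a gain in columns 8-12 of samples 0-2.
    noise_rng = random.Random(1)
    data = [
        [(1.0 if r < 3 and 8 <= c <= 12 else 0.0) + noise_rng.gauss(0, 0.05)
         for c in range(20)]
        for r in range(6)
    ]
    best = max(scan_windows(data, width=5), key=lambda t: t[2])
    print("window with the largest center gap starts at column", best[0])
```

The helpers are written with the standard library only so that the sketch runs as is; on real data, a tested filter and a tested k-means implementation would be the more sensible choice.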

Keywords/Search Tags:

copy number aberration, recurrent copy number aberration, k-means clustering

Related items

1	Wave-front Aberration Analysis Of Three-lens Configuration Slit Spatial Filters
2	Design And Implementation Of Copy Number Preprocessing System Based On PCF
3	Content-Based Video Copy Detection And Trademark Number Recognition
4	Research On Automatic Low-order Aberration Correction Of Slab Laser
5	The Study Of Brain Mri Segmentation Using Fuzzy Clustering Technology
6	Study On The Optical Design And Aberration Compensation Of Immersion Lithography Lens
7	Properties Study Of The Detection And Compensation Of The Low Order Aberration In The Unstable Resonator
8	Copy-move Forgery Detection Based On Block Matching And Clustering Algorithm
9	Optical Design And Aberration Control For Immersion ArF Lithographic Lens
10	Research On Detection Techniques Of Copy-Move Forgery Image