Research On Quasispecies Reconstruction Algorithm Based On Color Coding Technology

Posted on: 2019-10-11
Degree: Master
Type: Thesis
Country: China
Candidate: D Huang
GTID: 2370330566976189
Subject: Computer Science and Technology
Abstract/Summary:
The study of viral quasispecies haplotypes has important practical significance for understanding viral gene sequences, developing viral vaccines, and designing effective antiviral therapies. Given current technical limitations, obtaining viral quasispecies directly by biological means is too costly, so reconstructing viral quasispecies haplotypes computationally has become an active research topic. This thesis studies the reconstruction problem of viral quasispecies haplotypes and designs and develops a software package for it. The work is as follows.

First, working from the sequencing reads of viral quasispecies haplotypes and the conflict graph built from those reads, this thesis introduces a fuzzy distance to measure the difference between reads, studies the haplotype reconstruction problem on this basis, and proposes an improved reconstruction algorithm based on edge weight reduction (IDsatur). The algorithm first preprocesses the read conflict graph according to a given threshold. It then colors all vertices of the graph according to their degree and saturation, following the rule that adjacent vertices receive different colors, until every vertex is colored. Finally, quasispecies haplotypes are obtained by assembling the reads that share a color. Simulated sequencing reads were used to compare the reconstruction performance and quality of IDsatur with those of Dsatur. The experimental results show that IDsatur reconstructs fewer quasispecies and achieves higher reconstruction performance than Dsatur, and that it effectively resolves the problem of an excessive number of reconstructed haplotypes as the sequencing error rate increases.

Second, for the same reconstruction problem, a color coding algorithm based on the sum of edge weights and coloring (CWSS) is proposed. CWSS follows a flow similar to that of IDsatur; when coloring the conflict graph, it uses the sum of edge weights together with the saturation to select the next vertex to color. Simulated sequencing reads were used to compare the reconstruction performance and quality of CWSS with those of Dsatur. The experimental results show that CWSS reconstructs quasispecies more accurately and achieves higher reconstruction performance than Dsatur, and that it maintains good reconstruction performance even at high sequencing error rates.

Third, based on the proposed algorithms IDsatur and CWSS, a practical software package for quasispecies reconstruction is designed and developed. The package is written in Python, using JetBrains PyCharm Community Edition 2016.2.3 (64-bit) and the wxPython library as development tools, and runs on systems where these tools are installed. Its functions are organized into five modules: parameter setting, reading biological data, reconstructing quasispecies, viewing results, and help. The parameter setting module sets the quasispecies length and the experimental threshold parameters to suit the actual situation. The data reading module loads the data required for reconstruction. The software displays the progress of the operation during reconstruction; once reconstruction is complete, it records the number of reconstructed haplotypes, the read set included in each reconstructed haplotype, and the reconstructed gene sequences, and reports the values of the reconstruction metrics.

In summary, this thesis introduces the fuzzy distance, uses a threshold to preprocess the weighted conflict graph of reads, and proposes two reconstruction algorithms, IDsatur and CWSS. The experimental results show that both algorithms obtain fewer quasispecies haplotypes with higher reconstruction accuracy, making both effective computational methods for the reconstruction problem of viral quasispecies haplotypes. The software package developed on this basis therefore also has important practical application value.
Keywords/Search Tags: quasispecies, haplotype reconstruction, weighted graph, color coding, fuzzy distance
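The pipeline described in the abstract (fuzzy distance between reads, threshold preprocessing of the weighted conflict graph, saturation-driven coloring, and assembly of same-colored reads) can be illustrated with a minimal Python sketch. The abstract does not define the fuzzy distance, the exact preprocessing rule, or how CWSS combines edge weights with saturation, so all of these are assumptions here. `fuzzy_distance` is a hypothetical stand-in (the mismatch fraction over positions that both aligned reads cover). Preprocessing simply drops edges whose weight does not exceed the threshold. The `use_weights` flag breaks saturation ties by summed edge weight instead of degree, as one possible reading of CWSS. Reads are assumed to be pre-aligned strings of equal length, with `-` marking uncovered positions, and every function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

GAP = "-"  # assumed marker for a position the aligned read does not cover


def fuzzy_distance(a: str, b: str) -> float:
    """Hypothetical stand-in for the thesis's fuzzy distance: the fraction of
    mismatching bases over positions covered by both aligned reads."""
    shared = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not shared:
        return 0.0  # no overlap, so no evidence that the reads conflict
    return sum(x != y for x, y in shared) / len(shared)


def conflict_graph(reads, threshold):
    """Weighted conflict graph as an adjacency dict {v: {u: weight}}.
    Assumed preprocessing: edges with weight <= threshold are dropped."""
    adj = {i: {} for i in range(len(reads))}
    for i, j in combinations(range(len(reads)), 2):
        w = fuzzy_distance(reads[i], reads[j])
        if w > threshold:
            adj[i][j] = adj[j][i] = w
    return adj


def color_graph(adj, use_weights=False):
    """Dsatur-style greedy coloring. The uncolored vertex with the highest
    saturation (distinct neighbor colors) is colored next; ties are broken
    by degree, or by summed edge weight when use_weights=True (a guess at
    CWSS). Each vertex gets the smallest color unused by its neighbors."""
    colors = {}

    def saturation(v):
        return len({colors[u] for u in adj[v] if u in colors})

    def priority(v):
        tie = sum(adj[v].values()) if use_weights else len(adj[v])
        return (saturation(v), tie)

    while len(colors) < len(adj):
        v = max((u for u in adj if u not in colors), key=priority)
        used = {colors[u] for u in adj[v] if u in colors}
        colors[v] = next(c for c in range(len(adj)) if c not in used)
    return colors


def assemble(reads, colors):
    """Majority-vote consensus over the reads sharing each color."""
    groups = {}
    for read_id, c in colors.items():
        groups.setdefault(c, []).append(reads[read_id])
    haplotypes = {}
    for c, members in sorted(groups.items()):
        consensus = []
        for column in zip(*members):
            bases = Counter(b for b in column if b != GAP)
            consensus.append(bases.most_common(1)[0][0] if bases else GAP)
        haplotypes[c] = "".join(consensus)
    return haplotypes


if __name__ == "__main__":
    # Toy reads: three from ACGTACGTTA, three from ACCTAGGTTC.
    reads = ["ACGTAC----", "--GTACGT--", "----ACGTTA",
             "ACCTAG----", "--CTAGGT--", "----AGGTTC"]
    graph = conflict_graph(reads, threshold=0.1)
    for use_weights in (False, True):
        print(assemble(reads, color_graph(graph, use_weights)))
```

With these toy reads, every cross-haplotype pair has a fuzzy distance of at least 0.25, so at threshold 0.1 the conflict graph is bipartite. Both selection rules color it with two colors, and the consensus of each color class recovers one of the two source haplotypes. The tie-breaking rule and the threshold only start to matter on noisier, error-containing reads, where a threshold that is too low keeps spurious conflict edges and inflates the number of colors, and therefore the number of reconstructed haplotypes.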