An Efficient Tool To Trim Primers Of Amplicon Sequencing Data

Posted on: 2021-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Shao
Full Text: PDF
GTID: 2370330602993984
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Cancer is caused by mutations in the genes that control cell function, especially those that control cell growth and division. Specific types of mutations detected in cells help diagnose cancer, and mutations can also be used to track a patient's response to treatment after diagnosis. Methods for detecting gene mutations include whole-genome sequencing, whole-exome sequencing, hybrid capture, and amplicon capture. Amplicon sequencing uses specific primers to amplify regions of interest and form an enriched DNA library. Its simplified experimental procedure greatly lowers the level of expertise required of operators, so more people can complete the experiment and the risk of understaffing is reduced. By comparison, whole-genome and whole-exome sequencing are more expensive, and hybrid capture experiments are complicated: their many manual steps can introduce uncontrollable factors into the results, which is unacceptable in clinical use. Amplicon sequencing has proven to be a fast and effective technology, has played a unique role in next-generation high-throughput sequencing, and has produced many exciting discoveries.

With the extensive application of multiplex amplicon sequencing (MAS) in the detection of genetic variation, an effective tool is needed to remove primer sequences from reads to ensure the reliability of downstream analysis. Some tools already exist, but their efficiency and accuracy in removing large numbers of primers from high-throughput targeted sequencing data need to be improved. Given the potential clinical applications of MAS in processing patient samples, this issue is becoming increasingly urgent. We therefore developed pTrimmer, a tool that can handle thousands of primers simultaneously with greatly improved accuracy and performance.

pTrimmer combines k-mers with the Needleman-Wunsch algorithm and handles primer sequences on reads in both the “penetrated” and “unpenetrated” situations. The k-mer model allows mismatches in the primers, a hash table speeds up the primer search, and the Needleman-Wunsch algorithm allows indels in the primers. During the search, pTrimmer first uses the k-mer model to find the primer; if that search fails, it switches to a dynamic programming model to find the optimal primer. Accuracy is therefore maintained even when the primers contain sequencing errors, insertions, or deletions.

Compared with similar tools, pTrimmer improves sensitivity by 28.59% and accuracy by 11.87%. On simulated data, pTrimmer reaches a sensitivity of 99.96% and an accuracy of 97.38%, compared with 70.85% and 58.73% for cutPrimers. Performance is also significantly improved: pTrimmer is 370 times faster than cutPrimers and 17,000 times faster than cutadapt per thread. It takes only 37 seconds to remove 2158 pairs of primers from 11 million reads (Illumina PE150bp), and memory consumption does not exceed 100 MB.

We have developed both Linux and Windows versions for the convenience of non-specialist users. The Linux version has few installation requirements and is simple to install. pTrimmer removes primer sequences from multiplex amplicon sequencing and targeted sequencing data. Compared with three other similar tools, it has higher sensitivity and specificity, which helps users obtain more reliable mutation information for downstream analysis.
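
The two-stage search described in the abstract (k-mer hash lookup first, dynamic programming as a fallback) can be illustrated with a minimal Python sketch. This is not pTrimmer's C implementation. The seed length `K`, the two seed positions, the thresholds, the function names, and the example primers are all assumptions made for illustration. The fallback uses the unit-cost (edit distance) form of Needleman-Wunsch with a free end gap on the read, which is a simplification of a scored global alignment.

```python
from collections import defaultdict

K = 8  # seed length: an assumed value, not taken from the thesis


def build_index(primers, k=K):
    """Hash two seed k-mers per primer (offsets 0 and k) to the primer."""
    index = defaultdict(list)
    for primer in primers:
        for offset in (0, k):
            if offset + k <= len(primer):
                index[primer[offset:offset + k]].append((primer, offset))
    return index


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def kmer_search(read, index, k=K, max_mismatch=3):
    """Stage 1: find a seed in the hash table, then verify by mismatch count."""
    for start in (0, k):
        for primer, offset in index.get(read[start:start + k], ()):
            if (offset == start and len(read) >= len(primer)
                    and hamming(primer, read[:len(primer)]) <= max_mismatch):
                return primer, len(primer)
    return None


def nw_prefix_cost(primer, read, slack=3):
    """Stage 2: unit-cost dynamic programming of the primer against the
    start of the read. Returns (edits, number of read bases covered)."""
    window = read[:len(primer) + slack]
    prev = list(range(len(window) + 1))
    for i, p in enumerate(primer, 1):
        cur = [i]
        for j, r in enumerate(window, 1):
            cur.append(min(prev[j - 1] + (p != r),  # match or mismatch
                           prev[j] + 1,             # base missing from read
                           cur[j - 1] + 1))         # extra base in read
        prev = cur
    # Free end gap on the read: choose the prefix length with fewest edits.
    best_len = min(range(len(window) + 1), key=lambda j: prev[j])
    return prev[best_len], best_len


def trim_read(read, primers, index, max_edits=3):
    """Return (trimmed read, matched primer), or (read, None) if no primer fits."""
    hit = kmer_search(read, index)
    if hit is None:
        scored = [(nw_prefix_cost(p, read), p) for p in primers]
        (edits, length), primer = min(scored, key=lambda s: s[0][0])
        if edits > max_edits:
            return read, None
        hit = (primer, length)
    primer, length = hit
    return read[length:], primer


# Made-up primers and reads, for illustration only.
primers = ["ACGTACGTACGTACGTAC", "TTGACCAGGTTCAAGGCT"]
index = build_index(primers)
print(trim_read("TTGACCAGGTTCAAGGCT" + "GGGCCCAAATTT", primers, index))
```

In this sketch the hash lookup handles exact and mismatch-only cases in time independent of the number of primers, while the slower alignment runs only for reads whose seeds miss, for example when an indel shifts the primer. A production tool would also need to handle read pairs, reverse complements, and primers read through at the 3' end, none of which are shown here.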
Keywords/Search Tags: primer, targeted sequencing, C language, k-mers, tool