| Third-generation sequencing offers ultra-long reads, uniform coverage, and fast sequencing, which favors comprehensive detection of deletion variants. However, existing structural variant detection tools usually apply a single algorithm to all variant types. As a result, long deletion regions are handled poorly: precision is high but recall is low, and many true variants are missed, so current demand is not met. Research on high-recall detection is therefore of practical importance for genome annotation, precision medicine, and clinical diagnosis. Taking improved recall in deletion variant detection as its starting point, this paper proposes DASV (dual-attention structural variation), a method that analyzes deletion variant data with a dual-attention mechanism. The main work is summarized in three aspects: (1) A mapping method that converts sequence alignment data into image data. Sequenced samples are aligned against the reference genome to produce alignment records, which are stored as text in a special format; these records are complex and independent of one another. For this alignment information, mapping rules are established between all alignment features of the sample and the image encoding, so that each feature is associated with the color distribution, the shape and size, and the arrangement of color blocks in the image. Features are converted to images in fixed base-pair steps, and a local attention mechanism is introduced to emphasize the sequence features of deletion variants, yielding the gene mapping image. This image turns one-dimensional data into two-dimensional data that reflects the positional relationships among sequences and their alignment pileup, providing high-quality input samples for subsequent network training. (2) Rescaling of sequencing sample features with the gene mapping image through a channel attention mechanism, followed by classification of candidate variant regions with a residual network. To retain the complete information of the gene sequences and identify deletion variants accurately, a dual-attention approach is proposed that fuses the mapping image with the sequencing sample features. A local attention mechanism strengthens the deletion features, and a channel attention mechanism applies pooling, convolution, and a sigmoid to the mapping image features; the result is combined with the sequencing sample by dot product to rescale the sample features. This produces a sequencing sample carrying multiple features, such as base fields, CIGAR fields, and coverage depth, for accurate classification of candidate variant regions. (3) Experimental validation of the dual-attention method and evaluation of its effectiveness. Real data (HG002, HG003, HG004, and F1 offspring from Arabidopsis thaliana family samples) and simulated data with different sequencing depths and variant lengths were used to validate the whole pipeline: sequence alignment, image data conversion, fusion of image and sequence features, and classification. The results show that, compared with four currently popular detection tools (SVIM, Sniffles, cuteSV, and pbsv), DASV achieves higher recall while maintaining detection precision. Its F1 score is on average 3.5% higher than that of the best of these tools, and the gain reaches 13.2% on data with high coverage depth, indicating that the method is more effective at calling deletion variants. |
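
The channel-attention rescaling described in aspect (2) can be illustrated with a minimal NumPy sketch. It is not the DASV implementation. The function names, the tensor shapes, the fixed 1-D kernel, and the assumption that the image and the sequence features share the same number of channels are all hypothetical choices made for illustration. The sketch pools the mapping-image features per channel, convolves across channels, applies a sigmoid, and multiplies the resulting weights into the sequence features.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention_weights(image_feats, kernel):
    """Derive one weight per channel from image features.

    image_feats: array of shape (C, H, W) (hypothetical layout).
    kernel: 1-D array of odd length, slid across the channel axis
            (in a trained network these values would be learned).
    """
    # Global average pooling: one scalar per channel.
    pooled = image_feats.mean(axis=(1, 2))
    # Zero-pad so the output keeps one value per channel.
    pad = len(kernel) // 2
    padded = np.pad(pooled, pad, mode="constant")
    # 1-D convolution (cross-correlation) over neighbouring channels.
    conv = np.array(
        [np.dot(padded[i:i + len(kernel)], kernel) for i in range(len(pooled))]
    )
    # Squash to (0, 1) so the values act as channel weights.
    return sigmoid(conv)


def rescale_sequence_features(seq_feats, weights):
    """Scale each channel of seq_feats, shape (C, L), by its weight."""
    return seq_feats * weights[:, None]


# Example with three hypothetical channels (e.g. base, CIGAR, coverage depth).
rng = np.random.default_rng(0)
image = rng.random((3, 8, 8))      # stand-in for mapping-image features
sequence = rng.random((3, 16))     # stand-in for sequence sample features
w = channel_attention_weights(image, np.array([0.25, 0.5, 0.25]))
rescaled = rescale_sequence_features(sequence, w)
print(w.shape, rescaled.shape)
```

Because the weights come from the image while the scaling is applied to the sequence features, the image only modulates the sequence sample and does not replace it, which matches the stated goal of retaining the complete sequence information.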