| Microsatellite sequences, also known as short tandem repeats (STRs) or simple sequence repeats (SSRs), typically occur in the human genome as repeated units of 1-6 bp. Current research has found that microsatellites are associated with gene regulation, development and evolution, and various genetic diseases. Most current studies focus on the effects of microsatellites at disease-related local sites, or offer only very rough statistical analyses of the total length and overall relative density of microsatellites across the whole human genome. In previous studies, our laboratory found that microsatellites accumulate at high density at certain positions in the human genome. This study therefore constructed distribution maps of these high-density microsatellite accumulation (HDMA) peaks in the human genome and developed fully automatic mapping code in Python 3. Based on the SSR landscape map of the human reference genome GRCh38 at 1 Kbp differential resolution, established with the DCM algorithm, a total of 3433 HDMA peaks were identified. The relative density of each peak was at least 6 times that of the surrounding area, with clear statistical significance, suggesting that the peaks may have biological significance and deserve further study. The distribution characteristics and motif types of all HDMA peaks in the GRCh38 genome were statistically analyzed, and on this basis the distribution map of HDMA peaks on the 24 chromosomes was constructed using PS. The map shows that the HDMA peaks are regularly arranged along each chromosome and that more than half of them have the motif (AT)n. This suggests that HDMA peaks may be an essential, fundamental component of the human genome, and that the (AT)n motif may have been specifically selected as an important element of genome structure and function. Data mining further indicated that HDMA peaks may be involved in biological functions such as transcriptional regulation and genome structure. To improve the efficiency of map construction and automate the construction of the chromosomal distribution map of HDMA peaks, map construction code was developed in Python 3; it greatly shortens the drawing time and verifies the earlier manually drawn maps. Following the release of the first complete human genome, T2T-CHM13, in 2021, the code was used to rapidly construct maps of the distribution of HDMA peaks on that genome's 23 chromosomes. Sequence comparison of HDMA peaks between GRCh38 and T2T-CHM13 showed that the genomic regions containing HDMA peaks exhibit higher sequence variability between the two assemblies than their upstream and downstream regions. This study provides the first genome-wide view of HDMA peaks in the human genome and offers insights into the biological significance of high-density microsatellite accumulation. The code developed here also provides a powerful tool for rapidly constructing genome-level maps of high-density microsatellite accumulation. |
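
The peak criterion stated above (a relative density at least 6 times that of the surrounding area) can be illustrated with a minimal Python 3 sketch. It is not the DCM algorithm or the mapping code developed in this study. The function name `find_density_peaks`, the `flank` window of five bins on each side, and the use of the mean of the flanking bins as the background are all assumptions made for illustration only. The input is assumed to be a list of per-bin SSR relative densities, one value per 1 Kbp bin.

```python
from typing import List


def find_density_peaks(density: List[float],
                       flank: int = 5,
                       fold: float = 6.0) -> List[int]:
    """Return indices of bins whose density is >= `fold` times the
    mean density of up to `flank` bins on each side (hypothetical rule)."""
    peaks = []
    for i, value in enumerate(density):
        # Flanking bins on each side, excluding the bin under test.
        left = density[max(0, i - flank):i]
        right = density[i + 1:i + 1 + flank]
        neighbours = left + right
        if not neighbours:
            continue  # a single-bin input has no surrounding area
        background = sum(neighbours) / len(neighbours)
        # Require some SSR content, then apply the fold-change threshold.
        if value > 0 and value >= fold * background:
            peaks.append(i)
    return peaks


# Toy example: a flat background of 0.01 with one dense bin at index 10.
toy_density = [0.01] * 10 + [0.5] + [0.01] * 10
print(find_density_peaks(toy_density))
```

In this toy input only the bin at index 10 passes the threshold. Its neighbours do not, because the dense bin raises their local background.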