Rice is one of the most important food crops in the world. It is also a model plant for gramineous crops and plays an important role in genetics, molecular biology and genomics research. The rice variety Nipponbare was completely sequenced by the International Rice Genome Sequencing Project (IRGSP) using the clone-by-clone method, and its reference sequence has since been improved several times. Although it is recognized as the highest-quality crop genome sequence, it still contains a large number of sequence gaps and assembly errors. Complete, high-quality genome sequences are the basis for research on genome function, molecular inheritance, and gene cloning. In this study, the rice variety Oryza sativa Nipponbare was used as the research material, and its genome was reassembled using Single Molecule Real Time (SMRT) sequencing. The new assembly was then used to fill existing gaps in the Nipponbare reference genome sequence, yielding a higher-quality Nipponbare genome sequence that better supports rice research. A search of the Os-Nipponbare-Reference-IRGSP-1.0 reference sequence identified 234 sequence gaps longer than 100 bp. PacBio RSII SMRT sequencing of the Nipponbare genome produced 6,011,680 reads with an average read length of 13,588 bp and a sequencing depth of 54×. The reads were assembled from third-generation (long-read) data alone, using PBcR and Canu under different parameter settings; the Canu assemblies were significantly better than the PBcR assemblies. The Canu assembly was therefore selected and polished with Quiver, giving a total of 2,210 contigs with an N50 of 339,790 bp. The final assembly was aligned to the IRGSP-1.0 reference genome for quality evaluation, showing that the assembled sequence covered 366,835,788 bp of the reference genome sequence, accounting
for 98.28% of the reference genome. A total of 240 misjoin events were detected, meaning that the assembled contigs show 240 structural inconsistencies relative to the reference sequence, and the average base identity of the aligned sequence was 99.95%. The assembly was then used to fill the reference gaps longer than 100 bp: 50 gaps were completely filled, adding 345,567 bp of sequence, and 20 gaps were partially extended. Analysis of GC, LTR, simple repeat and TRF content in the newly added gap sequences showed that GC, LTR and simple repeat content were significantly higher than in the genome as a whole. The completely filled sequences were verified by polymerase chain reaction (PCR) amplification and sequencing. A total of 22 sequences from the gap regions or the flanks of the gap-filled sequences were amplified, of which 21 matched correctly (accuracy 95.5%, 21/22), demonstrating the reliability of using the SMRT sequencing-based genome assembly to fill gaps in the reference sequence. Filling reference genome gaps with an SMRT sequencing-based assembly can serve as a reference for the improvement of other genomes, and the reliable new sequences can be used in subsequent research on genome function. Experimentally...
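Two of the quantities reported above, the number of reference gaps longer than 100 bp and the contig N50, can be illustrated with a minimal Python sketch. It is not the pipeline used in the study. It assumes that gaps appear in the reference FASTA as runs of `N`, which is the usual convention but is not stated in the text, and the function names `find_gaps` and `n50` are hypothetical.

```python
import re
from typing import Iterable, List, Tuple


def find_gaps(sequence: str, min_len: int = 101) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of N runs.

    Assumption: a gap is a maximal run of 'N' (or 'n') characters.
    min_len=101 corresponds to gaps strictly longer than 100 bp.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]


def n50(lengths: Iterable[int]) -> int:
    """Return the contig N50.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no contigs supplied


if __name__ == "__main__":
    # Toy data only; these are not values from the study.
    toy_chromosome = "ACGT" + "N" * 150 + "ACGT" + "N" * 100 + "AC"
    print(find_gaps(toy_chromosome))
    print(n50([10, 8, 5, 3, 2]))
```

On a real reference, `find_gaps` would be applied to each chromosome sequence in turn and the resulting intervals counted. `n50` takes the contig lengths of the polished assembly.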