| With the development of Internet technology, the amount of content on the Internet has grown exponentially. While people enjoy increasingly convenient content services, the large volume of redundant information also causes users considerable trouble. The same holds in education: online educational resources are becoming more and more extensive and students learn from an ever wider range of sources, but as learning progresses the problem of redundant information becomes more serious and wastes a great deal of students' learning time. In this paper, we construct a video course pictorial system based on the Seq2Seq structure by integrating speech recognition, named entity recognition, and swarm intelligence optimization, so that students can quickly get an overview of the video content, precisely locate the key positions of the knowledge points to be learned, and improve their learning efficiency. The specific work of this paper is as follows. (1) To address the limited generalization ability of speech recognition models, a Conformer speech recognition model based on the R-Drop structure, Conformer-R, is proposed; it enhances the generalization ability of the model through multiple Dropout passes. The model is first pre-trained on the Aishell1 and Wenetspeech datasets and then fine-tuned on computer-domain audio training data. Comparative tests are conducted on the test_meeting and test_net test sets provided by wenet and the test_ai test set provided by Aishell1, and better results are obtained. The model is then fine-tuned on the teaching course data and achieves the expected results. (2) By combining the R-Drop structure with the XLNet pre-trained model and using a Transformer encoder with relative position encoding to encode the data, the XLNetTransformer-R model is proposed to improve accuracy on the named entity recognition task. Experiments show that the F1 values of XLNetTransformer-R on MSRA are higher than those of the model before improvement, and that the model also performs well in comparative experiments against three other models. (3) A multi-spatial cooperative game particle swarm algorithm is proposed. It uses the batch_bins values of the speech recognition model as particles and the model loss values as the fitness values of the algorithm, and recalculates the batch_bins size after each epoch, thereby optimizing the model's batch_bins. Experimental results show that the optimized speech recognition model is more accurate: on the Aishell1 test set the character error rate decreases by 0.36%, which demonstrates the effectiveness of the method. |
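
The R-Drop regularization referred to in (1) and (2) can be illustrated with a minimal sketch. It assumes the standard R-Drop formulation: the same input is passed through the network twice under different Dropout masks, cross-entropy is computed for both passes, and a symmetric KL term penalizes disagreement between the two output distributions. The abstract does not give the exact loss used in Conformer-R or XLNetTransformer-R, so the toy linear classifier, the function names, and the weight `alpha` below are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def forward_with_dropout(features, weights, drop_prob, rng):
    """Toy linear classifier (assumption) with inverted Dropout on the inputs."""
    kept = [
        0.0 if rng.random() < drop_prob else x / (1.0 - drop_prob)
        for x in features
    ]
    return [sum(w * x for w, x in zip(row, kept)) for row in weights]


def r_drop_loss(logits_a, logits_b, target, alpha=1.0):
    """R-Drop style loss for a single example.

    logits_a, logits_b: outputs of two forward passes of the same input
    through the same network with different Dropout masks.
    target: index of the correct class.
    alpha: weight of the consistency term (hypothetical default).
    """
    p = softmax(logits_a)
    q = softmax(logits_b)
    # Cross-entropy of each pass against the label.
    cross_entropy = -math.log(p[target]) - math.log(q[target])
    # Symmetric KL keeps the two Dropout sub-models consistent.
    consistency = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return cross_entropy + alpha * consistency


# Two stochastic passes over the same (made-up) input.
rng = random.Random(0)
features = [0.5, -1.2, 0.3, 2.0]
weights = [
    [0.1, 0.4, -0.3, 0.2],
    [-0.2, 0.1, 0.5, 0.3],
    [0.3, -0.1, 0.2, -0.4],
]
logits_a = forward_with_dropout(features, weights, 0.1, rng)
logits_b = forward_with_dropout(features, weights, 0.1, rng)
loss = r_drop_loss(logits_a, logits_b, target=1)
```

When the two passes produce identical logits, the consistency term vanishes and the loss reduces to twice the ordinary cross-entropy. The regularizer only has an effect when Dropout makes the two sub-models disagree.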