| With the rapid development of the Internet, online video learning has become an important way for people to study independently. However, many open teaching videos do not provide the courseware as an electronic document, which is a great inconvenience to learners. To address this, this thesis designs and implements a CNN-based system for extracting courseware content from online teaching videos, which converts the courseware pages shown in a video into an editable electronic document. The system is divided into four modules: video frame extraction, page object analysis, text recognition, and file generation. The video frame extraction module captures screen images from the online video and processes them into courseware pages. The page object analysis module divides each courseware page into object regions and classifies each region as a picture, table, formula, or text. The text recognition module uses the open-source CRNN model to recognize the images of text regions as text. The file generation module uses the python-docx library to generate an editable docx document that follows the original layout of the courseware. The algorithms of the page object analysis module fall into two groups: segmentation and classification. For segmentation, this thesis proposes a projection-based RLSA (run-length smoothing algorithm) based on pattern matching. The algorithm first projects the input image horizontally or vertically, then iteratively applies the one-dimensional RLSA to the binarized projection to determine the segmentation points, and achieves good segmentation results. For classification, this thesis proposes a two-channel hybrid convolutional network, which trains one-dimensional and two-dimensional AlexNet networks as feature extractors and uses a three-layer, two-channel fully connected classifier for the final classification. The network achieved a classification accuracy of 98.02% on the ICDAR2017 POD dataset, which is the best classification result among methods based on deep networks. In the implementation of the system, the video frame extraction module uses pixel-wise subtraction and pixel projection to perform repeated-frame detection, black border removal, and courseware page discrimination. The page object analysis module uses a client-server architecture: the classification model is deployed on the server, and the client obtains the classification results by sending the region images to the server. The text recognition module performs projection-based word segmentation on each text region so that spaces can be inserted between words. The file generation module defines a set of layout parameters and uses them to restore the original layout of the courseware. The system restores courseware pages with light backgrounds and relatively simple layouts well, and constitutes a complete CNN-based system for extracting courseware content from online teaching videos. |
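
The projection-plus-RLSA idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `rlsa_1d` and `projection_segments`, the `threshold` parameter, and the convention that ink pixels equal 1 are all assumptions made for illustration, and the iterative refinement and pattern-matching steps are omitted. The sketch only shows how a binarized projection profile can be smoothed with a one-dimensional RLSA and then cut into segments.

```python
import numpy as np


def rlsa_1d(bits, threshold):
    """One-dimensional RLSA: fill gaps of 0s no longer than `threshold`
    that lie between two 1s. (Hypothetical helper, not from the thesis.)"""
    out = bits.copy()
    ones = np.flatnonzero(bits)
    for a, b in zip(ones[:-1], ones[1:]):
        gap = b - a - 1
        if 0 < gap <= threshold:
            out[a + 1:b] = 1
    return out


def projection_segments(binary_img, axis, threshold):
    """Project a binary image (1 = ink, an assumed convention) along `axis`,
    smooth the binarized profile with rlsa_1d, and return the half-open
    (start, end) index ranges of the remaining runs of 1s."""
    # axis=1 sums across columns, giving one value per row.
    profile = (binary_img.sum(axis=axis) > 0).astype(np.uint8)
    smoothed = rlsa_1d(profile, threshold)
    # Pad with zeros so every run has a rising and a falling edge.
    padded = np.concatenate(([0], smoothed, [0])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))


# Usage: three bands of ink rows separated by gaps of 1 and 3 blank rows.
page = np.zeros((10, 6), dtype=np.uint8)
page[[0, 1, 3, 4, 8, 9], 2:4] = 1
print(projection_segments(page, axis=1, threshold=1))  # [(0, 5), (8, 10)]
```

With `threshold=1` the one-row gap is smoothed away, so the first two bands merge into a single segment, while the three-row gap remains a segmentation point. In practice the threshold would have to be chosen per projection direction and page type.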