Design And Implementation Of Children's Picture Speaking System Based On Deep Learning

Posted on: 2022-10-04 Degree: Master Type: Thesis
Country: China Candidate: H B Li
GTID: 2518306491955009 Subject: Software engineering
Abstract/Summary:
With the number of images growing rapidly, enabling machines to automatically recognize and understand image content, and to describe it in sentences that match people's reading habits, has become an important research topic in AI. The main goal of Image Caption is to let the machine "look and say", which is also one of the main tasks of early childhood education. With Image Caption, children can follow the machine in "looking and saying", which stimulates their interest in learning and helps guide their understanding. The objective of this thesis is to design and implement a picture reading and speaking system for children using Image Caption technology, applying artificial intelligence to education in support of early childhood education.

To achieve this goal, this thesis studies Image Caption technology. In the data preparation stage, the Chinese image description dataset from AI Challenger is selected, the Chinese text is segmented into words, and Word2vec is used to compute word embeddings for the annotation sentences. In the model building stage, a deep learning "encoder-decoder" architecture is adopted: ResNet50 encodes the image, and the image encoding together with the word embeddings of the annotation sentence is passed to an LSTM network to generate the description sentence.

During training, it was found that because the LSTM network receives the globally encoded image information at every time step, the utilization of image information gradually decreases over time, and the predicted words are not accurate enough. To solve this problem, an attention mechanism is introduced. The encoded image and annotation information is first passed to the attention mechanism, which produces weighted image encoding information; this is then passed to the stacked LSTM network at each time step. A stacked LSTM network enhances the expressive power of the model by increasing the number of LSTM layers, and a three-layer stacked LSTM is selected as the decoder. To further improve performance, the Smooth L1 loss function is used to optimize the network, accelerating convergence and avoiding the gradient explosion problem. Several groups of comparative experiments are carried out, and the results are analyzed to verify the effectiveness of the improved model from multiple perspectives.

The picture reading and speaking system for children is built on a B/S (browser/server) architecture using the Flask framework. The system calls the Chinese image description model to provide functions such as uploading pictures and viewing the automatically generated description sentences. Considering that children can read few words and enjoy interaction, the system uses a voice broadcast API to deliver the image description to children by "talking". This meets children's basic need to look at pictures and speak, stimulates their interest in learning, and promotes independent learning.
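
The attention step mentioned in the abstract (weighting the image encoding before it reaches the decoder at each time step) can be illustrated with a minimal sketch. The abstract does not state how attention scores are computed, so the dot-product scoring below is an assumption, and the names `softmax`, `attend`, `region_features`, and `hidden_state` are hypothetical. It is written in plain Python for readability and is not the thesis implementation.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """One decoding step of soft attention (illustrative only).

    region_features: list of feature vectors, one per image region
                     (e.g. cells of a CNN feature map).
    hidden_state:    current decoder state, same length as each feature vector.

    Returns (weights, context), where context is the weighted sum of the
    region features that would be fed to the decoder at this time step.
    """
    # ASSUMPTION: dot-product scoring; the thesis may use a different scorer.
    scores = [
        sum(f * h for f, h in zip(feat, hidden_state))
        for feat in region_features
    ]
    weights = softmax(scores)
    dim = len(region_features[0])
    context = [
        sum(w * feat[i] for w, feat in zip(weights, region_features))
        for i in range(dim)
    ]
    return weights, context
```

Called once per time step with the decoder's current state, `attend` returns a different context vector at each step, in contrast to feeding the same global image encoding every time, which is the issue the abstract describes.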
Keywords/Search Tags: Image Caption, Attention, ResNet50, Stacked LSTM