In recent years, with the rapid development of computer technology, the concept of "people-oriented" design has begun to emerge in the field of intelligent technology, and the use of Internet technology to improve people’s daily lives has become a research focus. Sign language is one of the most important means of daily communication for deaf people. To promote communication between deaf people and other social groups, sign language recognition technologies have attracted widespread attention from researchers. Sign language recognition also plays an important role in the era of information interaction, so in-depth research on it has practical significance. This paper aims to realize sign language recognition for deaf people: image processing techniques are studied, improved, and applied to the classification and recognition of video data in order to improve the accuracy and speed of the sign language recognition algorithm. In view of the shortcomings of current sign language recognition algorithms, the focus is on a sign language recognition algorithm based on video key frames. The main research contents are as follows.

Existing data sets in the field of sign language recognition are of mixed and inconsistent types, and no open-source data set yet meets the needs of everyday communication sign language. Sign language video data are therefore collected and processed according to the characteristics of sign language gestures and movements. Based on the daily communication sign language vocabulary of deaf people, a video data set of communication sign language composed of clipped videos is established to support the research and improvement of the sign language recognition method in this paper.

To address the problem that the large number of useless frames in video data slows recognition, and to meet the real-time and
accuracy requirements of isolated-word sign language video recognition, a key frame extraction algorithm for sign language videos is designed in which K-means clustering is improved by an artificial fish swarm algorithm. Before the algorithm is executed, the sign language video is preprocessed: the video is split into single-frame images, and the hand image is then segmented. A Singer chaotic map and a random walk strategy are used to optimize the sparrow search algorithm, increasing the diversity of the population and improving its local search ability. Each single-frame image is then subjected to two-dimensional Otsu segmentation to obtain the hand image segmented at the optimal threshold. Because conventional image features describe gesture images insufficiently, HOG features are extracted from the hand image, and on this basis the K-means clustering algorithm is improved to complete the extraction of sign language video key frames, thereby reducing the computation time of subsequent sign language recognition. The experimental results show that the key frame extraction algorithm in this paper is better suited to sign language videos and that the extracted key frame sequences are more representative.

To address the problem of gradient explosion when training a traditional convolutional neural network model on gesture data, a residual convolutional neural network is introduced into the Long Short-Term Memory (LSTM) network to improve the performance of the algorithm. To address the loss of information during downsampling in the residual network, the bottleneck module of the residual network is reorganized to enhance the feature expression ability of the bottleneck structure. At the same time, a GRU module is used to optimize the LSTM network, increasing the speed at which the network extracts long-term spatial and temporal features from sign language data. The method is verified on the data set of this paper, and the experimental results
show that the sign language recognition method designed in this paper performs relatively well in both recognition speed and recognition quality.
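
The key frame extraction step described above can be illustrated with a minimal sketch. It is not the algorithm of this paper: it uses plain K-means with a deterministic initialisation rather than the swarm-optimized variant, and it assumes that each frame has already been reduced to a fixed-length feature vector (for example, a HOG descriptor of the segmented hand image). The function name `select_keyframes` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_keyframes(
    features: List[Sequence[float]], k: int, max_iter: int = 50
) -> List[int]:
    """Return indices of up to k key frames chosen by plain K-means.

    features: one feature vector per frame (assumed to be HOG descriptors
    of the hand image; any fixed-length vector works).
    """
    n = len(features)
    if n == 0 or k <= 0:
        return []
    k = min(k, n)

    # Deterministic initialisation (an assumption of this sketch):
    # centroids start at evenly spaced frames of the video.
    centroids = [list(features[(i * n) // k]) for i in range(k)]
    labels = [-1] * n

    for _ in range(max_iter):
        # Assignment step: each frame joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: _dist2(f, centroids[c]))
            for f in features
        ]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [features[i] for i in range(n) if labels[i] == c]
            if members:
                dim = len(members[0])
                centroids[c] = [
                    sum(m[d] for m in members) / len(members)
                    for d in range(dim)
                ]

    # The key frame of a cluster is the member closest to its centroid.
    keyframes = set()
    for c in range(k):
        members = [i for i in range(n) if labels[i] == c]
        if members:
            keyframes.add(
                min(members, key=lambda i: _dist2(features[i], centroids[c]))
            )
    return sorted(keyframes)


# Toy example: six frames forming two visually distinct groups.
toy_features = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
]
print(select_keyframes(toy_features, k=2))  # -> [0, 3]
```

Returning the indices in temporal order keeps the key frame sequence usable as input to a sequence model such as the LSTM or GRU network mentioned above. A swarm-based optimizer would replace the fixed initialisation and the choice of `k`, both of which are simplifications here.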