
Research on a Scene-Understanding Neural Network Model

Posted on: 2020-05-17  Degree: Master  Type: Thesis
Country: China  Candidate: T Hui  Full Text: PDF
GTID: 2428330590954674  Subject: Control Science and Engineering
Abstract/Summary:
Objective: Automatically generating a description of an image scene's content is a hot topic in computer vision, and research on this problem helps machines understand images better.

Methods: This paper addresses two problems: how to generate a sentence that conforms to Chinese grammar rules, and how to generate a description from a typical text poster when the image content is scarce. The proposed methods are based on Faster R-CNN, CNN, and LSTM, respectively.

(1) Image scene description combined with Chinese part-of-speech classification. First, images were collected and a Chinese description was attached to each one. The words of all sentences were categorized into quantifiers, verbs, nouns, and scene classes. The three parts of speech correspond to the attributes, states, and categories of objects in the image, respectively, and the scene class corresponds to the image background; transfer-learning samples of parts of speech and scene samples that reflect word meaning were then constructed to form the Chinese description dataset. Next, a part-of-speech classification CNN was obtained from the transfer-learning samples; on this basis, the corresponding part-of-speech detectors were obtained by training a region-detection network on the Chinese description dataset, and a scene classifier was obtained by training the CNN on the scene-recognition samples. Finally, the part-of-speech and scene labels contained in each sentence were extracted, and labels containing several similar parts of speech were fully permuted to obtain a multilabel-to-sentence matching table. An LSTM network was built to learn this table, completing word-order matching together with numeral and preposition memorization. The scene description model combines the above detectors, the scene classifier, and the LSTM network.

Results: The multilabel-to-sentence matching accuracy of the LSTM network reached 100%. Comparison experiments showed that after transfer learning the accuracy of each detector improved by 2.62%, 3.56%, and 3.16%, respectively, and mAP improved by 3.11%. After optimization, the recognition rate of the scene network improved by 5.67%.

(2) Image scene description combined with typical text posters. First, a printed-Chinese-character dataset was built and a CNN was trained to classify Chinese characters. Then a typical-text-poster dataset was built, and transfer learning from the character-classification network was used to classify posters with similar topics. Finally, an LSTM network was built to map multiple words to sentence descriptions, yielding an image scene description model that combines the CNN and the LSTM.

Results: The experiments show that a CNN with Chinese-character classification ability transfers better to typical text posters; the accuracy increased by 5.74% and 3.74%, respectively. The words-to-sentence accuracy of the LSTM and the accuracy of the full image scene description model are 100% and 97.23%, respectively.

Conclusion: From the perspective of local understanding, this paper focuses on generating descriptions that obey Chinese syntactic rules and on generating descriptions from typical text posters when the image content is insufficient. Based on neural-network algorithms, the experimental method deconstructs image content through small-scale training and completes the two processes of reification from words to image content and abstraction from image content to words.
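The multilabel-to-sentence matching table can be sketched as a lookup keyed by the set of detected labels. This is a minimal illustration only: the label names and sentences below are hypothetical, and the thesis learns this mapping with an LSTM rather than a hard-coded dictionary.

```python
# Sketch of a multilabel-to-sentence matching table (illustrative labels and
# sentences; the thesis trains an LSTM to learn this mapping instead).

# frozenset keys make the lookup order-independent, mirroring the idea that
# any permutation of the same labels should match the same sentence.
MATCH_TABLE = {
    frozenset({"noun:person", "verb:ride", "noun:bicycle", "scene:street"}):
        "A person is riding a bicycle on the street.",
    frozenset({"noun:dog", "verb:run", "scene:park"}):
        "A dog is running in the park.",
}

def describe(labels):
    """Map detected part-of-speech and scene labels to a sentence."""
    return MATCH_TABLE.get(frozenset(labels), "No matching description.")

print(describe(["scene:park", "noun:dog", "verb:run"]))
```

Because the key is a set, the detectors may emit labels in any order and the same sentence is retrieved; the LSTM in the thesis additionally fills in numerals and prepositions that a plain table cannot.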
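The transfer-learning step for the poster classifier follows a common pattern: freeze the features learned on the source task (Chinese character classification) and train only a new output head on the target task (poster topics). The toy sketch below, with made-up data and a perceptron head standing in for the CNN's final layer, shows that pattern; nothing here is the thesis's actual code.

```python
# Toy transfer-learning sketch: a "pretrained" feature extractor is frozen and
# only a new linear head is trained on the poster-topic task. All names and
# data are illustrative assumptions, not the thesis implementation.

def pretrained_features(pixels):
    """Stand-in for the frozen character-classification CNN: raw input -> features."""
    return [sum(pixels) / len(pixels), max(pixels) - min(pixels)]

# Tiny labelled poster set; features come only from the frozen extractor.
data = [([0.9, 0.8, 0.7], 1),   # text-dense poster -> topic 1
        ([0.1, 0.2, 0.1], 0),   # text-sparse poster -> topic 0
        ([0.8, 0.9, 0.9], 1),
        ([0.2, 0.1, 0.0], 0)]

w, b = [0.0, 0.0], 0.0          # only the new head's parameters are updated
lr = 0.5
for _ in range(50):             # perceptron-style updates on the head alone
    for pixels, y in data:
        f = pretrained_features(pixels)
        pred = 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0
        err = y - pred
        w = [wi + lr * err * fi for wi, fi in zip(w, f)]
        b += lr * err

def classify(pixels):
    f = pretrained_features(pixels)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0
```

Freezing the extractor is what makes the small poster dataset sufficient: the head has only a handful of parameters to fit, which is the same reason the thesis reports higher accuracy when transferring from the character-classification CNN.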
Keywords/Search Tags: part-of-speech classification, target detection, convolutional neural network, long short-term memory network, scene description