| Currently, both task-oriented human-machine dialogue systems and chatbots can only respond according to a fixed knowledge base; they cannot expand their knowledge while communicating with people. An important feature of intelligence is the ability to learn and acquire knowledge through natural interaction across multiple modalities. Research on intelligent learning systems based on multimodal information such as vision and language is therefore necessary. To this end, this paper designs a multimodal teaching dialogue system consisting of image recognition, spoken language understanding, dialogue management, and natural language generation modules. The image recognition module uses an existing deep neural network to extract image features, and applies support vector machines and the k-nearest neighbor method for attribute and object recognition. The spoken language understanding module uses classification and sequence labeling models for dialogue intent recognition and slot tagging. Dialogue management is modeled with a finite state automaton. The system can associate image modality information with vocabulary information, and it includes a knowledge base that can be expanded and enriched through human-computer interaction. Based on this design, a multimodal teaching dialogue system is implemented on an Android tablet. The system learns an object's name, color, shape, type, and other knowledge through multimodal dialogues with humans, and the user can also query the system about the knowledge it has learned. Experiments show that the multimodal teaching dialogue system constructed in this paper can effectively acquire knowledge through multimodal human-machine dialogue. |
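
The interplay between finite-state dialogue management and an expandable knowledge base can be illustrated with a minimal sketch. It is not the paper's implementation: the state names, the `ask` and `teach` intents, the `KnowledgeBase` and `TeachingDialogue` classes, the 1-nearest-neighbor lookup, and the `max_distance` threshold are all assumptions made for illustration, and the feature vectors stand in for the deep-network image features described in the abstract.

```python
import math
from enum import Enum, auto


class State(Enum):
    """Hypothetical dialogue states; the paper does not list its actual states."""
    IDLE = auto()
    AWAITING_LABEL = auto()


class KnowledgeBase:
    """Stores (image feature vector, attribute dict) pairs learned in dialogue."""

    def __init__(self, max_distance=1.0):
        self.examples = []
        # Assumed threshold: anything farther than this counts as unknown.
        self.max_distance = max_distance

    def add(self, features, attributes):
        self.examples.append((list(features), dict(attributes)))

    def nearest(self, features):
        """1-nearest-neighbor lookup; returns None if nothing is close enough."""
        if not self.examples:
            return None
        vec, attrs = min(self.examples, key=lambda ex: math.dist(ex[0], features))
        return attrs if math.dist(vec, features) <= self.max_distance else None


def describe(attrs):
    """Build a short phrase such as 'red apple' from the known attributes."""
    return " ".join(filter(None, [attrs.get("color"), attrs.get("name")]))


class TeachingDialogue:
    """Two-state finite state automaton that queries or extends the knowledge base."""

    def __init__(self, kb):
        self.kb = kb
        self.state = State.IDLE
        self.pending = None  # features of the object waiting for a label

    def step(self, intent, slots=None, features=None):
        if self.state is State.IDLE and intent == "ask":
            known = self.kb.nearest(features)
            if known is None:
                # Unknown object: switch state and ask the user to teach it.
                self.state, self.pending = State.AWAITING_LABEL, features
                return "I don't know this object. What is it?"
            return f"This looks like a {describe(known)}."
        if self.state is State.AWAITING_LABEL and intent == "teach":
            # Associate the pending image features with the taught vocabulary.
            self.kb.add(self.pending, slots or {})
            self.state, self.pending = State.IDLE, None
            return "Thanks, I will remember that."
        return "Sorry, I did not understand."


kb = KnowledgeBase()
dm = TeachingDialogue(kb)
print(dm.step("ask", features=[1.0, 0.0]))
print(dm.step("teach", slots={"name": "apple", "color": "red"}))
print(dm.step("ask", features=[0.9, 0.1]))
```

In this sketch the intent and slots would come from the spoken language understanding module and the feature vector from the image recognition module; a fuller system would replace the nearest-neighbor lookup with the trained classifiers and add states for the other attributes (shape, type).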