With the development of artificial intelligence and natural language processing, dialog technology has attracted extensive attention from academia and industry. Task-oriented dialog systems emerged to accurately identify users’ intentions and provide specific services. As an important module of a task-oriented dialog system, dialog state tracking aims to extract the user’s intent and slot values from the dialog history. However, current research on dialog state tracking is devoted to modeling written text, which differs considerably from real dialog scenarios. In a real scenario, when the user interacts with the system through speech, the surrounding environment may contain background noise (car horns, Gaussian white noise, etc.) that mixes with the user’s voice. When the system converts the received speech into text, conversion errors occur, and these interferences and noises can cause the dialog system model to fail. This paper focuses on spoken dialog state tracking, proposes solutions, and develops a noise-robust spoken dialog system. The contributions of this paper are summarized as follows:

(1) This paper proposes a method for generating noisy spoken text. By simulating spoken speech in real scenarios, the transcribed MultiWOZ-ASR dataset, which contains speech noise and ASR errors, is constructed. Taking this dataset as the pre-training dataset, a multi-task pre-training model based on T5 is proposed. The model exploits the correlation and complementarity among multiple tasks, such as text error correction, slot filling, and intent recognition, to improve the performance of complex spoken dialog state tracking. At the same time, the model takes only transcribed text as input and does not depend on additional speech or word-level knowledge, so it is easy to extend to text-based dialog systems.

(2) This paper further studies the spoken dialog state tracking task with the speech signal as input and proposes a multimodal end-to-end dialog state tracking model. The model does not rely on an additional ASR module, eliminates cascade errors, and is easy to extend to more speech-based dialog systems and application scenarios.

(3) A spoken dialog system is designed and implemented, and its functional and performance tests are completed. Compared with a traditional dialog system, the spoken dialog system designed in this paper receives the user’s spoken language directly and can effectively resist background noise and speech errors, showing high robustness.
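As an illustration of the noise simulation mentioned in contribution (1), the following is a minimal sketch of mixing Gaussian white noise into a clean signal at a chosen signal-to-noise ratio. It is not the procedure used to build MultiWOZ-ASR, which is not detailed here. The function names `add_white_noise` and `measured_snr_db`, the 10 dB target, and the 440 Hz tone standing in for speech are all assumptions made for the example.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return the signal plus Gaussian white noise scaled to a target SNR in dB.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise), so P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR in dB from a clean signal and its noisy version."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Assumed stand-in for a speech waveform: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
noisy = add_white_noise(clean, snr_db=10.0)
```

In a pipeline of the kind described above, a noisy waveform like `noisy` would then be passed through an ASR system to obtain transcripts that contain recognition errors. Lower `snr_db` values give stronger noise and would typically produce more transcription errors.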