Dialogue systems for automatic assisted diagnosis and treatment have attracted attention from both industry and academia because they can converse with patients, collect their symptoms, and make a diagnosis automatically, thereby simplifying the diagnostic process and reducing the cost of gathering patient information. However, existing dialogue datasets are few in number and poor in quality, which poses serious challenges for such systems. In addition, current research on these systems does not integrate dialogue context well, and its reasoning about the relationships among symptoms of the same disease does not reflect real diagnostic practice. To address these problems, this thesis studies the key technologies of dialogue systems for automatic assisted diagnosis and treatment. The main contributions are as follows.

(1) To address the lack of corpora for such systems, this thesis screens usable data from a large amount of unlabeled dialogue data and constructs a multi-turn dialogue dataset. Following the structure of the CCL2021 dialogue evaluation dataset, it formulates corresponding annotation specifications and develops annotation tools. Under the guidance of medical professionals, and in accordance with the annotation specifications, multi-turn dialogue data derived from the CMedQA dataset were annotated with the help of medical domain knowledge and techniques such as word segmentation and named entity recognition. The resulting multi-turn dialogue dataset has been released to facilitate subsequent research and analysis.

(2) To address multi-task processing in the natural language understanding (NLU) module of such systems, this thesis proposes a pipelined multi-task heterogeneous graph network model for NLU. The model performs medical named entity recognition, medical named entity normalization, and positive/negative symptom judgment as a pipeline, so that each downstream task can use the knowledge produced upstream: the labels generated by an upstream task are embedded as auxiliary markers in the input of the downstream task. The model further generates a heterogeneous reasoning graph that is used to optimize multi-turn dialogue policy learning. Compared with two baseline models, the proposed model improves the F1 score by 5% and 3%, respectively.

(3) To address the large number of dialogue turns, low disease diagnosis accuracy, and low symptom matching rate of such systems, this thesis proposes an inference graph network model for multi-turn dialogue policy learning, used in the dialogue management module. A symptom discrimination module decides whether to generate the next symptom through reinforcement learning or to perform BM25-based disease screening. The predicted symptom is determined jointly by a reinforcement learning network based on a deep Q-network (DQN), a symptom generation network based on heterogeneous graph reasoning, and an external symptom-to-symptom knowledge graph. A BM25-based disease matching module then optimizes the symptom matching rate, disease accuracy, and number of dialogue turns. Extensive experiments show that, compared with the baseline model, the proposed model improves disease accuracy and symptom matching rate by 2% and 9%, respectively, and reduces the average number of dialogue turns by 0.4.
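The pipeline in (2), where labels from an upstream task are embedded as auxiliary markers in the input of the next task, can be illustrated with the minimal Python sketch below. It is not the thesis's model. The three trained components are replaced by hypothetical toy stand-ins (a symptom lexicon for entity recognition, a lookup table for normalization, and a negation-cue rule for positive/negative judgment), and the `[SYM] ... [/SYM]` marker format is an assumption. The sketch shows only the data flow: the spans found in stage 1 become explicit tokens that stages 2 and 3 read.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)

# Hypothetical toy resources standing in for the trained model of each stage.
SYMPTOM_LEXICON = {"fever", "cough", "coughing", "headache"}
NORMALIZATION = {"coughing": "cough"}  # surface form -> standard term
NEGATION_CUES = {"no", "not", "without", "denies"}


def recognize(tokens: List[str]) -> List[Span]:
    """Stage 1 (NER stand-in): tag single-token symptom mentions as SYM."""
    return [(i, i + 1, "SYM") for i, tok in enumerate(tokens) if tok in SYMPTOM_LEXICON]


def insert_markers(tokens: List[str], spans: List[Span]) -> List[str]:
    """Embed upstream labels in the downstream input as [LABEL] ... [/LABEL] tokens.

    Assumes the spans do not overlap.
    """
    out: List[str] = []
    cursor = 0
    for start, end, label in sorted(spans):
        out.extend(tokens[cursor:start])
        out.append(f"[{label}]")
        out.extend(tokens[start:end])
        out.append(f"[/{label}]")
        cursor = end
    out.extend(tokens[cursor:])
    return out


def normalize_and_judge(marked: List[str]) -> Dict[str, bool]:
    """Stages 2-3: read the [SYM] markers left by stage 1, map each mention to its
    standard term, and mark it negative (False) if a negation cue directly precedes it."""
    result: Dict[str, bool] = {}
    for i, tok in enumerate(marked):
        if tok == "[SYM]":
            mention = marked[i + 1]  # the toy NER only produces single-token mentions
            negated = i > 0 and marked[i - 1] in NEGATION_CUES
            result[NORMALIZATION.get(mention, mention)] = not negated
    return result


tokens = "patient reports no fever but coughing at night".split()
marked = insert_markers(tokens, recognize(tokens))
print(" ".join(marked))
# patient reports no [SYM] fever [/SYM] but [SYM] coughing [/SYM] at night
print(normalize_and_judge(marked))
# {'fever': False, 'cough': True}
```

Because the markers are ordinary tokens, a downstream encoder can attend to them without any change to its architecture, which is one plausible reason for passing upstream labels this way.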
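How the three sources in (3) might be combined to choose the next symptom to ask about can be sketched as follows. The abstract states only that the prediction is determined jointly by the DQN, the heterogeneous-graph symptom generator, and the symptom-to-symptom knowledge graph. The weighted-sum fusion, the default weights, the function names `kg_prior` and `next_symptom`, and all scores below are hypothetical and serve only to make the idea concrete.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple


def kg_prior(confirmed: Set[str], edges: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Score candidates by summed edge weight from confirmed symptoms
    in a (hypothetical) symptom-to-symptom graph."""
    scores: Dict[str, float] = defaultdict(float)
    for symptom in confirmed:
        for neighbor, weight in edges.get(symptom, {}).items():
            scores[neighbor] += weight
    return dict(scores)


def next_symptom(
    q_values: Dict[str, float],
    graph_scores: Dict[str, float],
    kg_scores: Dict[str, float],
    asked: Set[str],
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),  # assumed, not from the thesis
) -> Optional[str]:
    """Return the unasked symptom with the highest fused score (None if all were asked)."""
    w_q, w_graph, w_kg = weights
    best: Optional[str] = None
    best_score = float("-inf")
    for symptom, q in q_values.items():
        if symptom in asked:
            continue  # mask symptoms already covered in the dialogue
        score = (w_q * q
                 + w_graph * graph_scores.get(symptom, 0.0)
                 + w_kg * kg_scores.get(symptom, 0.0))
        if score > best_score:
            best, best_score = symptom, score
    return best


edges = {"fever": {"cough": 0.8, "sore throat": 0.6}}
kg_scores = kg_prior({"fever"}, edges)
q_values = {"fever": 0.9, "cough": 0.2, "sore throat": 0.6, "rash": 0.4}  # e.g. DQN outputs
graph_scores = {"cough": 0.7, "sore throat": 0.5, "rash": 0.1}  # e.g. graph-reasoning outputs
choice = next_symptom(q_values, graph_scores, kg_scores, asked={"fever"})
print(choice)  # sore throat
```

Here "fever" has the highest Q-value but is masked because it was already asked; among the rest, "sore throat" wins with a fused score of 0.57 against 0.47 for "cough".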
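The BM25-based disease screening in (3) can be sketched by treating each candidate disease as a "document" whose terms are its associated symptoms, and the patient's confirmed symptoms as the query. Below is a minimal Okapi BM25 ranking in Python. The disease profiles are toy data rather than the thesis's knowledge base, `k1 = 1.5` and `b = 0.75` are common defaults rather than values reported in the thesis, and the IDF uses the non-negative `log(1 + ...)` variant.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bm25_rank(
    query: List[str],
    diseases: Dict[str, List[str]],
    k1: float = 1.5,
    b: float = 0.75,
) -> List[Tuple[str, float]]:
    """Rank diseases (documents of symptoms) against the patient's symptoms (query)."""
    n = len(diseases)
    avgdl = sum(len(s) for s in diseases.values()) / n
    df: Counter = Counter()  # number of disease profiles containing each symptom
    for symptoms in diseases.values():
        df.update(set(symptoms))
    ranked = []
    for name, symptoms in diseases.items():
        tf = Counter(symptoms)
        dl = len(symptoms)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log(1.0 + (n - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
        ranked.append((name, score))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Toy disease profiles (illustrative only).
diseases = {
    "upper respiratory infection": ["fever", "cough", "sore throat", "runny nose"],
    "pneumonia": ["fever", "cough", "chest pain", "shortness of breath"],
    "gastroenteritis": ["diarrhea", "vomiting", "abdominal pain", "fever"],
}
ranked = bm25_rank(["fever", "cough", "chest pain"], diseases)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
# ranking: pneumonia, upper respiratory infection, gastroenteritis
```

With equal-length profiles the length normalization cancels, so each matched symptom contributes exactly its IDF: a rare symptom such as chest pain discriminates far more than fever, which every profile shares.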