As a branch of natural language processing, machine reading comprehension has attracted increasing attention in recent years. As older evaluation data sets have been conquered one after another, a large number of high-quality data sets have emerged in the field, and the continued development of deep learning has helped researchers build ever stronger reading comprehension models. Nevertheless, several problems remain: (1) few studies examine how different types of data affect each model, or which model works best for each data type; (2) research on data set integration is lacking, and most studies focus on individual data sets rather than combining several; (3) current models are varied, but there is no reference for which model works well on specific question or answer types. Faced with a large number of reading comprehension models, researchers often do not know which to choose.

To address these problems, this paper carries out research in three aspects:

(1) Four machine reading comprehension data sets, SQuAD, MS MARCO, NewsQA and NarrativeQA, are integrated into one large data set, which ensures the diversity of the text content. Inspired by data set classification methods that the relevant literature reports as affecting model performance, the data are divided into 8 groups by question type and 6 groups by answer type. The classified data set is used in the rest of this paper.

(2) Mainstream machine reading comprehension models, namely Match-LSTM, BiDAF, R-NET, Mnemonic Reader and Document Reader, are constructed. The benchmark models are retained, and the structure of each benchmark model is modified in different ways. Some benchmark models are also optimized: a gate mechanism is added to Match-LSTM to extract more accurate attention vectors, and highway networks are added to the BiDAF benchmark model to avoid vanishing gradients during back propagation. Thirty-two models are constructed in total, and the experiments show that the data set partitioned by our method affects the performance of the machine reading comprehension models differently.

(3) A Web-based model analysis system for machine reading comprehension is built, consisting of a reading comprehension answering module and a model analysis module. The answering module combines the strengths of the various models: it analyzes the type of the question entered by the user and calls the best model for that type to answer it. The model analysis module lets users configure the mainstream models manually, and the system returns a performance evaluation of the configured model on the different types of data.
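The routing idea behind the answering module can be illustrated with a minimal sketch. It is not the system's actual implementation: the abstract does not name the 8 question groups, so the grouping by leading question word below is an assumption, and the names `classify_question`, `pick_model` and the score table layout are hypothetical.

```python
from typing import Dict

# Assumed grouping: seven question words plus a catch-all "other" group.
# The paper's real 8 question-type groups are not listed in the abstract.
QUESTION_WORDS = ("what", "who", "when", "where", "why", "how", "which")


def classify_question(question: str) -> str:
    """Return the first question word found in the text, or 'other'."""
    for token in question.lower().split():
        word = token.strip("?,.!\"'")
        if word in QUESTION_WORDS:
            return word
    return "other"


def pick_model(question: str,
               scores: Dict[str, Dict[str, float]],
               default: str) -> str:
    """Pick the model with the highest score for the question's type.

    `scores` maps question type -> {model name -> evaluation score}.
    Falls back to `default` when no scores exist for that type.
    """
    by_model = scores.get(classify_question(question))
    if not by_model:
        return default
    return max(by_model, key=lambda name: by_model[name])


if __name__ == "__main__":
    # Placeholder values for illustration only, not results from the paper.
    placeholder_scores = {"who": {"BiDAF": 0.0, "R-NET": 1.0}}
    print(pick_model("Who wrote Hamlet?", placeholder_scores, default="BiDAF"))
```

In a real system the score table would be filled from the per-type evaluation produced by the model analysis module, and the classifier could be replaced by whatever typing scheme the data set partition uses.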