| An intelligent question answering system can, to a certain extent, understand and answer questions posed by users, thereby addressing the limitation of traditional document retrieval, which only ranks documents without giving precise answers. At present, knowledge-base question answering systems are time-consuming and labor-intensive to build, because knowledge bases differ across domains, are large in scale, and are difficult to construct. A retrieval-based question answering system avoids these problems: it first compares the user's question with the documents in a document dataset, selects the most similar documents, and feeds the question and those documents into a machine reading comprehension model to obtain the answer. Although Baidu Zhidao (Baidu Knows) contains a large number of questions and answers, not every question has a corresponding answer, and how Baidu can automatically answer unanswered questions is an important research task at present. To address this problem, this paper studies Baidu's intelligent answering algorithm and proposes an intelligent question answering model based on document retrieval and machine reading comprehension. First, for Baidu's Chinese reading comprehension dataset DuReader, this paper constructs a machine reading comprehension model based on BERT. The model restructures the questions and documents with a sliding window method, uses the result as the training set, and feeds it into the BERT embedding layer for feature extraction. A fully connected layer and the softmax function then convert the final hidden state of BERT into answer-span probabilities, which yield the start and end positions of the answer. The BERT-based reading comprehension model proposed in this paper achieves a ROUGE-L of 0.425 and a BLEU-4 of 0.477 on DuReader, which are 10.1 and 8.4 percentage points higher, respectively, than those of the baseline model BiDAF. Second, to obtain candidate documents, this paper constructs a document retrieval model based on named entity recognition and word vector technology. A named entity recognition model trained with BERT-CRF annotates the entities in the input question, and documents are then retrieved according to those entities. If entity-based retrieval fails, word vectors are used to find words similar to the entities and the documents are retrieved again, which helps improve the accuracy of document retrieval. Finally, the user's question and the retrieved documents are fed into the reading comprehension model, which predicts and outputs the answer to the question. |
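
The sliding window step and the conversion of span probabilities into start and end positions can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `sliding_windows` and `best_span`, the window and stride values, and the maximum answer length are all assumptions made for illustration, and the logits are hand-written stand-ins for what a fully connected layer on top of BERT would produce.

```python
import math
from typing import List, Tuple


def sliding_windows(tokens: List[str], window: int, stride: int) -> List[Tuple[int, List[str]]]:
    """Split a token list into overlapping windows (hypothetical helper).

    Returns (offset, window_tokens) pairs so that positions predicted inside
    a window can be mapped back to positions in the full document.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    chunks = []
    start = 0
    while True:
        chunks.append((start, tokens[start:start + window]))
        if start + window >= len(tokens):
            break
        start += stride
    return chunks


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(start_logits: List[float], end_logits: List[float], max_len: int) -> Tuple[int, int, float]:
    """Pick the (start, end) pair with the highest joint probability.

    Only spans with start <= end and at most max_len tokens are considered
    (max_len is an assumed constraint, not taken from the paper).
    """
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    best = (0, 0, -1.0)
    for i, ps in enumerate(p_start):
        for j in range(i, min(i + max_len, len(p_end))):
            score = ps * p_end[j]
            if score > best[2]:
                best = (i, j, score)
    return best


if __name__ == "__main__":
    doc = list("abcdefghij")  # ten placeholder tokens
    for offset, chunk in sliding_windows(doc, window=4, stride=2):
        print(offset, chunk)
    # Hand-written logits standing in for the model's output on one window.
    print(best_span([0.1, 5.0, 0.2, 0.1], [0.1, 0.2, 4.0, 0.3], max_len=3))
```

In a full system, each window would be paired with the question, scored by the model, and the window-level span with the highest probability would be mapped back to the document using the stored offset.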