With the development of the information society and the advent of the Web 2.0 era, data traffic is growing exponentially, and finding reliable information in massive amounts of data has become increasingly difficult. Thanks to the rapid development of new-generation technologies such as big data and artificial intelligence, scientific domain question answering systems are in high demand. A domain question answering system converts a user's natural-language question into a structured query, retrieves the relevant information from the triples in a domain knowledge base, and answers the question in accurate and concise natural language, which can effectively relieve users' knowledge anxiety. Because traditional domain question answering systems suffer from complicated rule construction and cumbersome feature engineering, applying deep learning to domain question answering has become one of the research hotspots in natural language processing. In view of this, this thesis studies several key technologies of domain question answering systems based on BiLSTM (Bidirectional Long Short-Term Memory) networks. The main contributions are as follows.

(1) To address the dependence of statistical machine learning methods on feature engineering and the loss of semantic information after pooling in convolutional neural networks, this thesis proposes the B-CNSR question classification model. B-CNSR combines word vectors with part-of-speech features to obtain a distributed representation of the question, then uses a BiLSTM network to extract the contextual sequential information of the text and a capsule network with a static routing algorithm to extract its local feature information. Experimental results show that the proposed model achieves better question classification performance.

(2) To address polysemy and the lack of explicit word boundaries in Chinese sequence labeling tasks, this thesis proposes the Bw-BC entity slot filling model. Bw-BC uses the BERT-wwm pre-trained language model to dynamically generate word vectors that incorporate contextual word information, and uses a BiLSTM network combined with a CRF (Conditional Random Field) layer to capture the contextual sequential information of the text and the dependencies between adjacent labels. Experimental results show that the proposed model improves recognition performance.

(3) Based on the two models above, this thesis develops a web-based question answering prototype system for the medical domain. The system performs question classification and entity slot filling on the natural-language questions entered by users and generates answers online. Trial runs show that the prototype system performs these functions stably; future work will expand the domain knowledge base and further improve the system.
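The B-CNSR model in (1) relies on a capsule layer with static routing. As a rough illustration only, the sketch below implements the standard capsule squash nonlinearity and one simplified reading of "static routing": lower-capsule prediction vectors are combined in a single pass with fixed, uniform coupling coefficients instead of being refined by the iterative dynamic routing of the original capsule network. The function names and the uniform coefficients are assumptions; the routing rule actually used in the thesis may differ.

```python
import math


def squash(s):
    """Capsule squash: v = (|s|^2 / (1 + |s|^2)) * (s / |s|); keeps direction, maps length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def static_routing(u_hat):
    """Single-pass routing with fixed coupling coefficients (assumed reading of "static routing").

    u_hat[i][j] is the prediction vector from lower capsule i for upper capsule j.
    Every coupling coefficient c_ij is fixed to 1 / num_upper (hypothetical choice),
    so no routing-by-agreement iterations are performed.
    """
    num_upper = len(u_hat[0])
    dim = len(u_hat[0][0])
    c = 1.0 / num_upper
    outputs = []
    for j in range(num_upper):
        # Weighted sum of all lower-capsule predictions for upper capsule j.
        s_j = [0.0] * dim
        for u_i in u_hat:
            for k in range(dim):
                s_j[k] += c * u_i[j][k]
        outputs.append(squash(s_j))
    return outputs


u_hat = [
    [[2.0, 0.0], [0.0, 2.0]],  # lower capsule 0 -> upper capsules 0, 1
    [[2.0, 0.0], [0.0, 2.0]],  # lower capsule 1 -> upper capsules 0, 1
]
for v in static_routing(u_hat):
    print(v, math.sqrt(sum(x * x for x in v)))
```

In a full model the prediction vectors would presumably come from learned transformations of the BiLSTM features, and the length of each output capsule, which squash keeps below 1, would be read as the score for one question class.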
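The Bw-BC model in (2) places a CRF layer on top of the BiLSTM so that each tag depends on its neighbors. A linear-chain CRF is commonly decoded with the Viterbi algorithm; the minimal sketch below assumes per-position emission scores (as a BiLSTM would output) and a tag-to-tag transition matrix, and omits the start and end transitions that full implementations usually include. The BIO tag set and all scores are hypothetical.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence and its score.

    emissions[t][y]: score of tag y at position t (e.g. a BiLSTM output).
    transitions[a][b]: score of moving from tag a to tag b.
    """
    n_tags = len(tags)
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, bp = [], []
        for y in range(n_tags):
            # Best previous tag for ending in tag y at position t.
            best_prev = max(range(n_tags), key=lambda p: score[p] + transitions[p][y])
            new_score.append(score[best_prev] + transitions[best_prev][y] + emissions[t][y])
            bp.append(best_prev)
        score = new_score
        backpointers.append(bp)
    # Trace back from the best final tag.
    best_last = max(range(n_tags), key=lambda y: score[y])
    path = [best_last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return [tags[i] for i in path], score[best_last]


tags = ["B", "I", "O"]
# Large negative score effectively forbids the invalid transition O -> I.
transitions = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, -1e4, 0.0],
]
emissions = [[0.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
print(viterbi_decode(emissions, transitions, tags))  # (['B', 'I'], 2.0)
```

A greedy per-position argmax would choose O then I here, which is not a valid BIO sequence; the transition scores are what let the CRF layer prefer B followed by I.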
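The prototype in (3) chains the two models: the question class decides what to ask the knowledge base, and the filled slot supplies the entity. The toy sketch below shows only that final step, assuming an in-memory list of (subject, relation, object) triples and a hypothetical mapping from question classes to relations. The class names, slot name, relation names, and triples are illustrative placeholders, not the thesis's knowledge base schema.

```python
# Hypothetical mapping from a question class (classifier output) to a KB relation.
CLASS_TO_RELATION = {
    "disease_symptom": "has_symptom",
    "disease_department": "treated_in_department",
}

# Toy in-memory triple store of (subject, relation, object); illustrative only.
TRIPLES = [
    ("influenza", "has_symptom", "fever"),
    ("influenza", "has_symptom", "cough"),
    ("influenza", "treated_in_department", "respiratory medicine"),
]


def build_query(question_class, slots):
    """Turn classifier and slot-filler outputs into a (subject, relation) pattern."""
    relation = CLASS_TO_RELATION.get(question_class)
    entity = slots.get("disease")  # hypothetical slot name
    if relation is None or entity is None:
        return None
    return entity, relation


def answer(question_class, slots, triples=TRIPLES):
    """Match the query pattern against the triples and render a short answer."""
    query = build_query(question_class, slots)
    if query is None:
        return "Sorry, this question cannot be answered."
    subject, relation = query
    objects = [o for s, r, o in triples if s == subject and r == relation]
    if not objects:
        return f"No information about {subject} was found."
    return f"{subject}: " + ", ".join(objects)


print(answer("disease_symptom", {"disease": "influenza"}))  # influenza: fever, cough
```

A production system would more likely translate the pattern into a query language for a graph database, but the division of labor is the same: classification picks the relation, slot filling picks the entity.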