In the medical field, a large body of professional knowledge is stored as text. Analyzing and processing this text with natural language processing algorithms can reduce researchers' workload. Frontier trend prediction based on text analysis can inform doctors' planning of future research directions. Extracting key information from the literature can help doctors quickly grasp the core content of an article. An automatic question answering system can help the public look up simple medical questions without adding to doctors' burden. Based on these needs, this paper applies natural language processing to the text analysis of lung diseases in order to relieve pressure on the medical system. The paper comprises three main studies.

First, for frontier trend prediction in the field of lung cancer, we combine the research-direction classification of each publication with its publication date to compute, for each research direction, a time series of the proportion of the literature that falls in that direction. This proportion represents the popularity (heat) of the research direction. The emphasis is on constructing a multi-label text classification network. Based on an analysis of the samples and the characteristics of the data, we propose an improved classification network. Experimental results show that our network achieves a lower Hamming loss and a higher F1 score than similar models, 0.0463 and 83.87%, respectively. Finally, we examine how the step size affects the classification performance of the time series prediction network. The results show that prediction is best at a step size of 6, where the F1 scores for both rising and falling trends exceed 89%.

Second, building on the text classification task of the first study, we extract key information from the literature on lung cancer treatment. Depending on its form in the text, the information to be extracted is divided into entities and triples. By improving the annotation mechanism, we construct a joint extraction network for entities and triples, which avoids error accumulation and solves the entity overlap problem in triple extraction. Experimental results show that this method achieves an F1 score of 74.36% on the entity extraction task and 64.80% on the triple extraction task, which is better overall than the pipeline model. Finally, to present the extraction results intuitively, we wrote an automatic form-filling program that displays the key information extracted from the literature as tables.

Third, a question answering (QA) system on the topic of COVID-19 can help the public answer scientific questions about the novel coronavirus, strengthen public understanding of the epidemic, and support national anti-epidemic work. Based on these practical needs, we construct a COVID-19 question answering data set by template generation from an existing COVID-19 knowledge graph. A named entity recognition network and a relation detection network are trained separately to parse input questions. The output of the question parsing module is used to query the knowledge graph, and the retrieval results are returned as the answers to the input questions. Finally, to facilitate future deployment, we design and build a web interface for the QA system on top of the implemented QA function.
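
As a minimal sketch of two quantities used in the first study, the following code computes the per-direction proportion time series (the heat of a research direction) and the Hamming loss of a multi-label classifier. It assumes each classified paper is represented as a (year, label set) pair. The function names and the sample labels are illustrative assumptions and are not taken from the paper's implementation.

```python
from collections import Counter, defaultdict


def direction_heat(papers):
    """Return {direction: [(year, proportion), ...]} sorted by year.

    papers: iterable of (year, labels), where labels is the set of
    research directions assigned to one paper (multi-label).
    """
    totals = Counter()                   # papers published per year
    per_direction = defaultdict(Counter)  # papers per direction per year
    for year, labels in papers:
        totals[year] += 1
        for label in labels:
            per_direction[label][year] += 1
    years = sorted(totals)
    return {
        label: [(year, counts[year] / totals[year]) for year in years]
        for label, counts in per_direction.items()
    }


def hamming_loss(y_true, y_pred, all_labels):
    """Fraction of (sample, label) decisions that are wrong.

    y_true, y_pred: sequences of label sets, one set per sample.
    all_labels: the full label vocabulary.
    """
    # The symmetric difference counts labels that are missed or spurious.
    wrong = sum(len(true ^ pred) for true, pred in zip(y_true, y_pred))
    return wrong / (len(y_true) * len(all_labels))


# Hypothetical sample data, for illustration only.
example_papers = [
    (2019, {"immunotherapy"}),
    (2019, {"immunotherapy", "surgery"}),
    (2020, {"surgery"}),
    (2020, {"immunotherapy"}),
]
example_heat = direction_heat(example_papers)
```

The resulting series for each direction could then be windowed with a chosen step size and passed to a time series prediction model; that stage is not sketched here.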