| With the continuous development of internet technology, the amount of data available through electronic media has exploded. How to retrieve the information that users really need from massive data has become a research hotspot. In the medical field, information retrieval technology has also received extensive attention. Retrieving information relevant to a medical query from the large body of medical literature and research can help doctors make better decisions and promote the intelligent development of medical diagnosis. TREC CDS (Clinical Decision Support) is one application of medical information retrieval. The clinical decision support task is to retrieve biomedical articles relevant to general clinical questions about medical records, in order to meet the needs of physicians. The queries for this task are short medical records, which typically describe challenging medical cases and are organized into an understandable narrative. Although each query is a short medical record, it is still long compared with a typical Web search engine query, and this length leads to problems such as ambiguity and unclear information needs. A query with a clear need is a guarantee of search quality, and query expansion is one effective means of addressing such problems. Therefore, this paper studies query expansion in the medical field. Choosing the appropriate number of expansion words is a common problem in query expansion techniques. There are two traditional methods: one selects a fixed number of words as the expansion based on experience, and the other sets a threshold, so that only words whose similarity score exceeds the threshold can be used as expansions. However, different queries should have different numbers of expansion words; the fixed number of expansion words 
and the fixed threshold will, to some extent, lose meaningful expansion words or add meaningless ones, and neither method takes the relationship between expansion words into account. One innovation of this paper is therefore to choose the optimal expanded entity group, obtained by combining entities from the candidate expanded entity set, instead of choosing individual expanded entities. This approach not only removes the need to fix the number of expanded entities but also considers the relationships between them. A review of query expansion research shows that previous methods are mainly based on the bag-of-words model and topic models. Although these methods have been effective in recommendation, search, and similar tasks, manual feature extraction is expensive and requires experienced engineers, and the relationship between the query and the expansion words cannot be captured at the semantic level. Therefore, the second innovation of this paper is a neural network-based selection model. The neural network extracts features automatically, and the model computes the similarity between the query and the expanded entities at the semantic level and then selects the optimal medical entity group as the expansion of the query. Because the queries are medical texts, this paper uses MeSH (Medical Subject Headings) for semantic mapping to obtain the set of candidate expanded entities. Because medical text annotation is difficult and costly, this paper adopts the idea of transfer learning: it learns the relationship between sentences and entities in the non-medical domain and applies it to the medical domain to predict the similarity between medical text queries and medical entity combinations. This is the third innovation of this paper. On the 2014 and 2016 data sets in particular, the evaluation metrics P@10 and NDCG@10 improve significantly. |
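
The group-selection idea described in the abstract, scoring combinations of candidate entities rather than ranking entities one by one, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: `toy_score` is a hypothetical stand-in for the paper's neural similarity model, and the entity names, term sets, size penalty, and `max_size` limit are invented toy values, not the authors' data or settings.

```python
from itertools import combinations


def candidate_groups(candidates, max_size):
    """Yield every non-empty combination of candidates with at most max_size entities."""
    for size in range(1, max_size + 1):
        yield from combinations(candidates, size)


def toy_score(query_terms, group, related_terms, size_penalty=0.4):
    """Hypothetical stand-in for a learned query/group similarity model.

    Counts the distinct query terms covered by the whole group and subtracts a
    penalty per entity, so redundant entities lower the score of a group.
    """
    covered = set()
    for entity in group:
        covered |= related_terms.get(entity, set()) & query_terms
    return len(covered) - size_penalty * len(group)


def select_best_group(query_terms, candidates, related_terms, max_size=3):
    """Return the highest-scoring group; its size is not fixed in advance."""
    return max(
        candidate_groups(candidates, max_size),
        key=lambda group: toy_score(query_terms, group, related_terms),
    )


# Toy data (assumed, for illustration only).
query_terms = {"chest", "pain", "dyspnea", "smoker"}
candidates = ["Myocardial Infarction", "Angina Pectoris", "Pulmonary Embolism", "Smoking"]
related_terms = {
    "Myocardial Infarction": {"chest", "pain"},
    "Angina Pectoris": {"chest"},
    "Pulmonary Embolism": {"dyspnea", "chest"},
    "Smoking": {"smoker"},
}

best_group = select_best_group(query_terms, candidates, related_terms)
print(best_group)
```

Because the score is computed for the group as a whole, an entity that only repeats what another entity already covers is left out, and the number of selected entities varies with the query. Exhaustive enumeration grows combinatorially with the number of candidates, so a real system would need to bound the candidate set or the group size.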