With the rapid development of digital libraries and the growth of digital journal collections, a large amount of academic full-text data is now stored on cloud servers in digital form, where it is large in volume and easy to access. Academic search engines also play an important role in the research tasks of academic users. As the research community keeps growing, retrieval models that can understand the intent behind academic queries are of considerable research significance and can be widely used in retrieval and recommendation tasks. Based on the search logs of real academic users, we propose a classification system for academic user query intents and integrate these intents into academic full-text retrieval to improve the performance of the retrieval model. The main contents are as follows:

1) We propose a query intent classification system for academic user retrieval. Using the K-Means++ clustering algorithm together with manual induction, we derive a classification system of academic query intents with four categories: research academic literature intent, practical academic literature intent, entity-oriented intent, and question-and-answer intent. To verify labeling consistency, we organized 10 postgraduates in information science to label 10,000 queries according to the classification system. The average Kappa coefficient is 0.805. The results show that our query intent classification system is effective in describing academic user queries.

2) We incorporate academic named entity features to improve the automatic classification of academic user query intents. Academic named entities have a finer granularity and can reflect the inherent characteristics of academic full text. In addition, a second round of fine-tuning of a pre-trained model on a corpus from the task domain can yield better text representations. Against this background, we constructed an academic entity annotation system and trained Intel-BERT, a pre-trained model oriented to the field of information science, and compared its performance with that of CRF, LSTM-CRF and BERT on the academic named entity recognition task. Finally, based on word-count features, in-word features and academic entity features, SVM, LSTM, TextCNN, BERT and Intel-BERT were used to classify query intents. For academic entity recognition, Intel-BERT achieves the best result, with an F1 value of 84.48%. For academic user query intent classification, TextCNN performs best, with a macro-averaged F1 value of 96.14%. The results show that integrating academic entity features effectively improves the classification of academic users' query intents.

3) We propose an academic full-text retrieval model that incorporates the entity-oriented query intent of academic users. Incorporating user intent into the retrieval process can improve the performance of a retrieval model, but this has not been verified in the academic retrieval scenario, where corpora are limited. We therefore propose a query-sentence intent matching model based on the attention mechanism, apply the resulting intent feature to the retrieval model, and compare the performance of retrieval systems that incorporate language model features and intent features. The average F1 value of the entity intent matching model is 86.87%. The retrieval system that incorporates both language model features and intent features performs best, with NDCG@10 reaching 0.868, and the ablation experiment shows that removing either the language model feature or the intent feature degrades the final performance.
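
For readers unfamiliar with the NDCG@10 metric reported above, the following is a minimal sketch of how such a score is commonly computed for a single query. It is an illustration under stated assumptions, not the evaluation code used in this work: the function names `dcg_at_k` and `ndcg_at_k` are hypothetical, the gain is taken to be linear in the graded relevance label, and the ideal ranking is built only from the documents present in the ranked list. The abstract does not specify which NDCG variant was used.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded relevance labels.

    Assumption: linear gain (rel) with a log2(rank + 1) discount,
    where rank starts at 1 for the first result.
    """
    return sum(
        rel / math.log2(position + 2)  # position is 0-based, so rank + 1 = position + 2
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k=10):
    """NDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    `relevances` lists the relevance labels of the returned documents
    in the order the retrieval system ranked them.
    """
    ideal = sorted(relevances, reverse=True)  # best possible ordering of the same labels
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant document at all for this query
    return dcg_at_k(relevances, k) / ideal_dcg


if __name__ == "__main__":
    # Hypothetical relevance labels for one query, in ranked order.
    ranked_labels = [3, 2, 3, 0, 1, 2, 0, 0, 1, 0]
    print(round(ndcg_at_k(ranked_labels, k=10), 3))
```

A ranking that already places documents in descending order of relevance scores 1.0, and a system-level figure such as the one reported above would be the mean of this per-query value over all test queries.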