| Document clustering is one of the fundamental tasks in data mining: it divides documents into clusters by measuring information such as the similarity between data points. As application scenarios diversify, users come to have different preferences for clustering results; this preference is called user intention. User intention is an abstract concept. It not only helps determine the direction of the clustering partition, but also provides positive guidance during the clustering process and improves clustering performance. It is therefore necessary to consider user intention in document clustering. However, intention-aware document clustering faces two key issues. First, the feature representation of a document has a different focus under different intentions, so the question is how to learn user intentions and obtain the corresponding intention-focused feature representations. Second, users provide only a small amount of supervision information, so the question is how to learn the user's overall intention from it. To address these two issues, this paper proposes the Intention-enhanced semi-supervised deep document clustering (IEDC) model and the Semi-supervised deep document clustering based on global intention (SCGI) model, respectively. To mine user intentions and obtain intention-guided, focused document features, IEDC uses an intention-enhanced autoencoder to learn document feature representations enriched with intention information, so that each representation captures not only the internal features of the document but also supplementary information such as the focus of the user's intention. IEDC also includes an intention-oriented semi-supervised clustering module that lets users participate in the document clustering process, which helps ensure the accuracy of the clustering results. To address the challenge of learning global user intentions from a small amount of supervision information, SCGI uses a deep metric learning network to learn both feature representations that contain the user's global intention and a global intention matrix. The global intention matrix guides the clustering process to conform more closely to the user's overall grouping preferences for the documents, yielding clustering results guided by the user's global intention. Both models were evaluated extensively on real datasets against multiple semi-supervised clustering methods. The results show that both models outperform the compared methods, with average improvements of 5.2% and 7% in normalized mutual information (NMI), respectively, demonstrating the effectiveness of the proposed models. |