
Document Clustering Analysis On Semi-supervised-related Medical Literatures

Posted on: 2024-09-19, Degree: Master, Type: Thesis
Country: China, Candidate: Q Y Bi, Full Text: PDF
GTID: 2544307079491254, Subject: Applied statistics
Abstract/Summary:
With the development of machine learning, neural networks, and medical technology, a large number of medical documents have emerged. Extracting information quickly from this body of documents has become necessary, which has driven the development of natural language processing for medical texts. The purpose of this paper is to mine information from the literature on applications of semi-supervised statistical learning in medical research. Through clustering analysis of English text, the mass of information is condensed into subject words, and the subject words are grouped into different topics using the LDA topic model, which helps researchers quickly understand the research directions of existing articles and the research methods they use. This paper analyzes more than 1000 research papers concerning semi-supervised learning and medicine that were published in Web of Science up to December 6, 2021, and takes the abstracts of these articles as the object of analysis. It attempts to classify and discuss the research directions in the medical field and the semi-supervised machine learning algorithms used there. The text is first pre-processed by word removal, stop-word removal, lemmatization, and correction of misspelled words to generate the original corpus; the corpus is then vectorized, and the "subject word" distribution probability matrix is generated. K-means clustering is applied to this matrix, and a preliminary classification analysis is carried out based on the collinear map and the heat map. The heat map shows five major categories: gene-related research, cancer diagnosis and classification, fuzzy clustering, image processing, and brain disease. The text of each major category is then extracted and analyzed in finer detail, with the LDA model used to output topics and subject words. The LDA analysis shows that each category of literature contains 2 to 3 research directions. The results of LDA-based text extraction can accurately mine the research topics at the intersection of medicine and semi-supervised algorithms, which helps in understanding the development trends and research hotspots of these algorithms and can provide a reliable reference for related research.
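The vectorization and K-means steps described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the four sample abstracts, the tiny stop-word list, the smoothed TF-IDF formula, and the choice of the first k documents as initial centroids are all assumptions made for illustration, and lemmatization, spelling correction, and the LDA stage are omitted.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the thesis's list).
STOP_WORDS = {"a", "an", "the", "of", "in", "for", "and", "to", "on", "with"}

def tokenize(text):
    """Lowercase, replace non-letters with spaces, and drop stop words."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return [w for w in cleaned.split() if w not in STOP_WORDS]

def tfidf_matrix(docs):
    """Return (vocab, rows), where each row is an L2-normalized TF-IDF vector."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    # Smoothed inverse document frequency (one common variant).
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        rows.append([v / norm for v in vec])
    return vocab, rows

def kmeans(rows, k, iters=20):
    """Plain Lloyd's algorithm; the first k rows seed the centroids (assumption)."""
    centroids = [list(r) for r in rows[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(r, centroids[j])))
            for r in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical abstracts standing in for the Web of Science corpus.
docs = [
    "Semi-supervised gene expression clustering for gene selection",
    "Semi-supervised segmentation of brain MRI images",
    "Gene expression data and semi-supervised gene classification",
    "Brain MRI image segmentation with semi-supervised learning",
]
vocab, rows = tfidf_matrix(docs)
labels = kmeans(rows, k=2)
print(labels)
```

In this sketch the words "semi" and "supervised" occur in every document, so they receive the lowest IDF weight and contribute little to the distances. The grouping is driven by the rarer, topic-specific words. Texts assigned to one cluster could then be passed to a topic model such as LDA to extract finer-grained topics.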
Keywords/Search Tags: Text clustering, TF-IDF, Word2vec, t-SNE dimensionality reduction, LDA topic model