Document Clustering Analysis On Semi-supervised-related Medical Literatures

Posted on:2024-09-19

Degree:Master

Type:Thesis

Country:China

Candidate:Q Y Bi

Full Text:PDF

GTID:2544307079491254

Subject:Applied statistics

Abstract/Summary:

With the development of machine learning, neural networks, and medical technology, a large number of medical documents have emerged. Extracting information quickly from this volume of text has become necessary, which has driven the development of natural language processing for medical texts. The purpose of this paper is to mine information from the literature on applications of semi-supervised statistical learning in medical research. Through clustering analysis of English text, the large body of information is condensed into subject words, and the subject words are grouped into different topics using the LDA topic model, which helps researchers quickly understand the research directions and methods of existing articles.

This paper analyzes more than 1,000 research papers on semi-supervised learning and medicine published in Web of Science up to Dec. 6, 2021, taking the abstracts of the articles as the objects of analysis. It attempts to classify and discuss the research directions in the medical field and the semi-supervised machine learning algorithms used. The text is first pre-processed by word removal, stop-word removal, lemmatization, and spelling correction to generate the original corpus. The corpus is then vectorized to generate the "subject word" probability distribution matrix. K-means clustering is applied to this matrix, and a preliminary classification analysis is carried out using a co-occurrence map and a heat map.

The heat map shows five major categories: gene-related research, cancer diagnosis and classification, fuzzy clustering, image processing, and brain disease. The texts in each major category are then extracted and analyzed in finer detail, with the LDA model used to output topics and subject words. The LDA analysis shows 2 to 3 research directions for each category of literature. The results indicate that LDA-based text extraction can accurately mine research topics at the intersection of medicine and semi-supervised algorithms, which helps in understanding the development trends and research hotspots of these algorithms and can provide a reliable reference for related research.
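
The vectorize-then-cluster step described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the thesis's implementation: it uses plain TF-IDF vectors (TF-IDF appears in the keywords) rather than the "subject word" probability matrix, and it omits the LDA stage entirely. The four toy documents, the stop-word list, and the names `tokenize`, `tfidf_matrix`, and `kmeans` are all hypothetical and exist only for this example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used in the thesis).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_matrix(docs):
    """Return (vocab, rows); each row is an L2-normalised TF-IDF vector."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        rows.append([v / norm for v in vec])
    return vocab, rows


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, init_indices, n_iter=20):
    """Lloyd's algorithm; centroids start at the rows named by init_indices."""
    centroids = [list(rows[i]) for i in init_indices]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assignment step: each row goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(row, centroids[k]))
            for row in rows
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [row for row, lab in zip(rows, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented toy "abstracts": two gene-related, two brain-imaging-related.
docs = [
    "Semi-supervised gene expression clustering for gene selection",
    "Gene expression profiles and semi-supervised gene clustering",
    "Semi-supervised segmentation of brain MRI image",
    "Brain MRI image segmentation with semi-supervised learning",
]

vocab, rows = tfidf_matrix(docs)
# Fixed starting centroids keep the example deterministic.
labels = kmeans(rows, init_indices=(0, 2))
print(labels)
```

Starting the centroids at fixed documents is a choice made only for reproducibility; on a real corpus a randomized or k-means++ initialization with several restarts would be the more usual approach.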

Keywords/Search Tags:

Text clustering, TF-IDF, Word2vec, TSNE dimensionality reduction, LDA topic model

Related items

1	Design And Implementation Of Disease Analysis System Based On LDA Topic Model
2	Research And Application Of Chinese Medicinal Materials Patent Text Mining Method Based On Topic Model
3	Study On Topic Model And Its Application To TCM Clinical Diagnosis And Treatment
4	Research On Medical Intelligent Question Answering Algorithm Based On LSTM&Topic-CNN Model
5	Dimensionality Reduction And Clustering Ensemble Of Tumor Gene Expression Profile
6	Dimensionality Reduction Analysis Of Gene Chip Data Of Acute Myeloid Leukemia
7	The Research Of Colorectal Cancer Risk Prediction Model Based On Dimensionality Reduction And Regression Analysis
8	Text Mining Of Attitudes Toward Depression On Chinese Social Media
9	Brain Cognitive Research On Topic Model
10	Dimensionality Reduction Of Breast Cancer Patients' Clinical Data And Survival Prediction Analysis