Research On The Application Value Of Text Classification Technology On Literature Screening And Qualitative Research

Posted on:2020-11-01

Degree:Doctor

Type:Dissertation

Country:China

Candidate:X Y Jing

GTID:1364330590466411

Subject:Epidemiology and Health Statistics

Abstract/Summary:

Purpose: As machine learning has developed and demand for applications such as image recognition, audio transcription, and personalized product production has grown, deep learning methods have gradually come into wider view. Deep learning addresses a limitation of conventional machine learning, which cannot readily process real-world data in its raw form: by combining multiple neural network layers, it extracts useful information directly from the raw data. In a systematic review, objective and comprehensive literature retrieval under a repeatable search strategy is key to the quality of the evidence produced, but literature screening is also the most time-consuming and laborious step. Qualitative research collects data through observation, interviews, and recordings or transcripts of the research subjects. Even interview recordings must be organized into text form before analysis, which generates a large number of text files. The analysis itself relies on manual coding and induction, and this heavy demand on human labor is one of the main difficulties of qualitative data analysis. To save human resources in both tasks, this study explores the feasibility of applying text classification technology to literature screening for systematic reviews and meta-analyses and to qualitative research data analysis.

Materials and Methods: The data came from a systematic review and meta-analysis published by our team and from a qualitative study named "City Changing Diabetes (CCD)-Tianjin". The TextCNN algorithm was used to classify texts. For Chinese text classification, the "jieba" Chinese word segmentation tool was added on top of the TextCNN algorithm. The algorithm was built on a stand-alone personal computer (PC) with an i7-6700 3.4 GHz central processing unit (CPU), 16 GB of memory, and an Nvidia GeForce GTX 1050 Ti graphics card (GPU, 6G memory version). The operating system was Ubuntu 18.04 LTS, a Unix-like (Linux-based) system. All algorithm code was written and modified in Python 3.6.8.

Results: For the English text, three sets of TextCNN models were constructed according to the sample size ratio of the two categories of literature in the training sample. The third set (training sample 240 vs 63) achieved the highest classification accuracy of the three for the minority category ("included"): 86.67% (52/60). Because this is only the initial screening of the literature, this model carries the lowest probability of information loss in practical use. Its disadvantage is poor classification of the majority category ("excluded"), which increases the workload of subsequent screening (its WSS@95% was the lowest of the three). Compared with excessive information loss, however, a larger screening workload is clearly more acceptable, and the third model still reduced the initial screening workload by half. This study therefore considers the third set the best for practical use; in it, the minority category in the training sample was expanded to eight times its size. Expanding the minority category further, however, would eventually make classification of the majority category extremely poor, so the goal of saving workload would not be achieved.

For the Chinese text, although the F1 value of the optimal model was relatively low, the overall accuracy reached 61.88% and the WSS@95% reached 0.5813. Compared with the study that first applied the TextCNN model to Chinese text classification, classification accuracy increased by more than 7 percentage points. The training and test samples were split at a ratio of 3:7, so in practical use the researcher needs to analyze only 30% of the qualitative data manually and the remaining 70% of the text can be processed by computer, which greatly reduces the data analyst's workload.

Conclusion: Text classification technology can provide strong technical support for the initial screening stage of systematic reviews and meta-analyses and for qualitative research data analysis. The TextCNN model classifies English text better than Chinese text.
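The abstract reports WSS@95% values without defining the metric. The following is a minimal sketch, assuming the definition commonly used in citation-screening work (Work Saved over Sampling: the share of records a reviewer does not have to read, minus the allowed recall shortfall). The function names and the confusion-matrix counts in the last example are hypothetical and are not taken from the thesis; only the 52/60 figure comes from the abstract, and reading it as the share of "included" records correctly identified is an interpretation.

```python
def wss(tn: int, fn: int, total: int, recall_level: float = 0.95) -> float:
    """Work Saved over Sampling at a target recall level.

    Assumed definition (not confirmed by the abstract):
        WSS@R = (TN + FN) / N - (1 - R)
    where TN + FN is the number of records the classifier marks as
    "excluded" and N is the total number of records screened.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return (tn + fn) / total - (1.0 - recall_level)


def share_correct(correct: int, total: int) -> float:
    """Fraction of records in one category that were classified correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


# Figure quoted in the abstract: 52 of 60 "included" records classified correctly.
included_share = share_correct(52, 60)
print(f"{included_share:.2%}")  # 86.67%

# Hypothetical counts, for illustration only (not from the thesis).
example_wss = wss(tn=550, fn=50, total=1000)
print(round(example_wss, 4))
```

Under this definition, a WSS@95% of 0.5813 would mean that roughly 58% of the manual reading is saved relative to screening every record, at the cost of accepting up to a 5% loss in recall.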

Keywords/Search Tags:

Text classification, TextCNN, Literature screening, Qualitative study, Application study

Related items

1	Knowledge Acquisition And Application For Clinical Texts
2	Application Of Text Mining In Drug Active Gene Screening And Case Study On Rapamycin
3	Design And Implementation Of Systematic Review Citation Screening System Based On Literature Similarity
4	Biomedical Text Mining And Its Application In Gene Regulatory Information Analysis
5	Research On The Key Techniques Of Biomedical Text Mining
6	Application And Research Of Emotion Classification Technology Based On Deep Learning For MSM Emotion Classification
7	Research On Classification Method Based On Acupuncture Text Data
8	Text Analysis And Application Of Lung Diseases Based On BERT Semantic Embedding
9	Research On Key Technologies Of Medical Assisted Diagnosis Based On Deep Learning
10	Application Of Health Management Knowledge Classification In Cardiovascular Patients