
Research On The Application Value Of Text Classification Technology On Literature Screening And Qualitative Research

Posted on: 2020-11-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Y Jing
GTID: 1364330590466411
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Purpose: With the development of machine learning and growing demand for applications such as image recognition, audio transcription, and personalized products, deep learning methods have gradually come into view. Deep learning overcomes a shortcoming of conventional machine learning, which cannot handle real-world data in its raw form: by stacking multiple neural network layers, it extracts useful information directly from raw data. In a systematic review, objective and comprehensive literature retrieval under a reproducible search strategy is the key to the quality of the evidence, but literature screening is also the most time-consuming and laborious step. Qualitative research collects data through observation, interviews, and recordings or transcripts of the research subjects. Even interview recordings must be transcribed into text after collation, which produces a large number of text files. These data are analyzed by manual coding and induction, and the heavy demand on human labor is one of the main difficulties of qualitative data analysis. To save human resources in both tasks, this study explores whether text classification technology can be applied to literature screening for systematic reviews and meta-analyses and to qualitative research data analysis.

Materials and Methods: The data came from a systematic review and meta-analysis published by our team and from a qualitative study named “City Changing Diabetes (CCD)-Tianjin”. The TextCNN algorithm was used to classify texts. For Chinese text classification, this study added "jieba" Chinese word segmentation on top of the TextCNN algorithm. The algorithm was built on a stand-alone personal computer (PC) with an i7-6700 3.4 GHz central processing unit (CPU), 16 GB of memory, and an Nvidia GeForce GTX 1050 Ti graphics card (GPU, 6 GB memory version). The operating system was Ubuntu 18.04 LTS, a Unix-like (Linux) system. All algorithm code was written and modified in Python 3.6.8.

Results: For the English text, three sets of TextCNN models were constructed according to the sample size ratio of the two classes of literature in the training sample. The third set (training sample 240 vs 63) achieved the highest classification accuracy for the minority class ("included") of the three groups (86.67%, 52/60). Because this is only the initial screening stage, the third set therefore carried the lowest probability of information loss in practical applications. Its disadvantage was poor classification of the majority class ("excluded"), which increases the workload of subsequent screening (its WSS95% was the lowest of the three). Compared with excessive information loss, however, a heavier screening workload is clearly more acceptable, and the third model still reduced the initial screening workload by half. This study therefore considers the third set the best for practical use; that is, the minority class in the training sample was expanded to eight times its original size. Expanding the minority class further, however, would eventually make the model classify the majority class extremely poorly, so the goal of saving workload would not be achieved. For the Chinese text, although the F1 value of the optimal model in this study was relatively low, the overall accuracy reached 61.88% and the WSS95% reached 0.5813. Compared with the study that first applied the TextCNN model to Chinese text classification, classification accuracy increased by more than 7 percentage points. The study split the training and test samples at a ratio of 3:7. The advantage is that, in practical applications, the researcher needs to analyze only 30% of the qualitative data manually, and the remaining 70% of the text can be processed by the computer, which greatly reduces the data analyst's workload.

Conclusion: Text classification technology can provide powerful technical support for the initial screening stage of systematic reviews and meta-analyses and for qualitative research data analysis. The TextCNN model classifies English text better than Chinese text.
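
The Results report a per-class accuracy (86.67%, 52/60) and WSS95% values, but the abstract does not give the formulas. As a minimal sketch, assuming WSS95% follows the definition commonly used in screening-automation studies (the share of records the model discards, (TN + FN) / N, minus the 5% of relevant records that a 95% recall target allows to be missed), these metrics could be computed from confusion-matrix counts as follows. The `ScreeningCounts` class and the function names are hypothetical, and only the 52/60 figure comes from the abstract; all other counts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Confusion-matrix counts for a two-class screening model (hypothetical helper)."""
    tp: int  # "included" records the model keeps
    fn: int  # "included" records the model wrongly discards
    tn: int  # "excluded" records the model correctly discards
    fp: int  # "excluded" records the model wrongly keeps

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.tn + self.fp


def included_recall(c: ScreeningCounts) -> float:
    """Share of truly "included" records that the model keeps."""
    return c.tp / (c.tp + c.fn)


def overall_accuracy(c: ScreeningCounts) -> float:
    """Share of all records that the model classifies correctly."""
    return (c.tp + c.tn) / c.total


def work_saved_over_sampling(c: ScreeningCounts, target_recall: float = 0.95) -> float:
    """WSS at a recall target: records discarded by the model, minus the
    share that random sampling would skip at the same recall.

    Assumes the counts were taken at the decision threshold where the
    model's recall on the "included" class equals target_recall.
    """
    return (c.tn + c.fn) / c.total - (1.0 - target_recall)


# tp and fn reproduce the 52/60 figure from the abstract;
# tn and fp are invented for illustration.
reported = ScreeningCounts(tp=52, fn=8, tn=100, fp=40)
print(f"included-class accuracy: {included_recall(reported):.2%}")

# Fully hypothetical counts at exactly 95% recall (57 of 60 included kept).
at_95 = ScreeningCounts(tp=57, fn=3, tn=100, fp=40)
print(f"WSS@95%: {work_saved_over_sampling(at_95):.4f}")
```

Under this definition, a WSS95% of 0.5813 would mean that a reviewer reads roughly 58% fewer records than random screening would require to find 95% of the relevant ones.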
Keywords/Search Tags: Text classification, TextCNN, Literature screening, Qualitative study, Application study