Research On Text Quality Classification Based On Deep Learning

Posted on: 2021-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Mo
GTID: 2428330620464282
Subject: Engineering
Abstract/Summary:
Wikipedia is a large-scale online reference that many people rely on as a source of information, but the quality of its articles has been questioned. As the number of editors grows, manual quality evaluation becomes impractical, and edits may produce low-quality articles or even misinformation. Quality is also a concern for graduation theses: millions of undergraduate and graduate students graduate every year, the quality of their theses varies, and colleges and universities are setting increasingly high requirements for them. Classifying the quality of graduation theses is therefore also a challenging task. This thesis studies automatic quality classification of long texts with deep learning methods, mainly on English Wikipedia, Chinese Wikipedia, and graduation theses. The main work is as follows:

1. To address the problem that cross entropy focuses only on the correct class, an improved cross-entropy function is proposed, which allows the model to better fit the data distribution.
2. For text quality classification on English Wikipedia, an automatic classification method, AttLSTM, is proposed. It is end-to-end and requires no feature engineering. In a comparison experiment on six-class text quality classification, accuracy increases from 69% to 71%. The data are also regrouped into three classes, and the comparison experiments indicate that the attention mechanism can replace some manual features.
3. The Chinese Wikipedia dataset is expanded, and MulCNN-LSTM, a method suited to automatic text quality classification on Chinese Wikipedia, is proposed.
4. A large number of master's theses are collected and divided into three quality levels (excellent, normal, and postponed) according to published data. For such long texts, this thesis designs a chapter-based full-text quality analysis method, ChapterLSTM. Extensive experiments verify the reliability and effectiveness of the model. Its F1 score is 90%, which is 15% higher than the best existing model.
5. Finally, a graduate thesis quality evaluation system is designed based on ChapterLSTM. The system is developed with the current mainstream approach of separating the front end from the back end.
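The abstract does not describe how the attention in AttLSTM is computed. As a rough illustration only, the sketch below shows one common form of attention pooling over a sequence of hidden states (such as LSTM outputs), assuming dot-product scoring against a query vector. The names `softmax`, `attention_pool`, `hidden_states`, and `query` are hypothetical and are not taken from the thesis.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden_states: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Pool a sequence of hidden states into one vector.

    Assumption: dot-product scoring against a query vector; the
    scoring function used in the thesis is not given in the abstract.
    """
    # One score per time step: dot product of the state with the query.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum of the hidden states, one dimension at a time.
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return pooled, weights


# Hypothetical usage: two time steps with 2-dimensional hidden states.
pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], query=[10.0, 0.0])
```

In a trained model the query would be a learned parameter, and the pooled vector would feed a classifier over the quality classes. Here the first state aligns with the query, so it receives nearly all of the attention weight.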
Keywords/Search Tags: text quality classification, deep neural network, long text, graduation thesis quality