
Long Document Classification And Crowdfunding Platform Project Screening And Recommendation Based On Deep Learning

Posted on: 2022-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: W Q Ma
Full Text: PDF
GTID: 2518306323966049
Subject: Management Science and Engineering
Abstract/Summary:
Document classification in NLP requires models to extract high-level features from low-level word vectors. Deep neural networks generally extract features from every word in a document, an approach that is poorly suited to long documents. In addition, training deep neural networks requires a large amount of labeled data, which is hard to obtain under weak supervision. To address these challenges, we propose an algorithm for weakly supervised long document classification. First, we use a small amount of seed information to generate pseudo-documents, which compensates for insufficient training data. Second, we combine an RNN with a local attention learning mechanism that predicts the locations of the most important document fragments and extracts summary features from them, improving the speed and accuracy of the subsequent category prediction model. DonorsChoose, a public-welfare crowdfunding platform, currently faces two problems, screening project proposals and recommending projects to donors, both of which involve processing long texts. This thesis applies our algorithm to these problems, using real DonorsChoose data to build a project classifier and a recommendation system that serve as reference solutions for such problems. Experimental results show that: (1) the pseudo-document generation algorithm does augment the training data, and the improvement in prediction accuracy is particularly significant under weak supervision; (2) the long document classification algorithm based on the local attention mechanism achieves significantly higher prediction accuracy than the benchmark models at a practical processing speed; (3) the DonorsChoose problems can be handled using either the whole algorithm or parts of it. With text features alone, our model outperforms the benchmark models, and adding non-text features improves the results further. The main contribution is a long document classification algorithm for weak supervision. Its application to screening and recommending crowdfunding projects demonstrates its practical value. The algorithm provides a more efficient and accurate method for long document classification and feature extraction, with a wide range of management application scenarios.
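The two ideas named in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual formulation, so everything below is an assumption made for illustration: the function names, the fixed seed-to-background mixing ratio, and the Gaussian window around a predicted position are hypothetical stand-ins, and in the thesis the position would be predicted by an RNN rather than passed in by hand.

```python
import math
import random


def generate_pseudo_document(seed_words, background_words, length=50,
                             seed_ratio=0.3, rng=None):
    """Sample a pseudo-document for one class.

    Assumption: each word is drawn from the class's seed words with
    probability `seed_ratio`, otherwise from a shared background vocabulary.
    """
    rng = rng or random.Random()
    words = []
    for _ in range(length):
        pool = seed_words if rng.random() < seed_ratio else background_words
        words.append(rng.choice(pool))
    return words


def local_attention_summary(vectors, center, half_width):
    """Summarize a document by attending only to a window around `center`.

    Assumption: weights follow a Gaussian centered on the predicted position,
    with standard deviation half_width / 2. `half_width` must be positive and
    `center` must lie inside the document.
    """
    n = len(vectors)
    lo = max(0, int(math.floor(center - half_width)))
    hi = min(n - 1, int(math.ceil(center + half_width)))
    sigma = half_width / 2.0
    positions = range(lo, hi + 1)
    weights = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2))
               for i in positions]
    total = sum(weights)
    dim = len(vectors[0])
    summary = [0.0] * dim
    # Weighted average of the word vectors inside the window only.
    for w, i in zip(weights, positions):
        for d in range(dim):
            summary[d] += (w / total) * vectors[i][d]
    return summary


if __name__ == "__main__":
    doc = generate_pseudo_document(["math", "science"], ["the", "a", "of"],
                                   length=10, rng=random.Random(0))
    print(doc)
    toy_vectors = [[float(i), 1.0] for i in range(10)]
    print(local_attention_summary(toy_vectors, center=5.0, half_width=2))
```

Because the summary is computed from a short window instead of the full word sequence, its cost depends on the window size and not on the document length, which is the kind of speed benefit the abstract attributes to the local attention mechanism.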
Keywords/Search Tags:recurrent neural network, document classification, pseudo-document, local attention learning mechanism, crowdfunding project