Design And Implementation Of Text Classification System Based On Active Learning

Posted on: 2020-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: J H Nie
Full Text: PDF
GTID: 2428330572973593
Subject: Computer technology
Abstract/Summary:
With the arrival of the information age and the rapid growth of the Internet, the volume of text data has exploded, and this data holds enormous latent value. Text classification can effectively organize texts, improve the efficiency of information retrieval, and help mine deeper information from them. Thanks to advances in data collection and storage, gathering massive amounts of text is no longer a problem. In practice, however, text classification is still largely confined to large companies and research institutions, because traditional supervised text classification requires a large number of labeled samples and the labor cost of labeling large amounts of text is too high. Randomly selecting a subset of the data to label not only wastes data resources but also hurts the final classification accuracy. A text classification system that makes effective use of unlabeled data therefore has significant practical value.

To address these problems, this thesis takes active learning as its starting point and designs and implements a text classification system based on active learning. The main work is as follows:

(1) Building on active learning algorithms, with an RCNN model as the classifier, this thesis proposes a text classification framework based on active learning. It also combines data mining techniques with a deep document vector model to improve the initial sample selection algorithm, so that the samples selected from the unlabeled set better represent the sample space. Comparison experiments show that the improved initial sample selection algorithm significantly improves the classifier's ability to discriminate unlabeled samples, which in turn improves the efficiency of the active learning algorithm. The final experimental results show that, compared with other active learning algorithms in current research, the proposed framework achieves higher text classification accuracy at a lower labeling cost.

(2) Based on the proposed framework, this thesis designs and implements a scalable, high-performance, interactive text classification system based on active learning. Functional tests show that the system effectively reduces the manual labeling cost users need to complete classification and prediction tasks, and that it addresses the difficulty target users have in making use of unlabeled text data sets.
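The abstract does not give algorithmic details, so the following is only a minimal sketch of the two ideas it names: choosing representative initial samples by clustering document vectors with K-means (listed in the keywords), and then querying the unlabeled samples the classifier is least sure about. Everything concrete here is an assumption made for illustration: the function names, the farthest-first seeding, the use of predictive entropy as the uncertainty measure, and the toy 2-D vectors standing in for deep document vectors. The RCNN classifier is not implemented; its class probabilities are passed in as plain numbers.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from all chosen seeds."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def assign_clusters(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means over document vectors given as sequences of floats."""
    centroids = farthest_first_init(points, k)
    for _ in range(iters):
        assign = assign_clusters(points, centroids)
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign_clusters(points, centroids)


def select_initial_samples(points, k):
    """For each cluster, pick the index of the document closest to its centroid,
    so the first labeled batch covers the sample space."""
    centroids, assign = kmeans(points, k)
    chosen = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            chosen.append(min(members, key=lambda i: math.dist(points[i], centroids[c])))
    return chosen


def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def query_most_uncertain(prob_rows, unlabeled, batch_size):
    """Return the batch_size unlabeled indices with the highest predictive entropy."""
    ranked = sorted(unlabeled, key=lambda i: entropy(prob_rows[i]), reverse=True)
    return ranked[:batch_size]


# Toy usage: six hypothetical 2-D "document vectors" forming two groups.
docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
initial = select_initial_samples(docs, k=2)

# Hypothetical class probabilities a classifier might output for the rest.
probs = {1: [0.5, 0.5], 2: [0.9, 0.1], 4: [0.6, 0.4], 5: [0.99, 0.01]}
next_batch = query_most_uncertain(probs, unlabeled=[1, 2, 4, 5], batch_size=2)
```

In a full loop, the documents in `next_batch` would be labeled by a person, added to the training set, and the classifier retrained before the next query round. Picking one document per cluster is only one plausible way to make the initial batch representative.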
Keywords/Search Tags:Text Classification, Active Learning, Deep Learning, K-means