Research On Active Learning Method Based On Density Clustering And Its Application

Posted on: 2022-07-08 | Degree: Master | Type: Thesis
Country: China | Candidate: Z X Liu
GTID: 2518306575466574 | Subject: Computer technology
Abstract/Summary:
Machine learning is widely applied, and traditional supervised learning can learn effectively from labeled samples. In practice, however, a large amount of data lacks labels and is difficult to use directly, and labeling all of it manually takes considerable time and labor. Active learning addresses the problem of large amounts of unlabeled data: it uses an evaluation criterion to select the most valuable samples for labeling, so as to form an effective labeled sample set and improve the classification performance of the model. The density peaks clustering algorithm is a density-based clustering method that can find clusters of arbitrary shape, which makes it helpful for selecting samples to label in active learning. This thesis combines active learning with the density peaks clustering algorithm in the following two pieces of research work:

1. An active learning method based on density clustering and neighborhood (DCN-AL) is proposed. First, the density peaks clustering algorithm is used to produce a preliminary clustering of the data set. Second, a sample query strategy is formulated based on the neighborhood information of each sample; the selected samples are labeled, added to the labeled data set, and used to revise the clustering results, so that the cluster assignments become more accurate. Finally, when the number of labeled samples reaches the specified limit, the process stops. Several groups of comparative experiments show that selecting samples for labeling by their neighborhood information effectively corrects the clustering results.

2. An active learning method based on density clustering for legal texts (DLL-ACTIVE) is proposed, which can actively label legal texts effectively. First, a dictionary of legal keywords is constructed, and important sentences are extracted from each long text according to the keywords to represent the original text. Second, the vector representation of the text is obtained with the BERT model. Then, density peaks clustering and active selection of samples for labeling are carried out iteratively. By combining the probability distribution obtained from a topic model, the classification probability predicted by a logistic regression model, and the silhouette coefficient, samples whose category is unclear and which may lie at cluster boundaries are selected for labeling. Finally, sample selection and active learning stop once the stopping condition is met, and the clustering results are returned. Comparative experiments on legal texts show that the proposed method outperforms the method in Chapter 3 and can classify legal texts accurately.
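Both methods start from density peaks clustering, so a small sketch of that step may help make the abstract concrete. The code below is a minimal illustration of the basic, widely known form of the algorithm (cutoff-kernel density, distance to the nearest denser point), not the thesis's own implementation. The function name `density_peaks`, the parameters `d_c` and `n_clusters`, and the choice of centers by the largest product of density and distance are all assumptions made for this example; the query strategies of DCN-AL and DLL-ACTIVE are not reproduced here.

```python
import numpy as np

def density_peaks(X, d_c, n_clusters):
    """Basic density peaks clustering (illustrative sketch, cutoff kernel).

    X          : (n, d) array of samples
    d_c        : cutoff distance used to count neighbors (assumed parameter)
    n_clusters : number of centers to pick (assumed parameter)
    """
    n = len(X)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Local density rho: number of other samples closer than d_c.
    rho = (dist < d_c).sum(axis=1) - 1
    # Visit samples from densest to sparsest; stable sort breaks ties by index.
    order = np.argsort(-rho, kind="stable")

    delta = np.zeros(n)                  # distance to nearest denser sample
    nearest_denser = np.full(n, -1)      # index of that sample
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest sample has no denser neighbor: use its max distance.
            delta[i] = dist[i].max()
            continue
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]
        nearest_denser[i] = j

    # Assumed heuristic: centers are the samples with the largest rho * delta.
    gamma = rho * delta
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]

    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Every other sample inherits the label of its nearest denser sample,
    # which has already been labeled because we go in decreasing density.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels, rho, delta
```

In an active learning loop of the kind the abstract describes, the returned `labels` would serve as the preliminary clustering, and the samples chosen by a query strategy would then be labeled and used to revise it.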
Keywords/Search Tags: active learning, density clustering, neighborhood systems, sample selection strategy, text classification