Study On The Unlabeled Text Mining Methods Based On The Concept Lattice Extension Models

Posted on: 2019-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: X M Zhu
GTID: 2428330572951511
Subject: Computer software and theory
Abstract/Summary:
Text mining, whose main tools are text classification and text clustering, is an important technique for obtaining valuable information. With the rapid growth of text data, the proportion of unlabeled text keeps increasing, yet traditional classification methods cannot handle such text directly. One solution is to combine clustering and classification so that the unsupervised classification problem is transformed into a supervised one. Concretely, text clustering is first used to discover the text categories, construct the classification system, and obtain keywords related to each category; labeled samples are then obtained by word matching; finally, a classifier is trained. Common clustering methods generally assign each text to a single category, but in real applications texts often belong to several categories, and hierarchical relationships exist between categories. Moreover, two commonly used clustering methods, k-means and FCM (fuzzy c-means), are sensitive to the choice of initial values, so their clustering results are unstable. Text clustering methods based on concept lattice theory are stable and can represent multiple-inheritance relationships between cluster nodes, which makes it possible to build a multi-level classification system. The fuzzy concept lattice is an extension of the classical concept lattice that can express uncertain relations between texts and features. In this thesis, text clustering is implemented on a fuzzy concept lattice, and an improved method for computing the similarity between fuzzy formal concepts is proposed that takes into account the influence of both the fuzzy object subsets and the fuzzy attribute subsets. A concept hierarchy is then constructed and a fuzzy ontology is generated, which is used to build the classification system. In addition, the classification system is expanded and refined by acquiring keywords related to the existing tags.

Compared with support vector machines, neural networks and other commonly used classification methods, classification methods based on the concept lattice, which classify texts with a set of positive classification rules, offer better interpretability. However, the formal concepts in a concept lattice only capture positive associations between feature attributes and category attributes and ignore the negative associations between them. In 2014, Qi et al. extended concept lattice theory to three-way concept analysis, which can express positive and negative associations between items at the same time. This thesis applies three-way concept analysis to text classification for the first time. First, the definitions, acquisition method and underlying principle of positive and negative classification rules are given. The classification rules are then grouped by category and a weighted sum is computed for each category, in which negative classification rules lower the weight; the category with the largest weighted sum is the predicted category.

Finally, the Sogou Lab news corpus is used to evaluate the clustering and classification results. For text clustering, the average intra-class quality is used as the evaluation measure; experiments vary the data set, the number of special words and the parameters, and the results show that the improved method achieves a higher average intra-class quality. For text classification, accuracy is used as the evaluation measure; experiments vary the test set and the number of features, and the results show that the classification model based on three-way concepts is more accurate than the model based on formal concepts, with an average increase of 5.9%.
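The word-matching step that turns unlabeled texts into labeled samples can be pictured with a minimal sketch. It assumes each category already has a keyword set produced by the clustering stage and that texts are pre-tokenized; the function name `label_by_keywords`, the hit-count scoring and the `min_hits` threshold are hypothetical choices, not the thesis's exact matching rule.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set


def label_by_keywords(
    tokens: Iterable[str],
    category_keywords: Dict[str, Set[str]],
    min_hits: int = 1,
) -> Optional[str]:
    """Pseudo-label a tokenized text by counting keyword hits per category.

    Returns None when no category reaches `min_hits` or when the top score
    is shared by several categories, so ambiguous texts stay unlabeled.
    """
    counts = Counter(tokens)
    scores = {cat: sum(counts[w] for w in kws) for cat, kws in category_keywords.items()}
    if not scores:
        return None
    best = max(scores.values())
    winners = [cat for cat, s in scores.items() if s == best]
    if best < min_hits or len(winners) != 1:
        return None
    return winners[0]


# Hypothetical keyword sets standing in for the output of the clustering stage.
keywords = {"sports": {"match", "team", "goal"}, "finance": {"stock", "market", "bank"}}
docs = [["the", "team", "scored", "a", "goal"], ["stock", "market", "falls"], ["weather", "today"]]
print([label_by_keywords(d, keywords) for d in docs])  # ['sports', 'finance', None]
```

Leaving ambiguous or unmatched texts unlabeled keeps noisy pseudo-labels out of the training set; the resulting (text, label) pairs would then feed an ordinary supervised classifier.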
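The abstract says the improved similarity between fuzzy formal concepts accounts for both the fuzzy object subset (extent) and the fuzzy attribute subset (intent) but does not give the formula. One plausible shape, shown here only as an assumption, is a weighted blend of a fuzzy Jaccard index (sum of pointwise minima over sum of pointwise maxima) on the extents and on the intents. The helper names and the weight `w` are hypothetical.

```python
from typing import Dict

FuzzySet = Dict[str, float]  # element -> membership degree in [0, 1]


def fuzzy_jaccard(a: FuzzySet, b: FuzzySet) -> float:
    """Sum of pointwise minima over sum of pointwise maxima (1.0 if both are empty)."""
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return 1.0 if den == 0 else num / den


def concept_similarity(extent1: FuzzySet, intent1: FuzzySet,
                       extent2: FuzzySet, intent2: FuzzySet,
                       w: float = 0.5) -> float:
    """Hypothetical blend: w * extent similarity + (1 - w) * intent similarity."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * fuzzy_jaccard(extent1, extent2) + (1 - w) * fuzzy_jaccard(intent1, intent2)


# Two toy fuzzy concepts: extents map texts to degrees, intents map terms to degrees.
c1 = ({"d1": 0.8, "d2": 0.6}, {"sport": 0.9, "team": 0.7})
c2 = ({"d1": 0.8, "d3": 0.5}, {"sport": 0.9, "goal": 0.6})
sim = concept_similarity(c1[0], c1[1], c2[0], c2[1])
print(round(sim, 3))  # 0.415
```

Setting `w` to 1 or 0 reduces the measure to extent-only or intent-only similarity, which makes it easy to compare against a baseline that ignores one of the two subsets.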
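Three-way concept analysis pairs a set of objects with two attribute sets instead of one: the attributes every object in the set has (positive part) and the attributes none of them has (negative part). A minimal sketch of this object-induced derivation over a binary text-feature context follows; the context representation and the name `three_way_intent` are assumptions for illustration.

```python
from typing import Dict, FrozenSet, Set, Tuple

Context = Dict[str, Set[str]]  # object (text) -> set of attributes (features) it has


def three_way_intent(objects: Set[str], context: Context,
                     attributes: Set[str]) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Return (attributes shared by every object, attributes possessed by none)."""
    positive = set(attributes)
    negative = set(attributes)
    for obj in objects:  # for an empty object set both parts stay equal to all attributes
        has = context[obj]
        positive &= has  # keep only attributes this object also has
        negative -= has  # drop attributes this object has
    return frozenset(positive), frozenset(negative)


context = {"t1": {"goal", "team"}, "t2": {"goal", "team", "coach"}, "t3": {"stock", "bank"}}
attributes = {"goal", "team", "coach", "stock", "bank"}
pos, neg = three_way_intent({"t1", "t2"}, context, attributes)
print(sorted(pos), sorted(neg))  # ['goal', 'team'] ['bank', 'stock']
```

The negative part is what a classical formal concept discards: here it records that the texts t1 and t2 jointly lack the finance-style terms, which is the kind of negative association the abstract says classical formal concepts ignore.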
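The decision step (group rules by category, sum their weights with negative rules subtracting, predict the category with the largest sum) can be sketched as below. It assumes a rule fires when all of its features occur in the text and that each rule carries a single numeric weight such as its confidence; the `Rule` type, its fields and the firing condition are hypothetical and may differ from the thesis's definitions.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional, Set


@dataclass(frozen=True)
class Rule:
    category: str
    features: FrozenSet[str]  # the rule fires when all of these appear in the text
    weight: float             # e.g. rule confidence (assumption)
    positive: bool            # False marks a negative rule, which lowers the score


def classify(text_features: Set[str], rules: Iterable[Rule],
             categories: Iterable[str]) -> Optional[str]:
    """Weighted rule vote: positive rules add, negative rules subtract."""
    scores: Dict[str, float] = {c: 0.0 for c in categories}
    for r in rules:
        if r.features <= text_features:
            delta = r.weight if r.positive else -r.weight
            scores[r.category] = scores.get(r.category, 0.0) + delta
    if not scores:
        return None
    # Ties go to the category seen first, since max keeps the first maximum.
    return max(scores, key=scores.get)


rules = [
    Rule("sports", frozenset({"goal"}), 0.9, True),
    Rule("sports", frozenset({"stock"}), 0.7, False),
    Rule("finance", frozenset({"stock"}), 0.8, True),
    Rule("finance", frozenset({"goal"}), 0.6, False),
]
print(classify({"goal", "team"}, rules, ["sports", "finance"]))     # sports
print(classify({"stock", "market"}, rules, ["sports", "finance"]))  # finance
```

With only positive rules the two example texts would still be separated, but a negative rule lets evidence against a category count directly, for instance pushing the finance score below zero when a sports term appears.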
Keywords/Search Tags: text classification, text clustering, fuzzy concept lattice, three-way concept lattice, classification rules