Genetic Algorithm Based Model Parameter Selection And Its Application In Text Classification

Posted on: 2020-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: D Zhao
Full Text: PDF
GTID: 2428330596984890
Subject: Engineering
Abstract/Summary:
With the continuing development of computer technology, the volume of information data has increased significantly, in some cases exponentially, and the difficulty of using these data effectively has increased accordingly. At the same time, the data contain a large amount of useless and harmful information, which has a strongly negative effect on information processing. How to use these data effectively has therefore become a research focus in the field of machine learning. Text is a common form of information data, and classifying given text data efficiently is an important task in information processing. To improve the speed and accuracy of text classification, this thesis adopts a text classification method that combines a genetic algorithm (GA) with a support vector machine (SVM). The method treats the SVM parameters as a chromosome in the GA and encodes them in binary. The classification accuracy of the SVM serves as the GA fitness function, by which the fitness of each individual is evaluated. The optimal SVM parameters for the text data are then obtained through the GA operators, i.e., selection, crossover, and mutation. Finally, the SVM with the optimal parameters is used to classify the given text data into the existing categories.

In general, new-coming text data can be classified into the existing categories. However, the existing categories cannot cover the massive volume of new-coming text content, i.e., the categories of new-coming text data are often beyond the scope of the existing categories. It is therefore of practical significance to judge effectively whether new-coming text data can be classified into the existing categories, to cluster the text data that do not belong to them, and to add the resulting groups as new categories beyond the existing ones. To address the problem of new-coming text data that cannot be classified into the existing categories, this thesis proposes a progressive clustering method. First, GA is used to select an appropriate combination of feature words to train an SVM on the text data with existing categories. The SVM is then applied to the testing text data to classify those that belong to the existing categories. Thereafter, the text data that do not belong to the existing categories are clustered. GA is used to optimize the number of clusters and to choose the optimal cluster centers for the fuzzy clustering method (namely, FCM). Finally, the performance measures Precision, Recall, and F-measure are used to evaluate the efficiency and classification accuracy (i.e., Macro-average and Micro-average) of the results. The experimental results show that GA-SVM can effectively improve classification performance, and that GA-FCM can also achieve better classification results.
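
The GA-based parameter search described above (binary-coded chromosome, fitness evaluation, then selection, crossover, and mutation) could look roughly like the following minimal sketch. It is an illustration only, not the thesis's implementation. It assumes that the two encoded SVM parameters are C and gamma of an RBF kernel, and it also assumes 8 bits per parameter, the log2 search ranges, tournament selection, one-point crossover, and elitism. The thesis uses SVM classification accuracy as the fitness; here `toy_fitness` is a hypothetical stand-in with a known peak, so the example runs without a dataset.

```python
import math
import random

BITS = 8  # bits per parameter (assumed, not from the thesis)
# Assumed log2 search ranges: C in [2^-5, 2^15], gamma in [2^-15, 2^3]
RANGES = [(-5.0, 15.0), (-15.0, 3.0)]

def decode(chromosome):
    """Map a list of 0/1 bits to (C, gamma) on a log2 scale."""
    params = []
    for i, (lo, hi) in enumerate(RANGES):
        bits = chromosome[i * BITS:(i + 1) * BITS]
        value = int("".join(map(str, bits)), 2)
        exponent = lo + (hi - lo) * value / (2 ** BITS - 1)
        params.append(2.0 ** exponent)
    return tuple(params)

def toy_fitness(chromosome):
    """Hypothetical stand-in for SVM accuracy; peaks near C=2^5, gamma=2^-7.

    In a real run this would train an SVM with the decoded parameters
    and return its (cross-validated) classification accuracy.
    """
    c, gamma = decode(chromosome)
    return 1.0 / (1.0 + (math.log2(c) - 5.0) ** 2 + (math.log2(gamma) + 7.0) ** 2)

def tournament(population, scores, rng, k=3):
    """Selection: return the fittest of k randomly drawn individuals."""
    picks = rng.sample(range(len(population)), k)
    return population[max(picks, key=lambda i: scores[i])]

def crossover(a, b, rng):
    """One-point crossover of two parent chromosomes."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(chromosome, rng, rate=0.02):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]

def run_ga(fitness, pop_size=30, generations=40, seed=0):
    """Return (best chromosome, best fitness) after the given generations."""
    rng = random.Random(seed)
    length = BITS * len(RANGES)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        children = [best[:]]  # elitism: carry the current best forward
        while len(children) < pop_size:
            a = tournament(population, scores, rng)
            b = tournament(population, scores, rng)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
        best = max(population, key=fitness)
    return best, fitness(best)

if __name__ == "__main__":
    best, score = run_ga(toy_fitness)
    print("decoded (C, gamma):", decode(best), "fitness:", score)
```

To search real SVM parameters, `toy_fitness` would be replaced by a function that trains the classifier on the decoded values and returns its accuracy on held-out text data. The rest of the loop would stay the same.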
Keywords/Search Tags: Genetic algorithm, Support vector machine, Text classification, Parameter selection