Genetic Algorithm Based Model Parameter Selection And Its Application In Text Classification

Posted on:2020-06-05

Degree:Master

Type:Thesis

Country:China

Candidate:D Zhao

GTID:2428330596984890

Subject:Engineering

Abstract/Summary:

With the continuous development of computer technology, the volume of information data has grown rapidly, in some cases exponentially, and using this data effectively has become correspondingly harder. The data also contains a great deal of useless and harmful information, which has a strongly negative effect on information processing. How to make effective use of such data has therefore become a research focus in machine learning. Text is one of the most common forms of information data, and classifying text efficiently is an important task in information processing. To improve the speed and accuracy of text classification, this thesis adopts a method that combines a genetic algorithm (GA) with a support vector machine (SVM). The SVM parameters are treated as a GA chromosome and binary-encoded, and the classification accuracy of the SVM serves as the GA fitness function used to evaluate each individual. The GA operators (selection, crossover, and mutation) then search for the SVM parameters best suited to the text data, and an SVM trained with these optimal parameters classifies the given text data into the existing categories.

Newly arriving text is normally assigned to the existing categories. However, the existing categories cannot keep up with the volume of new text content: new text often falls outside the scope of every existing category. It is therefore of practical significance to determine whether new text data can be assigned to an existing category, to cluster the text that cannot, and to add the resulting groups as new categories beyond the existing ones. To address this problem, the thesis proposes a progressive clustering method. First, a GA selects a suitable combination of feature words, and an SVM is trained on the text whose categories are already known. The test texts are then classified, separating those that belong to an existing category from those that do not. Next, the texts that belong to no existing category are clustered: a GA optimizes the number of clusters and selects the optimal cluster centers for the fuzzy C-means (FCM) clustering method. Finally, Precision, Recall, and F-measure, each macro-averaged and micro-averaged, are used to evaluate the efficiency and accuracy of the results. The experimental results show that GA-SVM effectively improves classification performance and that GA-FCM also achieves better classification results.
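
The GA-SVM parameter search described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the thesis's implementation. The abstract says only that the SVM parameters are binary-encoded and that accuracy is the fitness, so the following are all assumptions: an RBF-kernel SVM with two parameters, C and gamma; 10 bits per parameter over assumed log2 ranges; tournament selection; single-point crossover; bit-flip mutation; and elitism. The fitness function is passed in, which keeps the GA independent of any particular SVM library.

```python
import math
import random
from typing import Callable, List, Tuple

# Assumptions (not stated in the thesis abstract): RBF-kernel SVM with
# parameters C and gamma, each encoded as a 10-bit gene on a log2 scale.
BITS = 10
LOG2_C_RANGE = (-5.0, 15.0)       # C in [2^-5, 2^15]
LOG2_GAMMA_RANGE = (-15.0, 3.0)   # gamma in [2^-15, 2^3]


def decode_gene(bits: List[int], lo: float, hi: float) -> float:
    """Map a binary gene linearly onto the exponent range [lo, hi]; return 2**exponent."""
    value = int("".join(map(str, bits)), 2)
    exponent = lo + (hi - lo) * value / (2 ** len(bits) - 1)
    return 2.0 ** exponent


def decode(chrom: List[int]) -> Tuple[float, float]:
    """Split a chromosome into its C gene and its gamma gene."""
    return (decode_gene(chrom[:BITS], *LOG2_C_RANGE),
            decode_gene(chrom[BITS:], *LOG2_GAMMA_RANGE))


def tournament(pop, scores, rng, k=3):
    """Selection: return the fittest of k randomly drawn individuals."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: scores[i])]


def crossover(a, b, rng, rate=0.8):
    """Single-point crossover, applied with probability `rate`."""
    if rng.random() < rate:
        point = rng.randint(1, len(a) - 1)
        return a[:point] + b[point:], b[:point] + a[point:]
    return a[:], b[:]


def mutate(chrom, rng, rate=0.02):
    """Bit-flip mutation: each bit flips independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chrom]


def ga_select_params(fitness: Callable[[float, float], float],
                     pop_size: int = 20, generations: int = 30, seed: int = 0):
    """Search for (C, gamma) maximizing `fitness`, e.g. SVM classification accuracy."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(2 * BITS)] for _ in range(pop_size)]
    best, best_score = None, -math.inf
    for _ in range(generations):
        scores = [fitness(*decode(ch)) for ch in pop]
        for ch, s in zip(pop, scores):
            if s > best_score:
                best, best_score = ch[:], s
        next_pop = [best[:]]  # elitism: carry the best individual over unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop, scores, rng), tournament(pop, scores, rng)
            next_pop.extend(mutate(child, rng) for child in crossover(p1, p2, rng))
        pop = next_pop[:pop_size]
    return decode(best), best_score
```

With scikit-learn, for example, `fitness` could be `lambda C, g: cross_val_score(SVC(C=C, gamma=g), X, y, cv=5).mean()`, where `X` is a document-term matrix (such as TF-IDF features) and `y` holds the category labels. Cross-validated accuracy is used here as a stand-in for the "classification accuracy rate" the abstract mentions.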
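
The GA-FCM clustering step can likewise be sketched. This is an illustrative sketch that rests on several assumptions the abstract does not state. Documents are assumed to already be numeric vectors (for example TF-IDF). Each chromosome is a set of data-point indices used as initial FCM centers, so its length is the number of clusters. Fitness is the negative Xie-Beni validity index, which is swapped in here as a plausible criterion. Selection is truncation with elitism. All function names are hypothetical, and the code uses NumPy.

```python
import numpy as np


def memberships(X, V, m=2.0, eps=1e-9):
    """FCM membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))."""
    d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + eps  # shape (c, n)
    ratio = d[:, None, :] / d[None, :, :]                             # ratio[i, k, j] = d_ij / d_kj
    return 1.0 / (ratio ** (2.0 / (m - 1.0))).sum(axis=1)


def fcm(X, init_centers, m=2.0, iters=50):
    """Standard fuzzy C-means iterations starting from the given centers."""
    V = np.array(init_centers, dtype=float)
    for _ in range(iters):
        Um = memberships(X, V, m) ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)  # weighted mean of the points
    return memberships(X, V, m), V


def xie_beni(X, U, V, m=2.0):
    """Xie-Beni index (lower is better): compactness / (n * min center separation)."""
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    compactness = ((U ** m) * d2).sum()
    vd2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(vd2, np.inf)
    return compactness / (X.shape[0] * vd2.min() + 1e-12)


def ga_fcm(X, k_min=2, k_max=6, pop_size=12, generations=15, seed=0):
    """GA over (number of clusters, initial centers); assumes len(X) > k_max."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    def random_individual():
        k = int(rng.integers(k_min, k_max + 1))
        return sorted(rng.choice(n, size=k, replace=False).tolist())

    def fitness(ind):
        U, V = fcm(X, X[ind])
        return -xie_beni(X, U, V)

    def crossover(a, b):
        # Child draws its centers from the union of both parents' centers.
        pool = sorted(set(a) | set(b))
        k = int(rng.integers(min(len(a), len(b)), max(len(a), len(b)) + 1))
        return sorted(rng.choice(pool, size=k, replace=False).tolist())

    def mutate(ind):
        ind = list(ind)
        free = [i for i in range(n) if i not in ind]
        op = rng.random()
        if op < 1 / 3 and len(ind) < k_max:      # add a center (k + 1)
            ind.append(int(rng.choice(free)))
        elif op < 2 / 3 and len(ind) > k_min:    # drop a center (k - 1)
            ind.pop(int(rng.integers(len(ind))))
        else:                                    # replace one center
            ind[int(rng.integers(len(ind)))] = int(rng.choice(free))
        return sorted(ind)

    pop = [random_individual() for _ in range(pop_size)]
    best, best_fit = None, -np.inf
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        order = np.argsort(fits)[::-1]
        if fits[order[0]] > best_fit:
            best, best_fit = pop[order[0]], fits[order[0]]
        parents = [pop[i] for i in order[: pop_size // 2]]  # truncation selection
        children = [best]                                    # elitism
        while len(children) < pop_size:
            a, b = rng.choice(len(parents), size=2, replace=False)
            children.append(mutate(crossover(parents[a], parents[b])))
        pop = children
    U, V = fcm(X, X[best])
    return V, U, best_fit
```

In this encoding, a single chromosome carries both quantities the abstract says the GA optimizes: its length is the number of clusters, and its entries give the starting cluster centers.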

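The evaluation measures named in the abstract can be computed as shown below. This is a minimal sketch. Per-class Precision, Recall, and F-measure come from true-positive, false-positive, and false-negative counts. Macro-averaging takes the unweighted mean of the per-class values, while micro-averaging pools the counts over all classes before computing the ratios. Here, macro-F is taken as the mean of the per-class F-measures, which is one of two common conventions. The abstract does not say which convention the thesis uses.

```python
from collections import Counter


def _prf(tp, fp, fn):
    """Precision, recall and F-measure from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def evaluate(y_true, y_pred):
    """Return per-class (P, R, F), macro-averaged (P, R, F), micro-averaged (P, R, F)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p wrongly
            fn[t] += 1  # missed the true class t
    per_class = {c: _prf(tp[c], fp[c], fn[c]) for c in labels}
    macro = tuple(sum(v[i] for v in per_class.values()) / len(labels) for i in range(3))
    micro = _prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return per_class, macro, micro
```

As an example, take `y_true = ["a", "a", "b", "b", "c"]` and `y_pred = ["a", "b", "b", "b", "c"]`. For these inputs, `evaluate` gives a micro-averaged Precision, Recall, and F-measure of 0.8 each. The macro-averaged values are about 0.889, 0.833, and 0.822. For single-label classification, the micro-averaged figures equal plain accuracy.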
Keywords/Search Tags:

Genetic algorithm, Support vector machine, Text classification, Parameter selection

Related items

1	Research Of Parameter Selection For Support Vector Machine
2	Research On Chinese Text Classification System Based On Support Vector Machine
3	Research Of Parameter Optimization For Support Vector Machine
4	Research On Kernel Function And Parameter Selection In Support Vector Machine And Its Application
5	Research On Text Classification Method Based On Support Vector Machine
6	Research On Text Classification Based On Support Vector Machine
7	Research On Text Classification Based-on Support Vector Machine
8	Research On Text Classification System Based On Support Vector Machine
9	Research On Text Emotion Classification Based On Improved Feature Selection Method
10	The Study Of Chinese Text Classification Based On FOA-SVM