
Research On Text Classification Based On Feature Selection And Its Application

Posted on: 2018-09-24
Degree: Master
Type: Thesis
Country: China
Candidate: D D Wang
GTID: 2348330512959244
Subject: Applied Mathematics
Abstract/Summary:
With the continuous development of computer technology, the volume of network information has grown explosively. This information has enriched people's lives, but much of it is useless or even harmful, which makes the rational and efficient use of information difficult. How to accurately find the information one needs within such a large volume of data has become a problem that the field of information technology still needs to resolve. Text classification technology provides an effective solution to this problem. Traditional manual classification based on expert knowledge consumes a great deal of manpower and time, and it has difficulty keeping pace with the rapid growth of data in modern society. With the development of science and technology, automatic text classification has emerged. Feature selection is an indispensable step in text categorization, and the selected features directly affect the performance of the classifier. In this thesis, considering that the traditional chi-square statistic feature selection method does not fully account for word frequency and feature distribution in the text, we propose an improved chi-square feature selection method based on intra-class information. The support vector machine (SVM) is one of the most typical machine learning methods for automatic text classification; it is simple and efficient, achieves high classification accuracy, and has attracted wide attention from scholars. In this thesis, we adopt the SVM to classify text. To further improve the classification accuracy of the SVM and to solve its parameter selection problem, we propose an improved artificial bee colony algorithm to optimize the SVM for text classification.
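As a point of reference for the feature selection step, the classical chi-square statistic for a term and a class can be sketched as below. This is a minimal illustration of the traditional baseline only, not the improved intra-class variant proposed in the thesis, whose formula the abstract does not give. The function names `chi_square` and `score_terms` and the toy corpus are assumptions made for the example.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: in-class documents containing the term
    b: out-of-class documents containing the term
    c: in-class documents lacking the term
    d: out-of-class documents lacking the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term or class carries no signal
    return n * (a * d - c * b) ** 2 / denom


def score_terms(docs, labels, target):
    """Return {term: chi-square score} for class `target`.

    Only document-level presence is counted, so how often a term
    occurs inside a document is ignored.
    """
    n_pos = sum(1 for y in labels if y == target)
    n_neg = len(labels) - n_pos
    pos_df, neg_df = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        (pos_df if y == target else neg_df).update(set(tokens))
    scores = {}
    for term in set(pos_df) | set(neg_df):
        a, b = pos_df[term], neg_df[term]
        scores[term] = chi_square(a, b, n_pos - a, n_neg - b)
    return scores


# Hypothetical toy corpus: two "bio" documents and two "fin" documents.
docs = [
    ["gene", "exon", "p53"],
    ["gene", "intron"],
    ["stock", "market"],
    ["market", "price", "gene"],
]
labels = ["bio", "bio", "fin", "fin"]
scores = score_terms(docs, labels, "bio")
for term in sorted(scores, key=scores.get, reverse=True):
    print(f"{term}: {scores[term]:.3f}")
```

Because the sketch counts only whether a term appears in a document, it ignores within-document word frequency, which is one of the limitations the abstract attributes to the traditional method.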
We improve the search strategies of the employed bees and onlooker bees in the basic artificial bee colony algorithm, which effectively improves the classification accuracy of the SVM. To broaden the application areas of text classification methods, we build a secondary human biological information database as a text classification corpus. The database mainly contains exon and intron sequence information for the p53 cancer gene, which provides a good platform for further cancer research. In addition, we propose a sequence alignment method based on cellular neural networks to align the p53 cancer gene sequences in the secondary database; it effectively improves sequence similarity and provides a theoretical basis for further research on cancer classification.
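For readers unfamiliar with the basic artificial bee colony algorithm that the thesis takes as its starting point, its three phases (employed bees, onlooker bees, scout bees) can be sketched as follows. This is the basic algorithm only; the improved employed and onlooker search strategies are not described in the abstract and are not reproduced here. The function `abc_minimize`, its parameters, and the objective `toy_error` are assumptions: `toy_error` is a hypothetical stand-in for an SVM cross-validation error over two log-scaled parameters, not a real SVM evaluation.

```python
import random


def abc_minimize(objective, bounds, n_sources=10, limit=20, n_iter=200, seed=0):
    """Basic artificial bee colony search; returns (best_position, best_cost)."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_source():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    sources = [random_source() for _ in range(n_sources)]
    costs = [objective(s) for s in sources]
    trials = [0] * n_sources  # consecutive failed improvements per source

    def try_improve(i):
        # Perturb one coordinate of source i relative to a random other source.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        cand = sources[i][:]
        cand[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        lo, hi = bounds[d]
        cand[d] = min(max(cand[d], lo), hi)
        cost = objective(cand)
        if cost < costs[i]:  # greedy selection keeps the better position
            sources[i], costs[i], trials[i] = cand, cost, 0
        else:
            trials[i] += 1

    b = min(range(n_sources), key=costs.__getitem__)
    best, best_cost = sources[b][:], costs[b]

    for _ in range(n_iter):
        # Employed bees: one local search step per food source.
        for i in range(n_sources):
            try_improve(i)
        # Onlooker bees: revisit sources with probability proportional to fitness.
        fitness = [1.0 / (1.0 + c) if c >= 0 else 1.0 + abs(c) for c in costs]
        for i in rng.choices(range(n_sources), weights=fitness, k=n_sources):
            try_improve(i)
        # Scout bees: abandon sources that have stopped improving.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = objective(sources[i])
                trials[i] = 0
        b = min(range(n_sources), key=costs.__getitem__)
        if costs[b] < best_cost:
            best, best_cost = sources[b][:], costs[b]
    return best, best_cost


def toy_error(params):
    """Hypothetical surrogate for SVM validation error over (log C, log gamma)."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best, cost = abc_minimize(toy_error, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
print(best, cost)
```

In an actual parameter search, the objective would be replaced by a function that trains an SVM with the candidate parameters and returns its cross-validation error, which makes each evaluation far more expensive than in this toy setting.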
Keywords/Search Tags: text categorization, feature selection, support vector machine, secondary database, p53 cancer gene sequence