
Research On Distributed Text Classification Based On Genetic Algorithm And Feedback

Posted on: 2015-11-17 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Y Zhong
GTID: 2298330467463056 | Subject: Signal and Information Processing
Abstract/Summary:
Text classification, a significant field in natural language processing, is a key technology for processing and organizing massive amounts of text data. In the era of big data, however, the sheer volume of data poses great challenges to both the speed and the accuracy of text classification. This thesis addresses these two issues by combining a genetic algorithm, relevance feedback and distributed computing. The main work can be summarized as follows.

First, the thesis introduces the key technologies and algorithms of text classification. It describes text preprocessing in detail, including word segmentation, feature selection and feature representation, as well as traditional text classification algorithms, and it gives particular attention to the genetic algorithm in machine learning and its application to text classification.

Second, we propose a distributed text classification model based on a genetic algorithm and feedback in a cloud computing environment. The model improves the genetic algorithm with parallel evolution based on population division, which increases the accuracy of feature selection. It also uses relevance feedback to strengthen its ability to improve itself dynamically, compensating for the scarcity of training samples. Because the volume of text data is large and the algorithm is inherently parallel, we further recast the model in the MapReduce paradigm.

Finally, we implement the text classification system described above on Hadoop, the open-source cloud computing framework. Experiments show that the proposed model performs well.
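The abstract does not spell out how "parallel evolution based on population division" works. One common form of the idea is an island-model genetic algorithm: the population is split into sub-populations that evolve independently and periodically exchange their best individuals. The Python sketch below applies that scheme to feature selection, encoding each individual as a bit mask over candidate terms. Everything concrete here is an assumption for illustration, not the author's design: the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), ring migration, all parameter values, and the toy fitness function, which weighs hypothetical per-term relevance scores against a flat cost per selected term. A real system would more likely score a subset by classifier accuracy on held-out documents.

```python
import random
from typing import Callable, List

Individual = List[int]  # bit mask: 1 = term kept as a feature, 0 = term dropped


def evolve_island(pop: List[Individual],
                  fitness: Callable[[Individual], float],
                  rng: random.Random,
                  mutation_rate: float = 0.02) -> List[Individual]:
    """Run one generation of a simple GA on a single island (sub-population)."""
    def tournament() -> Individual:
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    children = [max(pop, key=fitness)[:]]      # elitism: keep the best unchanged
    while len(children) < len(pop):
        p1, p2 = tournament(), tournament()
        cut = rng.randrange(1, len(p1))        # one-point crossover
        child = p1[:cut] + p2[cut:]
        # Bit-flip mutation: each gene flips with probability mutation_rate.
        child = [1 - g if rng.random() < mutation_rate else g for g in child]
        children.append(child)
    return children


def island_ga(n_features: int,
              fitness: Callable[[Individual], float],
              n_islands: int = 4,
              island_size: int = 20,
              generations: int = 50,
              migrate_every: int = 10,
              seed: int = 0) -> Individual:
    """Island-model GA for feature selection; returns the best mask found."""
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(n_features)]
                for _ in range(island_size)]
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        # Islands evolve independently; in a MapReduce setting this is the
        # step that could run in parallel, e.g. one island per map task.
        islands = [evolve_island(pop, fitness, rng) for pop in islands]
        if gen % migrate_every == 0:
            # Ring migration: island i's worst is replaced by island i-1's best.
            bests = [max(pop, key=fitness) for pop in islands]
            for i, pop in enumerate(islands):
                worst = min(range(len(pop)), key=lambda j: fitness(pop[j]))
                pop[worst] = bests[(i - 1) % n_islands][:]
    return max((ind for pop in islands for ind in pop), key=fitness)


# Hypothetical relevance scores (e.g. chi-square values) for 12 candidate terms.
scores = [5.0, 0.1, 3.2, 0.0, 4.1, 0.2, 0.05, 2.7, 0.3, 0.0, 1.9, 0.1]


def fitness(mask: Individual) -> float:
    # Toy objective: reward relevant terms, charge a flat cost of 1.0 per term.
    return sum(s - 1.0 for s, bit in zip(scores, mask) if bit)


best = island_ga(len(scores), fitness, seed=0)
print(best, fitness(best))
```

Because each island keeps its elite and migration only overwrites an island's worst member, the best fitness across all islands never decreases from one generation to the next. Migration is what distinguishes this from running several unrelated GAs: good feature subsets found on one island can spread to the others while each island still explores its own region of the search space.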
Keywords/Search Tags: text classification, feedback, genetic algorithm, Hadoop