Research On Classification Of Massive Text Feature Under Distributed Architecture

Posted on: 2015-10-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Zhao
GTID: 2298330467963269
Subject: Signal and Information Processing
Abstract:
Information on the Internet is growing explosively, and this huge volume of data contains a great deal of latent information. Most existing data analysis tools cannot handle text data at this scale, so fast, intelligent processing methods based on machine learning theory are needed.

On the one hand, parallel computing and distributed storage are important ways to address the big data problem. On the other hand, flexible, intelligent learning through word segmentation and feature extraction, grounded in information theory and statistical language models, is also an important approach.

This paper analyzes the basic process of text classification, including preprocessing, text vectorization, feature extraction, and classification, and then describes in detail the widely used distributed architecture Hadoop and its storage solution. It focuses on the principles of the support vector machine and the Bayesian classifier, and then benchmarks their parallelization. In addition, it discusses text sentiment classifiers, together with their optimization and performance testing.

The main contributions are as follows: (1) it closely integrates natural language processing, signal analysis, and parallel computing to develop a general-purpose text classification method; (2) it introduces optimization methods that greatly improve the efficiency of text sentiment classification through multicore processing and data structure optimization; (3) it runs a comparative test between long text (news) and short text (Weibo, etc.) and points out the issues that need attention in short-text processing.

Intelligent classification of massive text, which involves parallel computing, distributed storage, and a series of text processing techniques, has considerable academic and practical significance in this context.
Keywords: feature extraction, text classification, parallel acceleration, sentiment computing, social network analysis
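
The classification process outlined in the abstract (preprocessing, vectorization, classifier) can be illustrated with a minimal single-machine sketch of a multinomial naive Bayes classifier. This is not the thesis's implementation: the class name `NaiveBayesTextClassifier`, the `tokenize` helper, and the toy English training sentences are all assumptions made for illustration. The sketch uses a simple regular-expression tokenizer in place of word segmentation, and it omits feature selection and any Hadoop-based parallelization.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Preprocessing: lowercase and split into alphanumeric tokens.
    (Assumption: a stand-in for real word segmentation.)"""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical multinomial naive Bayes over bag-of-words counts,
    with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_doc_counts = Counter(labels)   # label -> number of documents
        self.word_counts = defaultdict(Counter)   # label -> word -> count
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, text):
        # Vectorization: bag-of-words counts, ignoring unseen words.
        counts = Counter(t for t in tokenize(text) if t in self.vocabulary)
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(doc_count / self.total_docs)  # log prior
            for word, n in counts.items():
                p = (self.word_counts[label][word] + 1) / (total_words + vocab_size)
                score += n * math.log(p)                   # log likelihood
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy sentiment data, invented purely for this example.
train_texts = [
    "great movie, loved it",
    "wonderful and great acting",
    "terrible plot, hated it",
    "boring and terrible film",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("great acting"))
print(model.predict("terrible boring film"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied. Because the per-class word counts are simple sums over documents, this kind of model is a natural fit for the map-and-aggregate style of distributed counting that the abstract associates with Hadoop.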