Text Classification System of Subjectivity Based on Voting Mechanism

Posted on: 2016-10-11
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Zhou
GTID: 2308330476453452
Subject: Information and Communication Engineering
Abstract/Summary:
In this era of information explosion, the Internet holds a great deal of valuable information that is not put to good use. Classifying massive volumes of text in large batches has become an urgent need. More and more people express their views and comments on the Internet. Analyzing and mining these subjective texts can reveal their inherent subjectivity and sentiment orientation, which has important application value in fields such as e-commerce and public opinion monitoring. This thesis focuses on the subjectivity classification of microblog posts.

This thesis designs a subjective/objective classification system based on a voting mechanism. The system uses ensemble learning (integrated learning) to combine three methods: a method based on the density of clues, a method based on N-POS, and a method based on conditional random fields. The research therefore focuses on optimizing the base classifiers and on integrating them.

The thesis first introduces the structure and workflow of a traditional text classification system. It then covers text preprocessing, text representation, feature selection, feature dimension reduction, feature weight calculation, text categorization algorithms, and methods for evaluating classification results. These provide the technical foundation for the subsequent subjective/objective text classification.

In the subjectivity classification module based on the density of clues, the thesis puts forward the concept of subjective clues and validates them statistically. It also adopts a new method for calculating clue weights. For the first time, it introduces the concept of clue density to measure the subjectivity of a text, and it compares the effect of two different weight calculation methods on classification performance.

In the subjectivity classification module based on N-POS, the thesis puts forward the concept of a standard N-POS subjective pattern library and a subjective/objective analysis algorithm based on the subjective N-POS model. It examines how the subjectivity threshold and the selection proportion affect classification performance.

In the subjectivity classification module based on conditional random fields, the thesis puts forward an algorithm for deriving a subjective/objective classification result from a label sequence. Following the idea of incremental learning, it completes the feature template. It examines how different templates and subjective weights affect classification performance.

In the integration experiments, the results show that introducing ensemble learning improves overall subjectivity classification performance by 2%. This indicates that ensemble learning is feasible for subjectivity text classification.
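
The voting step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the formula for clue density or the voting rule, so the sketch assumes that density is the sum of clue weights divided by the token count and that the three base classifiers are combined by simple majority vote. The names `clue_density`, `density_label`, `majority_vote`, the threshold of 0.2, and the example clue weights are all hypothetical. The N-POS and conditional random field classifiers are represented only by precomputed labels.

```python
from collections import Counter

SUBJECTIVE, OBJECTIVE = "subjective", "objective"


def clue_density(tokens, clue_weights):
    """Assumed definition: total weight of clue tokens / number of tokens."""
    if not tokens:
        return 0.0
    return sum(clue_weights.get(t, 0.0) for t in tokens) / len(tokens)


def density_label(tokens, clue_weights, threshold=0.2):
    """Label a text as subjective when its clue density reaches the threshold.

    The threshold value is a placeholder, not a figure from the thesis.
    """
    if clue_density(tokens, clue_weights) >= threshold:
        return SUBJECTIVE
    return OBJECTIVE


def majority_vote(labels):
    """Return the most common label.

    With three voters and two classes a tie cannot occur.
    """
    return Counter(labels).most_common(1)[0][0]


# Hypothetical clue weights and a toy microblog post.
weights = {"love": 1.0, "terrible": 1.0, "think": 0.5}
tokens = "i think this phone is terrible".split()

votes = [
    density_label(tokens, weights),  # clue-density classifier
    OBJECTIVE,                       # stand-in for the N-POS classifier output
    SUBJECTIVE,                      # stand-in for the CRF classifier output
]
print(majority_vote(votes))
```

Using an odd number of base classifiers keeps a two-class majority vote free of ties, which is one reason a three-classifier design suits this kind of scheme.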
Keywords/Search Tags: Subjectivity classification, Integrated (ensemble) learning, Clue, N-POS, Conditional random field