A Research On Multiple Text Representation For Text Classification

Posted on:2019-07-19

Degree:Master

Type:Thesis

Country:China

Candidate:N Q Li

Full Text:PDF

GTID:2428330545485304

Subject:Computer technology

Abstract/Summary:

Text classification is a fundamental task in natural language processing, and text representation is central to it: classification performance depends largely on the quality of the representation. Text is composed of words or characters that form phrases, sentences, paragraphs, sections, chapters, and articles. Machine learning algorithms, including neural networks and other deep learning models, require real-valued vectors or matrices as input, so raw documents cannot be processed directly. Text representation transforms raw documents into vectors or matrices that a computer can operate on. Its core goal is to faithfully reflect the content of a text while keeping different texts distinguishable.

Text data contains multiple contextual features. Traditional text representation algorithms generate only one representation of the data to capture all of these features, which weakens the representation of each one. In this paper, we introduce a new approach called multiple text representation, which generates a feature representation for each of the contextual structures in the text data to strengthen the overall representation of the text. We present three ways to generate multiple text representations:

1. Alter k-Means model. The Alter k-Means model generates multiple clusterings of the text data. Each clustering has a set of representative vectors, which project the original data into a new feature space. Multiple clusterings therefore yield multiple sets of representative vectors and multiple feature spaces. Projecting the data into multiple feature spaces enhances feature extraction.
2. Alter LDA model. The Alter LDA model finds multiple topic structures in the text data and generates a text representation with respect to each of them. It uses the "Topic-Word" distribution as the topic structure of the text and the "Document-Topic" distribution as the features of each document.
3. Horizontal multiple text representation. This approach uses different text representation algorithms to generate multiple text representations, with each algorithm capturing a different contextual feature of the text.

Experiments show that multiple text representation improves the performance of text classification while reducing the dimension of the features.
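The clustering-based idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the Alter k-Means model produces its alternative clusterings, so the code below substitutes plain k-means run with different random seeds. That is an assumption made purely for illustration, not the thesis's method. All function names (bag_of_words, kmeans, project, multiple_representation) and the toy documents are hypothetical. Each clustering contributes a set of centroids ("representative vectors"), each document is re-expressed by its distances to those centroids, and the per-clustering projections are concatenated.

```python
import math
import random
from collections import Counter


def bag_of_words(docs):
    """Map each document to a term-count vector over a shared vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vectors


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, seed, iters=20):
    """Plain Lloyd's k-means (a stand-in, NOT the thesis's Alter k-Means).

    Returns k centroids, playing the role of the 'representative vectors'.
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: euclidean(v, centroids[j]))
            clusters[nearest].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def project(vectors, centroids):
    """Re-express each document by its distance to every centroid."""
    return [[euclidean(v, c) for c in centroids] for v in vectors]


def multiple_representation(vectors, k, seeds):
    """Concatenate the projections obtained from several clusterings."""
    views = [project(vectors, kmeans(vectors, k, s)) for s in seeds]
    return [sum((view[i] for view in views), []) for i in range(len(vectors))]


# Hypothetical toy corpus, for illustration only.
docs = [
    "stock market prices fall",
    "market shares and stock trading",
    "football team wins match",
    "team loses football final",
]
features = multiple_representation(bag_of_words(docs), k=2, seeds=[0, 1, 2])
```

Here each document ends up with k × (number of clusterings) features, which in this toy setup is fewer than the vocabulary size. The resulting vectors could then be passed to any standard classifier.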

Keywords/Search Tags:

Text Representation, Multiple Clustering, Multiple Feature, Text Classification

Related items

1	Research And Implementation Of Text Comprehensive Processing Platform
2	The Research On Local Smooth Preserving Of Manifold Regularization Auto Encoder For Text Representation
3	Text Representation And Algorithms For Chinese Text Classification
4	Research And Application Of Feature Selection And Text Representation In Text Clustering
5	Research On Improvement Of Chi-square Feature Selection And Word Vector Text Representation For News Classification
6	Algorithm Research On Text Classification And Named Entity Recognition Based On Deep Text Feature Representation
7	Chinese Text Classification Algorithm Based On Multiple-factors Feature Weighting
8	A Research On Text Analysis And Representation Based On Semantic Information
9	Research On Text Representation Model And Deep Learning Algorithm In Text Classification
10	A Research On Feature Extraction Applied For Text Classification