The Research And Implementation Of Chinese Short-text Representation And Classification

Posted on:2013-02-18

Degree:Master

Type:Thesis

Country:China

Candidate:J J Peng

GTID:2248330371989956

Subject:Computer application technology

Abstract/Summary:

Text classification automatically assigns a text to a category in a given taxonomy based on the text's content. It is a foundation and core task of text processing. A survey of domestic and international research on the topic shows that word segmentation and short-text classification have become the two biggest problems. In addition, the text representation model is a hot research topic in the text classification field. This thesis focuses on these three issues. Rather than confining itself to the traditional vector space model, it represents text with the bag-of-sentences (BoS) model, in order to address problems such as the ambiguity introduced by word segmentation and the difficulty of feature extraction in short texts.

The main work of this thesis is as follows:

1. Improving and implementing a Chinese sentence-splitting algorithm that removes stop words and stop sentences and merges highly similar sentences. Given a stop-word list and a stop-sentence list, the system compares the text to be classified against both lists while splitting it into sentences: content that matches an entry in either list is removed, and the remaining clauses are kept. The system then scans the sentences from which stop words have been removed and computes their morphological similarity; sentences that are the same are treated as highly similar and merged according to certain rules.

2. Improving the text similarity computation. The text is divided into several fragments, and the contribution of each fragment to text recognition and to distinguishing text categories is considered. Fragments at different positions in the text receive different weights, and the text similarity is computed as the correspondingly weighted combination of fragment similarities. The improved method thus takes the position of a sentence in the text into account when recognizing texts and distinguishing categories.

3. Summarizing the respective advantages and disadvantages of text representation models and typical text classification algorithms through a study of a large body of domestic and international literature, and, according to the specific needs of this thesis (Chinese short-text classification), choosing the bag-of-sentences model as the text representation of the classification system and kNN as the classification algorithm.

4. Implementing each module of the BoS-model-based Chinese short-text classification system.
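
The pipeline described in the abstract (sentence splitting with stop-word and stop-sentence removal, merging of highly similar sentences, position-weighted text similarity, and kNN classification) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and not taken from the thesis: the stop lists, the character-overlap (Dice) measure standing in for morphological similarity, the merge rule (keep the first of two similar sentences), the position weights, and all function names.

```python
import re
from collections import Counter

# Hypothetical stop lists; the thesis does not publish its actual lists.
STOP_WORDS = {"的", "了", "是"}
STOP_SENTENCES = {"你好", "谢谢"}


def split_sentences(text):
    """Split text on common Chinese and Western sentence punctuation."""
    parts = re.split(r"[。！？；!?;\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def char_similarity(a, b):
    """Dice coefficient over character multisets (assumed similarity measure)."""
    if not a or not b:
        return 0.0
    overlap = sum((Counter(a) & Counter(b)).values())
    return 2.0 * overlap / (len(a) + len(b))


def bag_of_sentences(text, merge_threshold=0.8):
    """Build a bag of sentences: drop stop sentences, strip stop words,
    and merge highly similar sentences (assumed rule: keep the first)."""
    bag = []
    for sentence in split_sentences(text):
        if sentence in STOP_SENTENCES:
            continue
        for word in STOP_WORDS:
            sentence = sentence.replace(word, "")
        if not sentence:
            continue
        if any(char_similarity(sentence, kept) >= merge_threshold for kept in bag):
            continue
        bag.append(sentence)
    return bag


def position_weight(i, n):
    """Assumed weighting: the first and last fragments count more."""
    return 1.5 if i == 0 or i == n - 1 else 1.0


def directed_similarity(bag_a, bag_b):
    """Position-weighted average of each sentence's best match in bag_b."""
    if not bag_a or not bag_b:
        return 0.0
    n = len(bag_a)
    weights = [position_weight(i, n) for i in range(n)]
    scores = [max(char_similarity(s, t) for t in bag_b) for s in bag_a]
    return sum(w * x for w, x in zip(weights, scores)) / sum(weights)


def text_similarity(bag_a, bag_b):
    """Symmetric similarity: mean of the two directed similarities."""
    return 0.5 * (directed_similarity(bag_a, bag_b) + directed_similarity(bag_b, bag_a))


def knn_classify(text, training, k=3):
    """Label a text by majority vote among its k most similar training texts."""
    query = bag_of_sentences(text)
    scored = [(text_similarity(query, bag_of_sentences(t)), label) for t, label in training]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy training set, invented for this example.
TRAINING = [
    ("今天股市大涨。投资者信心增强。", "finance"),
    ("央行宣布降息。股市反应积极。", "finance"),
    ("球队赢得比赛。球迷非常高兴。", "sports"),
    ("比赛今晚开始。球队准备充分。", "sports"),
]

print(knn_classify("股市今天下跌。投资者担忧。", TRAINING, k=3))
```

Because the sketch compares sentences character by character, it needs no word segmenter, which mirrors the abstract's motivation for the BoS model; a real system would presumably tune the similarity measure, merge threshold, and position weights on labeled data.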

Keywords/Search Tags:

Chinese text classification, short-text representation, BoS model, classification algorithm

Related items

1	Research On Chinese Short Text Classification Based On Word Embedding
2	Research And Application Of Chinese Short Text Classification Algorithm Based On Deep Learning
3	Research On Chinese Short Text Classification Based On Hybrid Neural Network
4	Research On Chinese Short Text Representation And Classification
5	The Research And Implementation On Chinese Short Text Classification Technology
6	Classification Of News Short Text Based On Deep Learning
7	Research On Classification Method On Chinese Short Texts With Few Words Based On Feature Representation
8	Short Text Classification Based On Apriori Algorithm
9	Research On Short Text Classification Based On Topic Model
10	Research On Key Technologies Of Chinese Text Classification Based On Deep Learning