
Research On Document Classification Method Based On Graph Model

Posted on: 2011-01-23  Degree: Master  Type: Thesis
Country: China  Candidate: L Zhang  Full Text: PDF
GTID: 2178330338978780  Subject: Computer application technology
Abstract/Summary:
Several comparatively mature text classification algorithms are in use today, but most of them are based on the vector space model (VSM). The vector space model represents each document as a vector in a high-dimensional space, with each component holding the weight of one term, so that processing a document reduces to vector computation. This lowers the computational complexity of document processing and increases processing speed. However, the vector space model treats a document as a collection of words and assumes that the words are independent of one another, so it loses much of the text's structural information. In natural language, words are often interrelated, and the links between contexts within a document are therefore also important. To address this problem, some scholars have proposed graph-model-based text representation.

To achieve document classification in the graph model, this paper preprocesses the selected corpus, analyzes existing feature selection algorithms, chooses the extraction test method for feature selection, improves the weight calculation method to obtain one suited to classifying text in the graph model, builds the graph model of each text according to its definition, finds a method of calculating similarity coefficients for classifying documents, and completes the whole process of graph-model document classification.

Covering the main components of text classification, including preprocessing, feature selection, construction of the graph model, and calculation of graph-model similarity coefficients, this paper designs its own algorithms, proposes a weight calculation method for the graph model, builds an undirected weighted graph, and implements these algorithms. Experiments are run on three categories of the Sougou Corpus (c8 Economics, c10 IT, and c13 Health). The results are analyzed using the evaluation measures precision, recall, and F1, and the paper concludes that the graph-model text classification algorithm is an effective document classification algorithm.
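The abstract does not give the thesis's weighting formula or similarity coefficient, so the following is only a minimal sketch of the general idea of an undirected weighted document graph. It assumes that edges link terms co-occurring within a small sliding window, that edge weights are co-occurrence counts, that similarity is a weighted Jaccard coefficient over edges, and that classification is by nearest labeled graph. The names `build_graph`, `graph_similarity`, `classify`, and the `window` parameter are hypothetical and do not come from the thesis.

```python
from collections import Counter


def build_graph(tokens, window=2):
    """Build an undirected weighted co-occurrence graph from a token list.

    Each edge is a frozenset {a, b} of two distinct terms. Its weight is the
    number of times the two terms occur within `window` tokens of each other.
    This weighting is an assumption of the sketch, not the thesis's method.
    """
    edges = Counter()
    for i, a in enumerate(tokens):
        for b in tokens[i + 1 : i + 1 + window]:
            if a != b:
                edges[frozenset((a, b))] += 1
    return edges


def graph_similarity(g1, g2):
    """Weighted Jaccard coefficient over edge weights, in the range [0, 1].

    This stands in for the thesis's similarity coefficient, which the
    abstract does not specify.
    """
    keys = set(g1) | set(g2)
    if not keys:
        return 0.0
    shared = sum(min(g1[k], g2[k]) for k in keys)
    total = sum(max(g1[k], g2[k]) for k in keys)
    return shared / total


def classify(tokens, labeled_graphs, window=2):
    """Return the label of the most similar labeled graph (1-nearest neighbour)."""
    g = build_graph(tokens, window)
    best_label, _ = max(labeled_graphs, key=lambda item: graph_similarity(g, item[1]))
    return best_label


# Toy usage with made-up English tokens (not Sougou Corpus data).
economics = build_graph(["stock", "market", "price", "rise"])
health = build_graph(["doctor", "hospital", "patient", "care"])
labeled = [("Economics", economics), ("Health", health)]
predicted = classify(["stock", "market", "price", "fall"], labeled)
```

Unlike a bag-of-words vector, the edge set records which terms appear near each other, so two documents with the same vocabulary but different local context produce different graphs.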
Keywords/Search Tags: weight, document graph, document classification