The Research On Graph Structure Representation Method Based Chinese Text Clustering

Posted on:2010-04-22

Degree:Master

Type:Thesis

Country:China

Candidate:Q F Liu

GTID:2178360275457860

Subject:Systems Engineering

Abstract/Summary:

With the rapid development and spread of information technology, the volume of electronic text keeps growing, and people have moved from an era of information scarcity to one of information abundance. Faced with such massive information resources, it is hard to find the needed information quickly and effectively. How to organize and manage document information rationally and effectively has therefore become a very important research task in the field of information processing. In recent years, text representation, as a prerequisite for the quality of text mining methods, has attracted more and more scholars. Starting from text representation, this research applies graph theory to text mining and puts forward a new graph-structure-based text representation method. Compared with the traditional vector-based representation, a graph structure is better suited to representing the structural information of a text: while retaining the features of the text, it can also describe relationships between terms, such as their position and strength of association. The research consists of the following parts.

First, a text representation model is proposed. Based on an analysis of traditional text representation models, a graph-structure-based representation model for Chinese text documents is proposed. A text is represented as a graph whose nodes are the selected terms and whose edges are the relationships between those terms, so that more semantic and ordering information among terms, as well as the structural information of the text, is preserved.

Second, a similarity measuring algorithm is introduced. Based on the semantic graph structure model, a similarity measuring algorithm for text classification is proposed that measures the maximum common subgraph (mcs) of each pair of semantic graphs. The mcs similarity measure considers not only the content similarity but also the structural similarity of texts, which makes it more comprehensive. It assumes that the larger the common part of two graphs, the greater their similarity, and therefore uses the properties of the mcs to measure the similarity of graphs.

Third, an improved clustering algorithm is proposed. An improved K-means algorithm is used for clustering, and the concept of a median graph is introduced to measure the distance between a single graph and a set of graphs, which adapts the clustering algorithm to graph-structure-based text document clustering.

Finally, the method is verified experimentally. Test data with category labels are clustered, and three indexes, precision, recall and F-score, are used to evaluate the clustering results.
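The abstract does not give formulas, so the following Python sketch only illustrates the general idea of a term graph, an mcs-based similarity and a median graph. It is not the thesis's actual implementation. Several points are assumptions: the text is already segmented into terms; edges link adjacent terms in reading order; node labels are unique, so the maximum common subgraph reduces to the intersection of the node and edge sets; graph size is counted as nodes plus edges; and the median is taken as the set median (the member graph with the smallest total distance to the others). All function names are hypothetical.

```python
def build_term_graph(tokens):
    """Return (nodes, edges) for a list of already segmented terms.

    Assumption: an edge (a, b) means term a is immediately followed by term b.
    """
    nodes = set(tokens)
    edges = {(a, b) for a, b in zip(tokens, tokens[1:]) if a != b}
    return nodes, edges


def graph_size(graph):
    """Size of a graph, counted here as nodes plus edges (an assumption)."""
    nodes, edges = graph
    return len(nodes) + len(edges)


def mcs_similarity(g1, g2):
    """Similarity in [0, 1] based on the maximum common subgraph.

    With unique node labels the mcs is the intersection of the node sets
    and of the edge sets, so no subgraph isomorphism search is needed.
    """
    common = len(g1[0] & g2[0]) + len(g1[1] & g2[1])
    largest = max(graph_size(g1), graph_size(g2))
    return common / largest if largest else 1.0


def set_median(graphs):
    """Member graph with the smallest summed distance (1 - similarity) to all others."""
    return min(graphs, key=lambda g: sum(1 - mcs_similarity(g, h) for h in graphs))


# Two toy documents sharing two terms and one edge.
a = build_term_graph(["text", "clustering", "method"])
b = build_term_graph(["text", "clustering", "algorithm"])
print(mcs_similarity(a, b))  # 0.6  (3 shared elements / 5 elements per graph)
```

In a K-means style loop, each document graph would be assigned to the cluster whose median graph is closest, and the medians would then be recomputed with `set_median`. How the thesis actually defines its median graph and its improvements to K-means is not stated in the abstract.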

Keywords/Search Tags:

Graph Structure, Text Representation, Text Similarity Computing, Maximum Common Subgraph, Text Clustering

Related items

1	Research On Text Summarization Technology Based On Abstract Meaning Representation Graph
2	Study On Similarity-based Text Clustering Algorithm And Its Application
3	Research On Event-Oriented Text Representation And Applications
4	Research On Frequent Subgraph-based Graph Query Techniques
5	Semantic Embedding Representation Model Of Multimodal Test Questions For Test Question Duplication Detection
6	Research On Subgraph Isomorphism Constraint Solving Technology Based On Graph Representation Learning
7	Text Similarity Computing Theory And Applied Research
8	The Key Technologies Of Software Parallel Test Based On Cloud Computing
9	The Research And Application Of Spectral Clustering Algorithm Based On Neighbor Similarity Graph
10	Research On Short Text Classification Method Based On Text Graph Structure