
The Application Of Factor Space Theory In Text Mining

Posted on: 2017-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Li
GTID: 2358330485996626
Subject: Computational Mathematics
Abstract/Summary:
With the advent of the big data era, knowledge representation based on factor space and its applications have attracted growing attention. Text mining based on factor space theory is currently an active research direction, and text categorization and text clustering, the most popular topics in text mining, are already widely used. This thesis studies the construction and application of a text categorization algorithm and a text clustering algorithm based on factor space theory. The main results are as follows.

Using factor space theory and its knowledge representation, the thesis constructs a text categorization algorithm and a text clustering algorithm, each based on factor space. In the text categorization algorithm, texts are first represented in a factor-analysis table. A factor feature extraction algorithm optimized by a genetic algorithm then performs text feature selection, the factor analysis method is used to learn the classification rules, and an improved factor-analysis reasoning model finally categorizes the texts. Classification experiments were carried out on the news corpus from Sogou Laboratory and on review data from an Internet e-commerce platform. The proposed text categorization algorithm reached an accuracy of 82.33%, while commonly used text categorization algorithms reached only 74.67% and 79.16%, which verifies the effectiveness of the proposed algorithm.

In the text clustering algorithm, feature terms are first converted into vectors with a word-vector model, and k-means clustering is applied to these feature terms to establish an initial factor set. Human understanding and professional knowledge are then added to establish the basic factor set. On the basis of the basic factor set, the similarity and distance between texts are defined, and the final text clustering is performed with a hierarchical clustering algorithm. The Class_F value, which measures the similarity between the clustering result and the class structure of the data, is 0.701 for the proposed algorithm, compared with 0.308 for common text clustering algorithms. The experimental results show that the proposed algorithm is effective and that factor space theory offers a feasible new approach to text mining research.
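
The clustering steps summarized in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the term-to-factor mapping FACTORS is a hypothetical, hard-coded stand-in for the basic factor set (which the abstract obtains from k-means over word vectors plus manual refinement), cosine distance and average linkage are assumed choices, and clustering_f is the common class-weighted best-match F-measure, which may differ from the Class_F definition used in the thesis.

```python
import math
from collections import Counter

# Hypothetical term -> factor mapping, standing in for the basic factor set.
FACTORS = {
    "goal": "sports", "match": "sports", "team": "sports",
    "stock": "finance", "bank": "finance", "market": "finance",
}
FACTOR_NAMES = sorted(set(FACTORS.values()))  # ["finance", "sports"]


def factor_vector(text):
    """Count how often each factor is triggered by the terms of a text."""
    counts = Counter(FACTORS[t] for t in text.lower().split() if t in FACTORS)
    return [counts[name] for name in FACTOR_NAMES]


def cosine_distance(u, v):
    """Assumed text distance: 1 - cosine similarity of two factor vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def agglomerative(vectors, n_clusters):
    """Average-linkage hierarchical clustering; returns lists of text indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average distance.
        pairs = [(linkage(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def clustering_f(clusters, labels):
    """Class-weighted best-match F-measure between clusters and true classes."""
    score = 0.0
    for cls in set(labels):
        members = {i for i, lab in enumerate(labels) if lab == cls}
        best = 0.0
        for cluster in clusters:
            overlap = len(members & set(cluster))
            if overlap:
                p = overlap / len(cluster)   # precision of cluster for class
                r = overlap / len(members)   # recall of cluster for class
                best = max(best, 2 * p * r / (p + r))
        score += len(members) / len(labels) * best
    return score


docs = [
    "the team scored a late goal in the match",
    "the match ended after the team lost",
    "the bank raised rates and the stock market fell",
    "stock prices moved as the market opened",
]
labels = ["sports", "sports", "finance", "finance"]

vectors = [factor_vector(d) for d in docs]
clusters = agglomerative(vectors, n_clusters=2)
print(clusters, clustering_f(clusters, labels))
```

On this toy input the two sports texts and the two finance texts end up in separate clusters, so the F-measure is 1.0. Representing each text by factor counts rather than raw term counts is what lets texts that share no surface words still be close, as long as their terms belong to the same factor.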
Keywords/Search Tags: Factor Space, Text Mining, Factor Analysis, Feature Extraction, Text Representation