
Research on a Text Image Classification Method Based on the Decision Tree Algorithm

Posted on: 2013-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: X M Huang
Full Text: PDF
GTID: 2248330377953580
Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer network and image acquisition technology, the volume of image data on the Internet has increased sharply. The data are abundant, yet the useful information extracted from them is scarce. Image mining emerged to address this problem: it aims to extract latent, useful information from images and to support decision-making based on that information. Although image mining has become a research hotspot, it is still a young technology that needs further study. A text image is a special kind of image that contains text information. Text images play an important part in everyday life and have become an object of data mining research.

To understand the text information in text images clearly, the images must first be classified. This thesis first introduces the classification algorithms used in image mining, then studies decision tree classification algorithms and image features, and finally focuses on text image classification. The main research contents and results are as follows:

(1) An improved ID3 algorithm based on a new attribute selection criterion is proposed. The traditional ID3 algorithm uses information gain as its attribute selection criterion, which has two deficiencies: the algorithm takes a long time to execute, and the criterion is biased toward attributes with many values. To address both deficiencies, the computation is simplified using equivalent infinitesimals, and the criterion is adjusted according to the importance degree of each attribute. Together these changes yield the new attribute selection criterion.

(2) The features of three kinds of text images are studied in depth. The features are obtained from the image histogram and the gray level co-occurrence matrix. The feature vectors used to distinguish text images are obtained by comparing text image characteristics including mean, variance, skewness, contrast, homogeneity, energy, and correlation.

(3) A text image classification method based on a combination of low-level image features is proposed. The method takes the feature vectors obtained from the text image feature extraction process and classifies the text images with the ID3 algorithm, the C4.5 algorithm, and the improved ID3 algorithm. Experiments verify the method, and the results show that it performs well.

In summary, this thesis improves the ID3 decision tree algorithm to address its defects and applies decision tree classification to text images, classifying them according to features extracted from the images. The method works well on three kinds of text images and helps users search for text images quickly and accurately, which gives it high practical value.
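The information gain criterion of the traditional ID3 algorithm, discussed in item (1), can be illustrated with a minimal Python sketch. The toy rows, the attribute names (`id`, `contrast`, `energy`), and the class labels are hypothetical and do not come from the thesis. The improved criterion itself is not reproduced here, because the abstract does not give its formula.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one discrete attribute."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Standard ID3 choice: the attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: discretised features of four text images.
rows = [
    {"id": 1, "contrast": "high", "energy": "low"},
    {"id": 2, "contrast": "high", "energy": "high"},
    {"id": 3, "contrast": "low", "energy": "low"},
    {"id": 4, "contrast": "low", "energy": "high"},
]
labels = ["printed", "printed", "handwritten", "printed"]

for name in ("id", "contrast", "energy"):
    print(name, round(information_gain(rows, labels, name), 4))
print("chosen:", best_attribute(rows, labels, ["id", "contrast", "energy"]))
```

In this toy data the `id` attribute takes a different value for every row, so splitting on it removes all entropy and it receives the largest gain even though it cannot generalise. This is the bias toward many-valued attributes that the abstract names as the second deficiency.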
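The seven characteristics listed in item (2) can be sketched in the same way: three from the gray-level histogram and four from a gray level co-occurrence matrix (GLCM). The abstract does not state the pixel offset, the number of gray levels, or the exact formulas, so the choices below are assumptions: a horizontal offset of one pixel, homogeneity taken as the inverse difference moment, and energy as the sum of squared matrix entries. The 4x4 image is a hypothetical example.

```python
from math import sqrt


def histogram_features(image):
    """Mean, variance and skewness of the gray levels of a 2-D image."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    std = sqrt(variance)
    # Skewness is undefined for a constant image; 0.0 is used as a placeholder.
    skewness = sum((p - mean) ** 3 for p in pixels) / (n * std ** 3) if std else 0.0
    return {"mean": mean, "variance": variance, "skewness": skewness}


def glcm(image, levels, dx=1, dy=0):
    """Normalised co-occurrence matrix for one pixel offset (assumed: right neighbour)."""
    counts = [[0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                counts[image[y][x]][image[ny][nx]] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def glcm_features(p):
    """Contrast, homogeneity, energy and correlation of a normalised GLCM."""
    cells = [(i, j, p[i][j]) for i in range(len(p)) for j in range(len(p))]
    contrast = sum(v * (i - j) ** 2 for i, j, v in cells)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    energy = sum(v ** 2 for _, _, v in cells)
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mu_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for _, j, v in cells)
    denom = sqrt(var_i * var_j)
    # Correlation is undefined when either variance is zero; 0.0 is a placeholder.
    correlation = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells) / denom if denom else 0.0
    return {"contrast": contrast, "homogeneity": homogeneity,
            "energy": energy, "correlation": correlation}


# Hypothetical 4x4 image quantised to four gray levels (0..3).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
hist = histogram_features(image)
texture = glcm_features(glcm(image, levels=4))
feature_vector = list(hist.values()) + list(texture.values())
print(feature_vector)
```

A vector of this form, computed per image and discretised where needed, is the kind of input a decision tree classifier such as ID3 or C4.5 would consume in item (3).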
Keywords/Search Tags: image mining, image classification, decision tree, text image, improved ID3 algorithm