DECISION TREE APPROACH TO PATTERN RECOGNITION PROBLEMS IN A LARGE CHARACTER SET

Posted on: 1985-02-25
Degree: Ph.D.
Type: Thesis
University: Concordia University (Canada)
Candidate: WANG, QING REN
Full Text: PDF
GTID: 2478390017461330
Subject: Computer Science
Abstract/Summary:
The decision tree is a fast classifier for pattern recognition: it can handle a large number of classes, and it minimizes decision time by breaking classification into a series of small local decisions. The general tree classifier is analyzed here from the standpoint of entropy reduction. Theoretical results show that its search time and error rate are both of order O(H), and its overlap of order O(H exp(H)), where H is Shannon's entropy measure of the given problem. The results further reveal that the main difficulties in implementing a tree are error accumulation and the heavy memory requirement caused by overlap. Some design principles are drawn from this behavior of the decision tree. With entropy reduction relative to overlap as the objective, a new clustering algorithm, called ISOETRP, has been developed. Overlap is treated in ISOETRP for the first time and is solved in practice by the overlap table itself. Experimental results show that this clustering algorithm is very powerful for designing tree classifiers. Some work has also been done on feature analysis: the profile feature and SVD (Singular Value Decomposition) are compared, and the phase feature is proposed and analyzed for pattern recognition. To enhance the tree classifier, several branch-and-bound search algorithms that use a fuzzy membership function as the heuristic evaluation have been developed to reduce the error rate. A new decision tree model with global training is also proposed. Its advantage over the conventional decision tree is that error accumulation is suppressed considerably, so a very low error rate can be obtained at high speed. Several tree classifiers have been implemented to recognize 500 to 3200 Chinese characters. The result for the 3200-character classifier is very encouraging: the recognition rate is 99.93%, the error rate is only 0.025%, and the speed is 861 samples per second, with the program written in Pascal and run on a CYBER-172 computer. These results confirm the theory developed and the design principles given in this thesis, as well as the newly proposed decision tree model.
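To give a feel for the O(H) search-time result, the sketch below computes Shannon's entropy for a 3200-class problem and compares it with the depth of a balanced tree that has one leaf per class. It is only an illustration under stated assumptions: uniform class priors, a balanced tree, and no overlap are choices made here, not details taken from the thesis, and the function names `shannon_entropy` and `balanced_tree_depth` are hypothetical.

```python
import math


def shannon_entropy(priors):
    """Shannon entropy, in bits, of a discrete class distribution."""
    return -sum(p * math.log2(p) for p in priors if p > 0)


def balanced_tree_depth(num_classes, branching=2):
    """Depth of a balanced tree with at least one leaf per class.

    Assumes no overlap, i.e. each class appears in exactly one leaf.
    """
    depth = 0
    leaves = 1
    while leaves < num_classes:
        leaves *= branching
        depth += 1
    return depth


# Assumption for illustration: 3200 equally likely character classes.
num_classes = 3200
uniform_priors = [1.0 / num_classes] * num_classes

entropy_bits = shannon_entropy(uniform_priors)   # equals log2(3200)
binary_depth = balanced_tree_depth(num_classes)  # local decisions per sample

print(f"H = {entropy_bits:.2f} bits, balanced binary depth = {binary_depth}")
```

Under these assumptions the number of local decisions per sample stays close to H, whereas a classifier that compares a sample against every class in turn would need on the order of 3200 comparisons.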
Keywords/Search Tags:Tree, Pattern recognition, Classifier, Error rate