
Book Directory Reconstruction Based On Spectral Clustering

Posted on: 2011-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Zhang
GTID: 2178360302474582
Subject: Computer application technology
Abstract/Summary:
The 21st century is the century of information technology. Advances in information technology have accelerated social development, and at the same time they bring unprecedented opportunities and challenges to people with disabilities. Digital libraries are an important way for people with disabilities to access information, and a structured directory (table of contents) can greatly speed up their reading of books. This thesis presents an algorithm based on spectral clustering for reconstructing book directories.

Directory reconstruction builds on information extraction technology and consists of three major tasks: text analysis, index-item modeling, and directory-tree rebuilding. In text analysis, irregular characters are replaced and the full text is cleaned. Based on the structure and storage format of the directory, we design a word segmentation algorithm that segments the text into words and labels their features. To convert the text into numerical form, we represent the directory text with a feature-based augmented vector model. Cluster analysis is then performed with a spectral clustering algorithm based on the normalized cut criterion. Finally, a tree rebuilding algorithm reconstructs the structured directory; it follows a depth-first strategy that combines each item's sequential position with its cluster information.

The algorithm has been implemented and tested in the China Digital Library for the Visual Impairment system, where it parsed 702 books and rebuilt their directories. It achieved good accuracy and greatly reduced the manual workload.
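The clustering step can be illustrated with a minimal NumPy sketch of spectral clustering under the normalized-cut relaxation, here in the Ng-Jordan-Weiss form: a Gaussian affinity matrix, the symmetric normalized affinity D^{-1/2} W D^{-1/2}, its top-k eigenvectors, and k-means on the row-normalized embedding. It assumes every index item has already been turned into a numeric feature vector. The abstract does not describe the augmented vector model, so the toy features, the kernel width `sigma`, and the names `ncut_spectral_clustering` and `kmeans` are hypothetical; this is not the thesis's implementation.

```python
import numpy as np


def kmeans(Y, k, n_iter=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest chosen center.
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each row to its nearest center.
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        # Recompute centers; keep the old one if a cluster becomes empty.
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def ncut_spectral_clustering(X, k, sigma=1.0):
    """Cluster the rows of X into k groups (normalized-cut relaxation, NJW form)."""
    X = np.asarray(X, dtype=float)
    # Gaussian affinity W from pairwise squared Euclidean distances.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Degrees (guarded against zero) and M = D^{-1/2} W D^{-1/2}.
    d = np.maximum(W.sum(axis=1), 1e-12)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # The largest eigenvalues of M are the smallest of the normalized
    # Laplacian I - M; eigh returns eigenvalues in ascending order.
    _, vecs = np.linalg.eigh(M)
    U = vecs[:, -k:]
    # Row-normalize the spectral embedding, then run k-means on it.
    Y = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(Y, k)


# Hypothetical two-dimensional features for six index items:
# the first three and the last three form two well-separated groups.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels = ncut_spectral_clustering(X, k=2)
print(labels)
```

The returned label numbers are arbitrary; only the grouping carries meaning, so any later step that needs heading levels must map clusters to levels separately.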
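The depth-first tree rebuilding step can be illustrated with a small stack-based sketch. It assumes the index items are already in reading order and that clustering has given each item a cluster label. The rule that turns clusters into heading levels (the first cluster seen becomes level 1, the next new cluster level 2, and so on) is a hypothetical choice made here, as are the names `Node`, `rebuild_tree`, and `show`; the abstract states only that the algorithm combines sequential and cluster information.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    title: str
    level: int
    children: list = field(default_factory=list)


def rebuild_tree(items):
    """Build a directory tree from (title, cluster_label) pairs in reading order."""
    # Hypothetical rule: rank clusters by first appearance, so the first
    # cluster encountered is level 1, the next new one level 2, etc.
    level_of = {}
    for _, label in items:
        level_of.setdefault(label, len(level_of) + 1)
    root = Node("ROOT", 0)
    stack = [root]
    for title, label in items:
        node = Node(title, level_of[label])
        # Depth-first: climb back up until the stack top is shallower,
        # then attach the new item as that entry's child.
        while stack[-1].level >= node.level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


def show(node, indent=0):
    """Print the tree with two spaces of indentation per level."""
    for child in node.children:
        print("  " * indent + child.title)
        show(child, indent + 1)


# Cluster labels are arbitrary integers, as a clustering step would return.
items = [
    ("Chapter 1 Introduction", 2),
    ("1.1 Background", 0),
    ("1.2 Outline", 0),
    ("Chapter 2 Method", 2),
    ("2.1 Spectral clustering", 0),
    ("2.1.1 Normalized cut", 1),
]
root = rebuild_tree(items)
show(root)
```

Ranking clusters by first appearance sidesteps the arbitrary label numbering, but front matter such as a preface that precedes the first chapter would need extra handling.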
Keywords/Search Tags: Directory reconstruction, information extraction, China Digital Library for the Visual Impairment, spectral clustering