Research On Document Retrieval Based On Image Content

Posted on: 2011-10-07
Degree: Master
Type: Thesis
Country: China
Candidate: H Li
Full Text: PDF
GTID: 2178360305966221
Subject: Computer application technology
Abstract/Summary:
With the rapid development of global information technology, the quantity of documents has increased sharply and query demands have gradually diversified, so traditional document retrieval systems can no longer meet users' needs. Research on document retrieval therefore has important significance and a wide range of applications. This dissertation aims to provide a new and effective retrieval method for users who have no prior knowledge to query with. A method of retrieving documents by their image content is studied, and image retrieval techniques are applied to a document retrieval system.
Firstly, the pictures in a document are segmented according to the difference in run-length entropy between text and pictures. The run-length entropy of each line of the document is calculated, and the areas corresponding to high run-length entropy values are then extracted as the pictures in the document.
Secondly, five key bit-planes are selected and their histogram is calculated as the first feature. The local color density and entropy of the five bit-planes are then calculated as the second feature, and the overall smoothness of the five bit-planes is calculated as the third feature. A combination of the three features is used to retrieve images.
Finally, the source of each extracted picture is recorded in an associated database, and the three features are calculated to build a feature library. The retrieved documents are then output by following the sources of the pictures returned by the image retrieval technique.
The experimental results show that the proposed picture segmentation method and image retrieval method are effective, accurate, and stable.
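The row-wise segmentation step can be illustrated with a minimal Python sketch. The abstract does not define run-length entropy, so this sketch assumes it is the Shannon entropy of the distribution of run lengths in one binarized row; the function names, the list-of-lists image format, and the `threshold` parameter are all hypothetical and are not taken from the dissertation.

```python
import math
from collections import Counter
from itertools import groupby


def run_lengths(row):
    """Lengths of consecutive runs of equal values in one binarized row."""
    return [sum(1 for _ in group) for _, group in groupby(row)]


def run_length_entropy(row):
    """Shannon entropy (in bits) of the run-length distribution of a row.

    Assumption: this is one plausible reading of "run-length entropy";
    the dissertation may define it differently.
    """
    runs = run_lengths(row)
    if not runs:
        return 0.0
    counts = Counter(runs)
    total = len(runs)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def picture_row_bands(image, threshold):
    """Return (start, end) row ranges, end exclusive, whose entropy exceeds threshold.

    `image` is a list of rows of 0/1 pixels; `threshold` is a hypothetical
    tuning parameter that would have to be chosen from real data.
    """
    bands = []
    start = None
    for y, row in enumerate(image):
        high = run_length_entropy(row) > threshold
        if high and start is None:
            start = y                  # a high-entropy band begins here
        elif not high and start is not None:
            bands.append((start, y))   # the band ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(image)))
    return bands
```

Calling `picture_row_bands(page, threshold)` on a binarized page yields candidate picture bands as row ranges. A real system would presumably repeat a similar analysis along columns to obtain rectangular regions, which this sketch does not attempt.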
In addition, using bit-planes as the main feature offers strong noise resistance and fast computation, which shows that the method is well suited to document retrieval. Moreover, this dissertation makes some useful explorations of the semantic features of images.
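The bit-plane histogram feature and its noise resistance can be illustrated with a short Python sketch. The abstract does not say which five bit-planes are "key", so this sketch assumes they are the five most significant bits of an 8-bit grayscale pixel. Histogram intersection is a swapped-in similarity measure chosen for illustration, not the dissertation's stated matching rule, and all function names are hypothetical. The second and third features (local color density and entropy, overall smoothness) are not defined in the abstract and are left out.

```python
def bit_plane(image, bit):
    """Extract one bit-plane (0/1 values) from an 8-bit grayscale image."""
    return [[(pixel >> bit) & 1 for pixel in row] for row in image]


def top_bits_histogram(image, n_bits=5):
    """Normalized histogram over each pixel's n_bits most significant bits.

    Assumption: the "five key bit-planes" are bits 7 down to 3.
    """
    shift = 8 - n_bits
    hist = [0.0] * (1 << n_bits)
    count = 0
    for row in image:
        for pixel in row:
            hist[pixel >> shift] += 1.0   # low-order bits are discarded
            count += 1
    if count == 0:
        return hist
    return [h / count for h in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms (1 = identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

Because the three least significant bits are dropped before counting, perturbations that stay within those bits leave the histogram unchanged, which is one plausible reading of the noise resistance claimed above. The per-pixel work is a shift and an increment, which is consistent with the claim of fast computation.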
Keywords/Search Tags: information retrieval, layout analysis, image retrieval, run-length entropy, bit-plane, histogram