
Research On Methods Of Extracting Image Semantics In WWW

Posted on: 2005-09-11
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhang
GTID: 2168360125462377
Subject: Education Technology
Abstract/Summary:
With the rapid growth in the number of images on the WWW, this thesis presents a method for extracting image semantics from HTML documents that can automatically index and analyze images at the semantic level. The method is of significant value for content-based and semantic-based image retrieval on the WWW.

As the external information sources and carriers of web images, HTML documents contain abundant text that is closely related to image content and image semantics, such as the image file name, the image annotation, the text surrounding the image, the image URL, the title and URL of the web page containing the image, and the title and URL of the hyperlinked web page associated with the image. To extract image semantics from this text, a semantic representation model of images was built from their semantic and visual attributes. Based on this model, four image semantic dictionaries were built: an image classification dictionary of topic words, an image classification dictionary of main-body words, an image classification dictionary of main-body attribute words, and an image topic-word correspondence dictionary, which is used to translate Chinese Pinyin, English words, and abbreviations into Chinese topic words.

Using these semantic dictionaries and the image-related text on the WWW, image semantics are extracted in three steps. First, the topic-word correspondence dictionary is used to translate Chinese Pinyin, English words, and abbreviations into Chinese topic words. Second, the image-related text is segmented into words, and each word is annotated with its part of speech. Finally, rule-based and statistics-based methods are used to extract the image's topic words, main-body words, and main-body attribute words from the segmented and annotated text. Based on these ideas, a system for extracting the semantics of images on the WWW was developed.
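The three extraction steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the dictionaries are tiny hypothetical stand-ins for the four semantic dictionaries, segmentation uses forward maximum matching (a common dictionary-based method; the abstract does not say which segmenter was used), the part-of-speech tags are illustrative, and the "rule" (keep only nouns, place names, and adjectives) and "statistic" (word frequency weighted by text source) are hypothetical choices. The function names and source weights are likewise invented for this sketch.

```python
import re
from collections import Counter

# --- Hypothetical toy dictionaries (the thesis's real dictionaries are not shown) ---
# Topic-word correspondence dictionary: Pinyin / English / abbreviation -> Chinese topic word.
CORRESPONDENCE = {"tiananmen": "天安门", "beijing": "北京", "flower": "花"}
# Segmentation lexicon with an illustrative part-of-speech tag per word
# (ns = place name, n = noun, a = adjective, u = auxiliary word).
LEXICON = {"天安门": "ns", "北京": "ns", "花": "n", "红色": "a", "美丽": "a", "的": "u"}
# Classification dictionaries: word -> (hypothetical) image class.
TOPIC_WORDS = {"天安门": "建筑", "北京": "城市", "花": "植物"}
MAINBODY_WORDS = {"天安门": "建筑", "花": "植物"}
ATTRIBUTE_WORDS = {"红色": "颜色", "美丽": "评价"}


def translate_tokens(text):
    """Step 1: map Latin-script tokens (Pinyin, English, abbreviations) to topic words."""
    tokens = (t.lower() for t in re.findall(r"[A-Za-z]+", text))
    return [CORRESPONDENCE[t] for t in tokens if t in CORRESPONDENCE]


def segment_and_tag(text, max_len=4):
    """Step 2: forward maximum matching segmentation plus dictionary POS lookup.

    Characters not covered by the lexicon become single-character words tagged 'x'.
    """
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon hit or one char.
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if n == 1 or candidate in LEXICON:
                words.append((candidate, LEXICON.get(candidate, "x")))
                i += n
                break
    return words


def extract_semantics(sources, weights):
    """Step 3: filter words by POS (rule), then rank by source-weighted frequency (statistic).

    `sources` maps a text source (e.g. 'alt', 'url') to its text;
    `weights` gives each source's hypothetical importance (default 1.0).
    """
    scores = Counter()
    for field, text in sources.items():
        w = weights.get(field, 1.0)
        for word in translate_tokens(text):
            scores[word] += w
        for word, pos in segment_and_tag(text):
            if pos in ("n", "ns", "a"):  # rule: keep only content words
                scores[word] += w
    ranked = [word for word, _ in scores.most_common()]
    return {
        "topic": [x for x in ranked if x in TOPIC_WORDS],
        "mainbody": [x for x in ranked if x in MAINBODY_WORDS],
        "attribute": [x for x in ranked if x in ATTRIBUTE_WORDS],
    }
```

For example, with an image URL `http://example.com/beijing_tiananmen.jpg` (weight 2), alt text `美丽的天安门` (weight 3), and surrounding text `红色的花` (weight 1), the sketch ranks 天安门 first among both topic and main-body words, because it is found in two sources, one of them the highest-weighted.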
The system consists of three modules: a meta-search and preprocessing module, a semantics extraction module, and an online topic-word learning module. The meta-search and preprocessing module searches for images through the Google or Baidu search engine and extracts the related text from the web pages on which those images appear. The semantics extraction module extracts topic words, main-body words, and main-body attribute words from the image-related text. The online topic-word learning module finds new topic words in the image-related text and automatically adds them to the image classification dictionary of topic words.

Finally, experimental results for extracting image semantics on the WWW are reported: the average coverage rate was 52%, and the average precision was 44%. The results show that the proposed method has high practical value for content-based and semantic-based image retrieval on the WWW.
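The related-text extraction performed by the preprocessing module can be sketched with Python's standard-library `html.parser`. This is a hypothetical illustration, not the thesis's code: the class and function names, the 30-character window that defines "surrounding text", and the use of the `alt` (falling back to `title`) attribute as the image annotation are all assumptions, and the meta-search step that queries Google or Baidu is not shown. For each `<img>` on a page, the sketch collects the text sources from the abstract that a single page provides: image file name, image URL, annotation, surrounding text, and the page title and URL.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class RelatedTextParser(HTMLParser):
    """Collect the page title, each <img> tag, and the visible body text.

    Each image records the character offset in the body text at which it
    appeared, so the text just before and after it can be recovered later.
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.images = []       # one dict per <img>: src, alt, title, offset
        self.chunks = []       # visible text chunks in document order
        self.length = 0        # total length of text collected so far
        self._in_title = False
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            self.images.append({
                "src": attrs.get("src") or "",
                "alt": attrs.get("alt") or "",
                "title": attrs.get("title") or "",
                "offset": self.length,
            })

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()
        elif self._skip_depth == 0:
            self.chunks.append(data)
            self.length += len(data)


def extract_related_texts(html, page_url, window=30):
    """Return one record of image-related text per <img> on the page.

    `window` (hypothetical) is how many characters of body text on each
    side of the image count as its surrounding text.
    """
    parser = RelatedTextParser()
    parser.feed(html)
    parser.close()
    body = "".join(parser.chunks)
    records = []
    for img in parser.images:
        off = img["offset"]
        around = body[max(0, off - window):off] + " " + body[off:off + window]
        records.append({
            "image_name": posixpath.basename(urlparse(img["src"]).path),
            "image_url": urljoin(page_url, img["src"]),
            "annotation": img["alt"] or img["title"],
            "surrounding_text": " ".join(around.split()),  # collapse whitespace
            "page_title": parser.title,
            "page_url": page_url,
        })
    return records
```

A fixed character window is the simplest notion of "surrounding text"; a real system might instead use the enclosing table cell or paragraph, which this sketch does not attempt.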
Keywords/Search Tags: Image Retrieval in WWW, Automatic Image Indexing, Image Semantics, Image Classification, Meta Search, Chinese Automatic Word Segmentation, Part-of-Speech Annotation, Information Extraction