
Information Retrieval Using Categorization Structures

Posted on: 2011-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: S L Xu
GTID: 2178360308952425
Subject: Computer application technology
Abstract/Summary:
Although document categorization structures have been used as a knowledge organization method in various places, the exploration of this kind of high-semantic-value data for data mining has not been performed extensively. This paper is concerned with the specific problem of exploring the use of document categorization structures for information retrieval (IR). More specifically, two kinds of document categorization structures are studied: the newly emerging flat document categorization structure and the classical top-down hierarchical document categorization structure.

Flat Document Categorization Structure. The social bookmarking and tagging systems in Web 2.0 provide almost all Web users with a platform for categorizing Web resources in a flat structure. Since the categorization is conducted by ordinary Web users, we refer to it as "Social Flat Categorization". In this paper, we focus only on this specific kind of flat categorization structure. Considering that these systems bring together three entities, i.e., Web users, social tags, and Web documents, we implement two IR tasks based on them. 1) A personalized search framework, utilizing social flat categorization for both document topic representation and user interest representation. Three properties of social bookmarking and tagging, namely the categorization property, keyword property, and structure property, are explored. 2) A general IR model, taking social flat document categorization structures as a complement to document contents. A new language model for IR (LMIR) is designed for this aim. A series of experiments is conducted to evaluate the two models. Extensive experimental results show that our search models can significantly improve search quality.

Hierarchical Document Categorization Structure. We consider the IR task where the data corpora have hierarchical knowledge-organizational structures, e.g., the Web pages in the ODP classification hierarchy. These hierarchies are created according to the prior knowledge of human beings, and are therefore of high semantic quality. Including them in IR could be very beneficial for IR performance. We propose a nonparametric hierarchical LMIR (NPH-LMIR) to incorporate such information into IR processes. A series of extensive experiments on the ODP data set shows the effectiveness of NPH-LMIR.
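
To make the idea of using tags as a complement to document contents concrete, the following is a minimal sketch of a query-likelihood language model that linearly interpolates a document-content model, a tag model, and a collection model. It is not the LMIR or NPH-LMIR proposed in the thesis, whose formulation the abstract does not give. The function names, the interpolation weights `lam_doc` and `lam_tag`, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def mle(tokens):
    """Maximum-likelihood unigram model over a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def score(query, doc_tokens, tag_tokens, collection_model,
          lam_doc=0.6, lam_tag=0.3):
    """Log query likelihood under a three-way linear interpolation.

    The weights are illustrative assumptions, not values from the thesis.
    The remaining mass (1 - lam_doc - lam_tag) goes to the collection model.
    """
    p_doc = mle(doc_tokens)
    p_tag = mle(tag_tokens)
    lam_bg = 1.0 - lam_doc - lam_tag
    log_p = 0.0
    for w in query:
        p = (lam_doc * p_doc.get(w, 0.0)
             + lam_tag * p_tag.get(w, 0.0)
             + lam_bg * collection_model.get(w, 1e-9))  # floor for unseen words
        log_p += math.log(p)
    return log_p


def rank(query, docs):
    """Rank document ids by score; docs maps id -> (content tokens, tag tokens)."""
    all_tokens = [t for content, tags in docs.values() for t in content + tags]
    collection_model = mle(all_tokens)
    scored = {
        doc_id: score(query, content, tags, collection_model)
        for doc_id, (content, tags) in docs.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical toy data: identical content, different social tags.
docs = {
    "d1": (["python", "tutorial", "basics"], ["programming", "python"]),
    "d2": (["python", "tutorial", "basics"], ["cooking"]),
}
ranking = rank(["python", "programming"], docs)
```

In this toy setup the two documents have the same content, so any difference in their ranking comes only from the tag component.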
Keywords/Search Tags: Document Categorization Structure, Social Annotation Flat Categorization, Hierarchical Categorization Structure, Information Retrieval, Personalized Search, Statistical Language Model for Information Retrieval, Nonparametric Bayes