
Research On Key Techniques Of Structure-analysis Based Large Scale WWW Text Information Retrieval

Posted on: 2002-06-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G Z Feng
Full Text: PDF
GTID: 1118360185495633
Subject: Computer software and theory
Abstract/Summary:
The Web is the richest information base in history, yet finding desired information on it is very difficult. Web IR presents challenges beyond those of classic IR because of the Web's huge scale, heterogeneity, and dynamic nature; consequently, Web IR attracts research interest in many areas. This dissertation studies Web IR and classifies the research in this area into four genres: the classic IR genre, the metadata genre, the DB genre, and the link analysis genre. The classic IR genre continues the research of the IR field, while the other three genres base their research on what distinguishes the Web from classic IR: structure. The work of this dissertation includes the following aspects:

It is proposed that a common "universal" search engine, which treats the Web as a structureless document set, is not a good solution given the vast scale of the Web and the diversity of Web information and user queries. We propose to treat the Web as a structured data object and to build search engines of various coverage, granularity, and characteristics, which cooperate to form a Web IR framework that provides an optimized service with low resource consumption.

It is proposed that search engines located at the root node of the Web IR framework, which operate on the whole Web, should index the whole scope with a focus on the most significant information and structure. We propose to replace pages with page groups of various topics as the basic unit of search engines and to provide a "coarse-grained" conceptual IR service.

It is proposed to employ link analysis to discover relations among web pages and cluster them into page groups, which are of better quality. Compared with pages, page groups have clear topics, are stable, and are fewer in number. Moreover, they match what users desire in IR: a group of pages on the same topic.

It is proposed to index the page groups thematically rather than in full text,...
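The idea of clustering pages into page groups through link analysis can be illustrated with a minimal sketch. The abstract does not state which link-analysis algorithm the dissertation uses, so the rule below (pages that link to each other reciprocally belong to the same group, merged with union-find) is purely an assumption chosen for brevity, and the function name `page_groups` and the sample link graph are hypothetical.

```python
from collections import defaultdict


def page_groups(links):
    """Group pages joined by reciprocal links (an assumed, simplified rule).

    links: dict mapping each page to the pages it links to.
    Returns a sorted list of groups, each a sorted list of page names.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for src, targets in links.items():
        find(src)  # register pages that have no reciprocal links
        for dst in targets:
            find(dst)
            # Merge only when the link is mutual: src -> dst and dst -> src.
            if src in links.get(dst, ()):
                union(src, dst)

    groups = defaultdict(set)
    for page in list(parent):
        groups[find(page)].add(page)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical link graph: a and b link to each other, as do c and d.
sample_links = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["d"],
    "d": ["c"],
    "e": [],
}
print(page_groups(sample_links))  # [['a', 'b'], ['c', 'd'], ['e']]
```

In this sketch the one-way link from b to c does not merge the two groups, which loosely mirrors the claim that page groups are fewer in number than pages while each keeps a clear topic; a real system would likely use richer link evidence than reciprocity alone.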
Keywords/Search Tags: Web IR, structure analysis, intelligent classification of search results, page group, thematic indexing, clustering, intelligent IR