
Research And Implementation Of Web-based Full-Text Information Retrieval System

Posted on: 2007-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: X W Zhang
Full Text: PDF
GTID: 2178360185478207
Subject: Computer application technology
Abstract/Summary:
With the growth of the Internet and the enrichment of Web resources, obtaining information through Web-based full-text retrieval systems has become an important part of daily life, and users increasingly expect searches to be both precise and efficient.

This thesis introduces the theory and techniques of Web information retrieval (IR) and system implementation, and applies them to a Web-based full-text IR application.

To make Web IR more targeted and focused, this thesis proposes and implements DVCPS, a DOM-based visual segmentation algorithm for Chinese-content Web pages. It analyzes the structure of the HTML page and the characteristics of the user's visual perception, builds the DOM tree under a rule set and strategy, and parses the page into its distinct semantic blocks.

To improve the efficiency of Web IR and strengthen the interaction between system and user, this thesis improves, in some situations, the completeness and time complexity of the Lingo algorithm, which is used for on-line clustering of Web pages. Lingo is based on latent semantic analysis (LSA): it extracts latent concepts from search-result summaries and uses them as cluster descriptions. It constructs a matrix relating terms, phrases, and summaries, and then performs singular value decomposition and similarity computation. This thesis implements on-line Web page clustering.

This thesis designs and implements a Web-based full-text information retrieval system, and describes the system framework and the techniques used in its main modules. The system improves Lucene's scoring model for Web-based full-text IR. This thesis compares the search precision of a word-segmented index based on ICTCLAS with that of a single-Chinese-character index. The system applies the DVCPS algorithm and the...
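
The idea of parsing a page into semantic blocks by walking its DOM structure can be illustrated with a minimal sketch. This is not the DVCPS algorithm itself, whose rule set and strategy are not given in the abstract; it assumes a much simpler rule in which every block-level tag starts a new block. The names `BLOCK_TAGS`, `SKIP_TAGS`, `BlockSegmenter`, and `segment` are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries. This set is an illustrative assumption,
# not the rule set used by the thesis.
BLOCK_TAGS = {"div", "td", "p", "ul", "ol", "table", "h1", "h2", "h3"}
# Tags whose text content is never shown to the user.
SKIP_TAGS = {"script", "style"}


class BlockSegmenter(HTMLParser):
    """Groups visible text into blocks, one per block-level element."""

    def __init__(self):
        super().__init__()
        self.blocks = []       # finished (tag, text) pairs, in page order
        self._stack = []       # currently open block-level tags
        self._buffer = []      # text pieces collected for the current block
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def flush(self):
        """Close the current block, if it holds any visible text."""
        text = " ".join(" ".join(self._buffer).split())
        if text:
            tag = self._stack[-1] if self._stack else "body"
            self.blocks.append((tag, text))
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush()
            if tag in self._stack:
                # Pop up to and including the matching open tag.
                while self._stack and self._stack.pop() != tag:
                    pass

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._buffer.append(data)


def segment(html):
    """Return the list of (tag, text) blocks found in an HTML string."""
    parser = BlockSegmenter()
    parser.feed(html)
    parser.close()
    parser.flush()
    return parser.blocks
```

Calling `segment` on a page string returns the text of each block together with the tag that encloses it, which a retrieval system could then weight or filter per block. A vision-based method such as the one the thesis describes would additionally use layout cues, which this sketch ignores.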
Keywords/Search Tags: Web Information Retrieval, Web Page Segmentation, DOM, On-line Clustering