Research And Implementation On Distributed Web Mining And Searching

Posted on: 2007-09-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yao
Full Text: PDF
GTID: 2178360182977858
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, enormous Web resources have become available for people to acquire knowledge and information. However, the rapid expansion of these resources and the dispersed distribution of Web information prevent users from quickly obtaining the information they need. Consequently, how to classify and index Web information quickly and correctly, so that users can access it fast, has become an important research subject.

Based on Web data mining and text clustering indexes, the dissertation focuses on popular Web data mining and search engine technologies by implementing a system named SmartFILTER-3, which integrates distributed Web data mining, indexing and searching. The technologies discussed are as follows.

Architecture of distributed network nodes: This component handles the distributed deployment and connection of service nodes in a network environment. The dissertation discusses and proposes a "Distributed Autonomy Domain" design based on multiple interfaces.

Word segmentation algorithms for language filtering and Chinese characters: Considering the differences between Latin-script languages and Chinese, the dissertation studies a word segmentation algorithm for each. It focuses especially on the particular characteristics of Chinese and presents a segmentation algorithm based on "meta-words" in a dictionary.

Storage for mass index: The capability to store a massive data index is critical to fast indexing and searching on the Web. The dissertation discusses and implements fast storage models based on BerkeleyDB.

Distributed Web information mining: For Web information acquisition, the dissertation designs and implements a Web information mining system based on dynamic script control.

Fast index searching based on full text: Relying on the information in the indexed documents, index searching over the whole document lets users rapidly search for any information in the document. The dissertation designs and implements a prototype system.
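The idea behind full-text index searching can be illustrated with a minimal inverted index. This is only a sketch under stated assumptions, not the SmartFILTER-3 implementation: the function names (tokenize, build_index, search), the sample documents, and the in-memory dictionary are all hypothetical, and the tokenizer handles Latin-script text only, so it does not stand in for the dictionary-based Chinese segmentation described in the abstract. A real system of the kind described would presumably keep the postings in a persistent store such as BerkeleyDB rather than in memory.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split on anything that is not a-z or 0-9."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents containing every query term (AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    # .get avoids creating empty entries for unknown terms.
    postings = [index.get(t, set()) for t in terms]
    return sorted(set.intersection(*postings))


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Distributed web mining and searching",
    2: "Storage for mass index data",
    3: "Web index searching based on full text",
}
index = build_index(docs)
print(search(index, "web searching"))
```

Because every term maps directly to the documents that contain it, a query is answered by intersecting a few sets instead of scanning each document, which is what makes lookup fast once the index has been built.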
Keywords/Search Tags: Data-mining, Web-mining, Distributed-network, Search-engine