Research And System Realization On Focused Web Searching And Mining

Posted on: 2008-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z Liu
Full Text: PDF
GTID: 2178360212974600
Subject: Computer application technology
Abstract/Summary:
This thesis presents research on, and a system implementation of, focused Web searching and mining. Using an independently developed prototype system, Infox Studio 2, it discusses several widely used Web data mining techniques together with core search engine technology. The main points are as follows. Focused Web crawling: the thesis proposes a crawler search strategy based on a non-greedy genetic algorithm, analyzes the data and compares the performance of each algorithm, and identifies the scenarios in which each is appropriate. Web data localization and update: the thesis builds a high-speed storage model for very large data volumes with the aid of Berkeley DB and saves the Web data captured by the crawler locally; because Web resources are updated at different frequencies, it proposes an update technique based on classification information. Chinese word segmentation: given the characteristics of Chinese text, the thesis adopts an algorithm based on the "meta-word". The thesis also describes the design and implementation details of the prototype system, Infox Studio 2. The performance of the system's main modules is analyzed and compared in different network environments. The experiments show that the prototype system basically meets the design requirements.
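
The abstract does not say how the classification-based update technique works. As an illustration only, the minimal Python sketch below shows one way such a scheme could be organized: each stored page carries a category, each category has its own refresh interval, and a priority queue returns the pages that are due for re-crawling. The category names, interval values, and the RefreshScheduler class are all hypothetical; none of them comes from the thesis or from Infox Studio 2.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical refresh intervals in seconds per page category.
# These values are illustrative assumptions, not figures from the thesis.
REFRESH_INTERVAL = {
    "news": 3600,             # changes often: revisit hourly
    "forum": 6 * 3600,        # revisit every six hours
    "static": 7 * 24 * 3600,  # rarely changes: revisit weekly
}
DEFAULT_INTERVAL = 24 * 3600  # fallback for unknown categories


@dataclass(order=True)
class ScheduledPage:
    next_fetch: float                     # only this field is used for ordering
    url: str = field(compare=False)
    category: str = field(compare=False)


class RefreshScheduler:
    """Orders pages by the time at which they should next be re-crawled."""

    def __init__(self):
        self._heap = []

    def add(self, url, category, fetched_at):
        # The next fetch time depends on the category's refresh interval.
        interval = REFRESH_INTERVAL.get(category, DEFAULT_INTERVAL)
        heapq.heappush(self._heap, ScheduledPage(fetched_at + interval, url, category))

    def due(self, now):
        """Pop and return the URLs whose refresh time has arrived, earliest first."""
        ready = []
        while self._heap and self._heap[0].next_fetch <= now:
            ready.append(heapq.heappop(self._heap).url)
        return ready
```

In this sketch a crawler would call add() after each successful fetch and poll due() with the current time, so frequently changing categories are revisited more often than stable ones.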
Keywords/Search Tags: data mining, web mining, focused, search engine, web crawler