Study On Topic-Specific Web Information Collection And Analysis Technology

Posted on:2007-05-15

Degree:Master

Type:Thesis

Country:China

Candidate:Z Tang

Full Text:PDF

GTID:2178360185474494

Subject:Computer software and theory

Abstract/Summary:

Search engines have become people's main means of gathering information on the Web. Traditional general-purpose search engines use a program called a crawler to collect information from the whole Web. This approach has several disadvantages: information collection is not topic-specific, many pages are missed, and the needs of specific professional groups cannot be met. What is needed is a focused search engine that is well classified, contains deep and comprehensive data, and is updated in a timely manner.

We designed a focused search engine and studied the Web information collection and analysis technology of topic-driven crawlers. We classified search strategies according to the methodology used to assess the value of links, and analyzed and compared the characteristics, advantages, and disadvantages of each strategy. We also analyzed several common Web community structures and point out that existing topic-driven Web information collection techniques based on partial information have two problems: at the technical level, the tension between "local optimum" and "topic drift"; and in the results, the tension between recall and precision. We therefore propose using a genetic algorithm, which is highly interoperable, adaptable, global, and based on probabilistic selection, to address these issues. The main work is as follows:

① Based on the differences in goals and methods between traditional general-purpose search engines and focused search engines, we designed a focused search engine and described the function of each of its parts.

② We studied technologies for information collection, analysis, and retrieval, focusing on topic-specific Web information collection and analysis. Through comparison and analysis, we identified the advantages and disadvantages of the existing technologies.

③ We studied the concepts, characteristics, methods, and mathematical mechanisms of genetic algorithms, and propose applying them to topic-driven Web information collection to improve the performance of the collection system.

④ By analyzing the differences and similarities between genetic algorithms and Web information collection technologies, we discussed the feasibility of using a genetic algorithm in a Web information collection system and some issues that deserve attention. We designed...
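The abstract does not describe the thesis's actual design, so the following is only an illustrative sketch of the "selection based on probability" idea it mentions: fitness-proportionate (roulette-wheel) selection applied to a crawl frontier. The function name `roulette_select`, the `(url, score)` frontier format, and the assumption that scores are non-negative topic-relevance estimates are all hypothetical and not taken from the thesis.

```python
import random


def roulette_select(scored_links, k, rng=None):
    """Pick k links, each with probability proportional to its score.

    scored_links: iterable of (url, score) pairs; scores are assumed to be
    non-negative relevance estimates (a hypothetical format).
    Sampling is done with replacement, so a URL may appear more than once.
    """
    rng = rng or random.Random()
    links = list(scored_links)
    if not links:
        return []
    weights = [score for _, score in links]
    if sum(weights) <= 0:
        # No usable scores: fall back to uniform sampling.
        return [url for url, _ in rng.choices(links, k=k)]
    return [url for url, _ in rng.choices(links, weights=weights, k=k)]


# Usage: low-scoring links can still be chosen occasionally, which is how
# probabilistic selection may help a crawler avoid committing only to the
# locally best-looking links.
frontier = [("http://example.org/a", 0.9), ("http://example.org/b", 0.1)]
next_urls = roulette_select(frontier, k=3, rng=random.Random(42))
```

A greedy best-first crawler would always take the highest-scoring link; the probabilistic choice here trades some short-term relevance for broader exploration.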

Keywords/Search Tags:

Focused search engine, information collection, crawling strategy, search strategy, Genetic Algorithm

Related items

1	Research And Implementation Of The Strategy-Extensible Search Engine
2	Design And Implementation Of A Focused Search Engine
3	Research On Crawling Strategy Of Multi-Agent For Focused Search Engine Technology
4	Study On Focused Crawling Technique For Vertical Search Engine
5	Research And Design On The Search Strategy Of Focused Crawler Based On Genetic Algorithm
6	Design And Implementation Of A Focused Search Engine Template Based On Lucene
7	Research On Search Strategy And Algorithm Of Network Search Engine
8	Research Of Focused Crawling Strategy
9	Spider Crawling On Mobile Search Research And Implementation Strategy
10	Research And Application Of Vertical Search Engine Key Technologies Based On The Lucene