Research And Implementation Of Vertical Information Search Technology And Algorithm Based On Web

Posted on:2018-12-09

Degree:Master

Type:Thesis

Country:China

Candidate:F L Li

Full Text:PDF

GTID:2348330518465878

Subject:System theory

Abstract/Summary:

With the continuing development of computer hardware, the Internet has also advanced at an unprecedented pace. In the current era of explosive data growth, vast amounts of information circulate worldwide, and big data and a range of related computing technologies have emerged. In this big-data era, information retrieval systems help people locate specific data accurately. An information retrieval system works as follows: given certain keywords or strategies, it uses crawler technology to crawl information from the Internet, processes that information with techniques such as Chinese word segmentation, duplicate web page detection, and ranking optimization, and presents the information users are looking for. Representative systems include Baidu and 360 in China and Google and Yahoo abroad. Although all of them focus on retrieval, each has its own characteristics, and they have become essential tools in people's daily lives. Because these general-purpose systems cover such a wide range, they run into difficulties with very broad information needs and with specialized domains. To overcome these difficulties, vertical information retrieval systems were introduced. A vertical information retrieval system is an information retrieval system dedicated to a specific domain; examples include document, tourism, and shopping vertical information retrieval systems.

This project studies a news vertical information retrieval system and optimizes it on the basis of existing technology. First, add-on development is carried out on the Heritrix crawler to make crawling more efficient. Next, once the website resources have been obtained, the web pages are converted from HTML into plain text (TXT) with HTMLParser; the text is then segmented with an optimized version of the IK Analyzer word segmentation, and dirty data is filtered out of it. Then, an improved TF-IDF weighting algorithm is used to eliminate duplicated web page content. Finally, on a Struts + Spring + Hibernate framework with MySQL as the storage database, the PageRank algorithm is used to optimize Lucene's ranking algorithm, and an index is built and searched to complete the news vertical information retrieval system.
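The crawling step can be illustrated with a small breadth-first crawl frontier that stays within the target news sites. This is a sketch in Python under stated assumptions, not Heritrix's own frontier or its extension API: `focused_crawl`, the `fetch_links` callable (which would download a page and return its raw `href` values) and the host allow-list are hypothetical names introduced here, and an in-memory link table stands in for real HTTP fetching.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def focused_crawl(seeds, fetch_links, allowed_hosts, max_pages=100):
    """Breadth-first crawl that only follows links on allowed hosts.

    fetch_links is a hypothetical callable: url -> list of raw href strings.
    """
    frontier = deque(seeds)      # URLs waiting to be fetched (FIFO = breadth-first)
    seen = set(seeds)            # every URL ever queued, so nothing is fetched twice
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for href in fetch_links(url):
            link = urljoin(url, href)  # resolve relative links against the page URL
            # Stay "vertical": ignore links that leave the allowed news hosts.
            if urlparse(link).hostname in allowed_hosts and link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical in-memory site standing in for real HTTP requests.
pages = {
    "http://news.example.com/": ["/a", "http://ads.example.net/x", "/b"],
    "http://news.example.com/a": ["/b", "/"],
    "http://news.example.com/b": [],
}
print(focused_crawl(["http://news.example.com/"],
                    lambda u: pages.get(u, []), {"news.example.com"}))
```

The shared `seen` set is what prevents revisiting pages; a production crawler would also need politeness delays, robots.txt handling, and URL normalization.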
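The HTML-to-TXT conversion amounts to keeping visible text and discarding markup. The thesis uses the Java HTMLParser library; the sketch below instead uses Python's unrelated standard-library `html.parser.HTMLParser` to show the same idea. The choice to skip `<script>` and `<style>` content and to emit one line per text chunk is an assumption, not a detail from the abstract.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from an HTML page, skipping <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()          # convert_charrefs=True decodes entities like &amp;
        self._skip_depth = 0        # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a script or style block.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_txt(html):
    """Hypothetical helper: HTML string in, plain text (one chunk per line) out."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


print(html_to_txt("<p>Hello</p><script>var x = 1;</script><p>World</p>"))
```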
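The abstract does not describe IK Analyzer's internals or the author's optimizations, so the sketch below substitutes classic forward maximum matching, a simple dictionary-based Chinese segmentation technique, followed by a dirty-data filter. The dictionary, the stop-word set, the `max_len` limit, and the rule that whitespace or pure-punctuation tokens count as dirty data are all assumptions for illustration.

```python
def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted as a fallback token.
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            if size == 1 or word in dictionary:
                tokens.append(word)
                i += size
                break
    return tokens


def filter_dirty(tokens, stop_words):
    """Drop stop words, whitespace, and pure-punctuation tokens (assumed 'dirty data')."""
    kept = []
    for t in tokens:
        if not t.strip() or t in stop_words:
            continue
        if all(not ch.isalnum() for ch in t):  # e.g. "，" or "。"
            continue
        kept.append(t)
    return kept


# Hypothetical dictionary and stop words.
dictionary = {"垂直", "搜索", "引擎", "搜索引擎", "新闻"}
tokens = fmm_segment("垂直搜索引擎的新闻。", dictionary)
print(tokens)                        # ['垂直', '搜索引擎', '的', '新闻', '。']
print(filter_dirty(tokens, {"的"}))  # ['垂直', '搜索引擎', '新闻']
```

Forward maximum matching prefers "搜索引擎" over "搜索" + "引擎" because it always tries the longest match first; real analyzers such as IK add ambiguity resolution on top of dictionary lookup.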
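Duplicate detection with TF-IDF typically weights each term by its frequency in a page times its rarity across pages, then compares pages by cosine similarity. The abstract does not say how the author improved TF-IDF, so this sketch uses a common smoothed IDF, $\log\frac{1+N}{1+df}+1$, and a hypothetical similarity threshold of 0.9; both are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists -> list of {term: tf-idf weight} dicts."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append({
            # tf = relative frequency; idf is smoothed so shared terms keep weight > 0
            term: (c / total) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, c in counts.items()
        })
    return vectors


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def find_duplicates(docs, threshold=0.9):
    """Return index pairs (i, j) whose TF-IDF cosine similarity >= threshold."""
    vecs = tfidf_vectors(docs)
    pairs = []
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                pairs.append((i, j))
    return pairs


docs = [["news", "search", "engine"],
        ["news", "search", "engine"],
        ["football", "match", "score"]]
print(find_duplicates(docs))  # [(0, 1)]
```

The pairwise loop is quadratic in the number of pages, which is fine for a sketch; large collections usually add a cheaper pre-filter such as fingerprinting before exact comparison.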
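Lucene's index is far more elaborate (segmented postings with frequencies, positions, and scoring data), but the core idea of "building and searching an index" is an inverted index from terms to documents. The minimal sketch below, with hypothetical names `build_index` and `search_all`, supports only boolean AND queries over pre-segmented tokens.

```python
from collections import defaultdict


def build_index(docs):
    """docs: {doc_id: token list} -> {term: set of doc_ids containing it}."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for t in tokens:
            index[t].add(doc_id)
    return dict(index)


def search_all(index, query_terms):
    """Return ids of documents containing every query term (boolean AND)."""
    postings = [index.get(t, set()) for t in query_terms]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical pre-segmented news documents.
docs = {"d1": ["新闻", "搜索"], "d2": ["新闻", "体育"], "d3": ["搜索", "引擎"]}
index = build_index(docs)
print(search_all(index, ["新闻", "搜索"]))  # {'d1'}
```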
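PageRank can be computed by power iteration over the link graph. How the thesis feeds PageRank into Lucene's ranking is not stated in the abstract, so the `rerank` function below assumes a simple linear blend of max-normalized relevance and PageRank scores; the weight `alpha = 0.7`, the damping factor 0.85, and the fixed iteration count are illustrative assumptions, and the relevance numbers stand in for Lucene scores.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank. links: {page: [pages it links to]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in nodes if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in nodes}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank


def rerank(relevance, pr, alpha=0.7):
    """Assumed blend: alpha * normalized relevance + (1 - alpha) * normalized PageRank."""
    max_rel = max(relevance.values()) or 1.0
    max_pr = max(pr.get(d, 0.0) for d in relevance) or 1.0
    scored = {d: alpha * s / max_rel + (1 - alpha) * pr.get(d, 0.0) / max_pr
              for d, s in relevance.items()}
    return sorted(scored, key=scored.get, reverse=True)


links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(rerank({"a": 1.0, "c": 0.9, "d": 1.0}, ranks))  # ['a', 'c', 'd']
```

In this toy graph, pages "a" and "d" tie on relevance alone, but "d" has no incoming links, so the blended score pushes it below "c".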

Keywords/Search Tags:

Vertical Search, Chinese Word Segmentation, Crawler, Lucene, HTMLParser

Related items

1	The Design And Implementation Of Vertical Search Engine Based On Lucene
2	Design And Implementation Of A Vertical Search System
3	Research And Design Products Lucene Search System Based On Parity
4	Research On Key Technology Of Vertical Search Engine
5	The Research And Design On Vertical Search Engine Based On Lucene
6	Research And Implementation Of The Vertical Search Engine On Lucene
7	Research And Implementation Of Subject-oriented Mobile Search Engine Based On Lucene
8	Design Of Vertical Search Engine For Academic Resources Of Computer Science
9	The Vertical Search Engine For Campus Design And Implementation
10	Research On Vertical Search Engine Based On SSH And Lucene