Research And Implementation Of Vertical Search Engine On Book Subject

Posted on:2015-05-17

Degree:Master

Type:Thesis

Country:China

Candidate:J W You

Full Text:PDF

GTID:2298330452953251

Subject:Computer Science and Technology

Abstract/Summary:

The World Wide Web has become an important repository of information since the emergence of the Internet. Internet users obtain information of interest from this repository by relying on the search services provided by search engines. Traditional general-purpose search engines can meet users' basic search needs, but because of their broad information coverage, the results they return include a great deal of information that users don't care about. Users have to filter the results further, and this additional filtering degrades the user experience. Vertical search engines make up for this weakness by narrowing the domain of information they cover compared with general search engines. They index only information within a particular professional or subject field, and can therefore ensure that the content users retrieve is what they actually want. In addition, vertical search engines integrate cluttered web content: by directly showing users structured data extracted from it, they help users quickly identify the most important information.

This thesis first introduces the basic concepts and classifications of search engines and then analyzes how search engines work. By comparing the working principles of general and vertical search engine systems, it studies key technologies of vertical search engines, such as topic-focused web crawling algorithms and page similarity. The main work is as follows. Based on the observation that hyperlinks on the same subject have similar URL structures, the traditional Shark-search crawling algorithm is improved: the structural characteristics of links are taken into account when predicting the priority score of child links.
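
The idea of letting URL structure influence the priority score of child links can be illustrated with a small sketch. The abstract does not give the thesis's actual formula, so everything here is an assumption made for illustration: the token-based similarity measure, the linear blend, the `weight` parameter, and the function names `url_tokens`, `structure_similarity`, and `child_priority` are all hypothetical.

```python
from urllib.parse import urlparse


def url_tokens(url):
    """Return the host followed by path segments, with all-digit segments collapsed to '#'."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    shapes = ["#" if s.isdigit() else s.lower() for s in segments]
    return [parsed.netloc.lower()] + shapes


def structure_similarity(url_a, url_b):
    """Fraction of aligned token positions that match, relative to the longer URL."""
    a, b = url_tokens(url_a), url_tokens(url_b)
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def child_priority(content_score, child_url, relevant_urls, weight=0.3):
    """Blend a content-based score with the best structural match to known relevant URLs.

    `content_score` stands in for whatever score the crawler already computes
    for the child link; `weight` is an assumed tuning parameter.
    """
    structural = max(
        (structure_similarity(child_url, u) for u in relevant_urls),
        default=0.0,
    )
    return (1 - weight) * content_score + weight * structural
```

With this sketch, two links such as `http://example.com/book/123` and `http://example.com/book/456` count as structurally identical, so a child link that resembles already-relevant pages is pushed up the crawl queue even when its anchor text is uninformative.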
The Vector Space Model is analyzed, and a secondary topic-evaluation method is proposed to obtain more high-quality topic-related web pages. Based on how book metadata is distributed within a web page, a semi-automatic metadata extraction algorithm is designed using the HTMLParser parsing tool. A book-oriented vertical search engine prototype is designed and implemented using Lucene, a full-text indexing development package, and Lucene's default method of sorting search results is customized. Finally, the improved crawling algorithm is evaluated experimentally. The results show that the algorithm performs better on specified websites, because the similarity between links on the same subject is relatively pronounced there. In tests against a general search engine system, the search results of the book-oriented prototype were more accurate. Moreover, customizing the default sort method yields a more reasonable ordering of search results.
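
The Vector Space Model comparison mentioned in the abstract is commonly implemented as cosine similarity between term-frequency vectors. The following is a minimal sketch of that general technique, not the thesis's secondary evaluation method, whose details the abstract does not describe. The tokenizer, the `threshold` value, and the function names are assumptions.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Build a term-frequency vector from lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_on_topic(page_text, topic_text, threshold=0.2):
    """Treat a page as topic-related when its similarity to the topic description
    reaches an assumed threshold."""
    similarity = cosine_similarity(term_vector(page_text), term_vector(topic_text))
    return similarity >= threshold
```

A crawler could apply such a check once to decide whether to follow a link and again after downloading the page, which is one plausible reading of a "secondary" evaluation, though the abstract does not confirm it.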

Keywords/Search Tags:

Vertical Search Engine, Shark-Search, Information Extraction, Lucene

Related items

1	The Research And Application Of Vertical Search Engine In Screening The Resume
2	Research And Implementation Of Healthy Vertical Search Engine Based On Improved Shark-Search Algorithm
3	Research And Implementation Of Vertical Search Engine Based On Lucene/Http Client
4	The Design And Implementation Of Lucene-Based Digital Product Vertical Search Engine
5	Research And Implementation Of Subject-oriented Vertical Search Engine On Basic Educational Resources
6	Design And Implementation Of Job Vertical Search Engine
7	The Research And Design On Vertical Search Engine Based On Lucene
8	Design And Implementation Of Vertical News Search Engine Based On Heritrix
9	The Design And Realization Of The Vertical Search Engine On The Basis Of Java
10	The Design And Implementation Of Lucene-based Auto Information Vertical Search Engine