A Research And Implementation Of Vertical Search Technology In Archives Domain

Posted on:2012-12-25

Degree:Master

Type:Thesis

Country:China

Candidate:Z H Wang

Full Text:PDF

GTID:2178330332485790

Subject:Computer application technology

Abstract/Summary:

Archives are important records that concern every country and every individual. Although China's archival informatization has made some progress, it still lags far behind that of developed countries, and how to advance this work and raise the utilization rate of archives is a critical national research topic.
Search engines have won users' favor because they deliver information promptly and extract what users want, and they have become a major tool for finding information. General-purpose search engines, however, cover a broad range of content and return imprecise results, so they cannot satisfy users' needs; vertical search engines emerged and grew rapidly in response. Unlike general search, vertical search targets a specific domain, so it can be more focused and more specialized, and can retrieve deeper information within that domain. Nevertheless, vertical search engines are still not satisfactory, and research on improving them is active worldwide. This thesis studies the characteristics of the archives domain, then researches and improves vertical search technologies based on those characteristics and applies them to that domain.
The first step is to research and implement a topic crawler, based on the characteristics of the archives domain, to collect archival information. Archives are special documents with many distinctive features, such as originality, standardized storage formats, faithful reproduction of history, unified administration, and consistent identification. They are stored on designated websites, through which they are made accessible to the public or to particular audiences. As a result, a topic crawler for archives can be restricted to a limited range of sites when collecting documents for analysis. A domain-oriented link analysis algorithm is proposed for this purpose, along with a strategy for reaching relevant files through irrelevant ones. Files collected by the topic crawler then undergo content analysis, in which weighted keywords are computed and abstracts are extracted.
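The crawling idea described above (a crawl confined to designated archive sites, ordered by topical relevance, and allowed to pass through a few irrelevant pages to reach relevant ones) can be illustrated with a minimal Python sketch. This is not the thesis's link analysis algorithm, whose details the abstract does not give. The term-overlap `relevance` score, the `fetch(url) -> (text, links)` interface, and the parameters `threshold` and `max_tunnel` are all assumptions introduced for illustration.

```python
import heapq
from urllib.parse import urlparse


def relevance(text, topic_terms):
    """Fraction of topic terms found in the text (a stand-in relevance score)."""
    words = set(text.lower().split())
    return sum(1 for term in topic_terms if term in words) / len(topic_terms)


def crawl(seeds, fetch, allowed_hosts, topic_terms,
          threshold=0.5, max_tunnel=1, max_pages=100):
    """Priority-driven crawl confined to allowed_hosts.

    fetch(url) -> (text, links) is supplied by the caller (hypothetical interface).
    max_tunnel is the number of consecutive irrelevant pages the crawler
    may cross while looking for relevant ones (an assumed parameter).
    """
    # Heap entries are (-priority, url, tunnel_depth); heapq is a min-heap,
    # so negating the priority pops the most promising URL first.
    frontier = [(-1.0, url, 0) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    relevant = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url, tunnel = heapq.heappop(frontier)
        text, links = fetch(url)
        fetched += 1
        score = relevance(text, topic_terms)
        if score >= threshold:
            relevant.append(url)
            tunnel = 0  # a relevant page resets the tunnelling budget
        else:
            tunnel += 1
            if tunnel > max_tunnel:
                continue  # too many irrelevant pages in a row: do not expand
        for link in links:
            # Skip already-queued links and anything outside the designated sites.
            if link in seen or urlparse(link).hostname not in allowed_hosts:
                continue
            seen.add(link)
            # Links inherit the score of the page that points to them.
            heapq.heappush(frontier, (-score, link, tunnel))
    return relevant
```

Passing `fetch` as a parameter keeps the sketch independent of any particular HTTP library, so it can be exercised against an in-memory dictionary of pages. Setting `max_tunnel=0` disables the pass-through behaviour, which makes it easy to compare the two strategies on the same set of pages.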
An improved TF-IDF (Term Frequency-Inverse Document Frequency) algorithm is used to calculate keyword weights. It exploits the existence of instruction documents, which contain important information such as keywords and owner. When such a document exists, the keywords it lists are assigned a weight of 1; otherwise, different weights are assigned according to where a keyword appears: title, body, abstract, or elsewhere. In addition, archives and related files are converted into structured XML files using text analysis techniques, so as to provide more accurate search results. Both static and dynamic abstracts are used in the search process to provide more appropriate document summaries. If an archive contains an abstract, it is used as the static abstract. If not, a dynamic abstract is assembled from sentences that contain the keywords the user entered; these sentences can be located quickly using the position information in the index. After a search, users can vote on the results, and the votes are used to optimize the system.
Additionally, a vertical search engine for the archives domain is designed and its flow chart is given. The crawler algorithm, the crawl strategies, and the improved TF-IDF algorithm are implemented, and a Best-First Search algorithm and the standard TF-IDF algorithm are implemented as well for comparison. According to the research and experiments, these improvements yield better results: the topic crawler collects more relevant files, and the indexer calculates keyword weights more precisely. The techniques proposed in this thesis can serve as a reference for China's archival informatization and for vertical search in this domain.
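The keyword weighting described above can be sketched in Python as a position-weighted TF-IDF with an override for declared keywords. This is only an illustration under stated assumptions. The abstract does not give the actual formula or weight values, so the `FIELD_WEIGHTS` table, the smoothed IDF, the normalization to the range 0 to 1, and the `declared_keywords` parameter (standing in for keywords listed in an instruction document) are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical field weights; the abstract does not state the real values.
FIELD_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def keyword_weights(doc, corpus, declared_keywords=()):
    """Return {term: weight} for doc, with weights normalized to (0, 1].

    doc and every corpus entry map a field name to its text.
    declared_keywords models keywords listed in an instruction document;
    each of them receives the maximum weight of 1.0.
    """
    # Term frequency, where each occurrence counts by the field it appears in.
    tf = Counter()
    for field, text in doc.items():
        field_weight = FIELD_WEIGHTS.get(field, 1.0)
        for term in tokenize(text):
            tf[term] += field_weight

    # Document frequency: number of corpus documents containing each term.
    df = Counter()
    for other in corpus:
        terms = set()
        for text in other.values():
            terms.update(tokenize(text))
        df.update(terms)

    # Smoothed IDF keeps every score positive, even for very common terms.
    n = len(corpus)
    scores = {
        term: freq * (math.log((1 + n) / (1 + df[term])) + 1.0)
        for term, freq in tf.items()
    }

    # Normalize so the highest computed weight equals 1.0.
    top = max(scores.values(), default=1.0)
    weights = {term: score / top for term, score in scores.items()}

    # Declared keywords override the computed weight.
    for keyword in declared_keywords:
        weights[keyword.lower()] = 1.0
    return weights
```

Normalizing the computed scores puts them on the same scale as the fixed weight of 1 given to declared keywords, so the two kinds of weight can be compared directly in one index. Other ways of combining them are equally plausible.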

Keywords/Search Tags:

Archives, Vertical Search, Topic Crawler, Query Sort

Related items

1	Design And Implementation Of A Vertical Search In The Life Service Industry
2	Research And Implementation Of Intelligent Crawler System In Vertical Search Engine
3	Research On Personalized Vertical Search Technology Based On Multi Topic Aggregation
4	The Internet Public Document Search System Based On Vertical Search Technology
5	The Topical Web Crawler Research In Vertical Search Engine
6	Research And Realization On Focused Crawler Key Technologies Of Vertical Search Engine
7	Study On Algorithms Of The Vertical Search
8	The Design And Research Of Topic Web Crawler In Vertical Search Engine
9	Vertical Search Engine Based Public Opinion Alert And Analysis Platform
10	Research And Implementation Of Vertical Search Engine