Research On Theme Analysis, Index And Retrieval Of Website

Posted on: 2007-07-31
Degree: Master
Type: Thesis
Country: China
Candidate: K Sun
GTID: 2178360185486114
Subject: Computer application technology
Abstract/Summary:
As the division of labor in society moves toward specialization, demand for specialized information services that are "special, precise and in-depth" grows day by day. A large number of specialized information websites have emerged, and search engine technology is likewise evolving from general-purpose to specialized search. As a result, technology for automatically classifying and navigating websites has become an active focus of information processing research in recent years.

This thesis studies intelligent website retrieval based on website theme indexing. The research goal is to search for websites according to their themes. The key technologies involved include website topological structure analysis, webpage content analysis, and subject analysis and indexing. The research aims to provide an automatic website search service.

To this end, the thesis first analyzes the system architecture of intelligent website retrieval. It then analyzes in depth several key technologies: webpage content extraction, website topological structure analysis, website subject analysis and indexing, and website retrieval. In particular, it proposes several novel solutions: a content extraction method based on the space length of tags, a website tree construction method based on the similarity of directories in URLs, and a website concept method based on website structure. Finally, on the basis of these algorithms and theory, an intelligent website retrieval system is built, and experiments show that it achieves better results.

Research on website retrieval will benefit from the solutions for subject analysis, indexing and retrieval put forward in this thesis. They avoid the conflict between users' multiple needs and the single structure of a categorization scheme, overcome some problems that can arise in the traditional information retrieval model, and can support a range of theme-specific tracking and search services. There is considerable room for further research, which could extend traditional information retrieval to more specialized fields and bring specialized search technology into practical application.
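
The abstract names a website tree construction method based on the similarity of directories in URLs but does not give the algorithm. As a rough illustration only, the idea might be sketched as follows, under the assumption that "directory similarity" means the number of leading path directories two URLs share and that pages are grouped under a node for each directory. The function names `url_directories`, `directory_similarity` and `build_site_tree`, and the `example.com` URLs, are hypothetical and not taken from the thesis.

```python
from urllib.parse import urlparse


def url_directories(url):
    """Return the directory segments of a URL path, excluding the final file name."""
    parts = urlparse(url).path.split("/")
    # Drop the last segment (file name or empty string) and any empty segments.
    return [p for p in parts[:-1] if p]


def directory_similarity(url_a, url_b):
    """Assumed similarity measure: count of leading directory segments shared by both URLs."""
    shared = 0
    for a, b in zip(url_directories(url_a), url_directories(url_b)):
        if a != b:
            break
        shared += 1
    return shared


def build_site_tree(urls):
    """Group pages into a nested tree with one node per URL directory."""
    tree = {"dirs": {}, "pages": []}
    for url in urls:
        node = tree
        for directory in url_directories(url):
            # Create the child node on first sight, then descend into it.
            node = node["dirs"].setdefault(directory, {"dirs": {}, "pages": []})
        node["pages"].append(url)
    return tree


urls = [
    "http://example.com/index.html",
    "http://example.com/news/2007/a.html",
    "http://example.com/news/2007/b.html",
    "http://example.com/news/2006/c.html",
]
tree = build_site_tree(urls)
```

In this sketch, pages that share a longer directory prefix end up deeper in the same branch, so the tree depth at which two pages diverge matches their `directory_similarity` score. The method in the thesis may weight or merge directories differently.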
Keywords/Search Tags: website retrieval, theme indexing, website structure, webpage content extraction