Web Access To Information Technology Research

Posted on: 2005-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: D H Wu
Full Text: PDF
GTID: 2208360125453807
Subject: Computer application technology
Abstract/Summary:
With the rapid growth of the World Wide Web and the arrival of the information explosion, techniques for acquiring web information have become a very active research subject worldwide. The central problem is how to retrieve exactly the information of interest from the web. Because the web is so complex, this research is hard, and covering every subject area is extremely difficult; topic-specific search engines have therefore emerged as one of the best solutions. In this thesis we take Citeseer, which is regarded as the best topic-specific search engine, as the basis of our research, and propose a scheme that helps scientists obtain the computer science papers they are interested in from Citeseer more conveniently and more precisely. The contributions of this thesis are:

1. Collecting and analyzing papers on Citeseer
To process information on the web, HTML pages must first be downloaded to a local computer. We design a web crawler for Citeseer that collects the HTML source of every paper and stores it in a local database, then parses this information according to Citeseer's display rules and stores the results in the corresponding tables. This work prepares the ground for the research that follows.

2. Quality evaluation of papers based on content and link structure
We use content information and link structure as the basis of evaluation, working on the result papers obtained from Citeseer. The aim is to find a good way to re-rank these papers so that interesting papers can be found more precisely. In the content-based method, we use a "context focused graph" to find sample texts and the Bayes algorithm as the classification theory. In the link-structure-based method, we use the PageRank algorithm. Experimental results show that the two methods can correctly evaluate papers from two different perspectives.

3. A knowledge decision framework based on content and link structure
The content-based method evaluates papers from a subjective point of view, while the link-structure-based method evaluates them from an objective point of view. We therefore propose a scheme that combines the two methods to evaluate papers. Concretely, we first find relevant papers based on content, shrinking the set of result papers returned by Citeseer, then evaluate these papers based on link structure and present the results in order of evaluation value. Experimental results show that this method has a definite effect.
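
The two-stage scheme described in contribution 3 can be illustrated with a small sketch. This is not the thesis's implementation, only a minimal example under stated assumptions: a multinomial naive Bayes classifier with add-one smoothing stands in for the Bayes classification step, the "context focused graph" step is omitted and labeled sample texts are supplied directly, PageRank uses a damping factor of 0.85 (a common default, not a value taken from the thesis), and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(labeled_docs):
    """Build a multinomial naive Bayes model from (text, label) pairs."""
    word_counts = {}        # label -> Counter of token frequencies
    doc_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the most probable label, using add-one (Laplace) smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in text.lower().split():
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank. links maps each node to the nodes it cites;
    every node must appear as a key. Dangling nodes spread rank uniformly."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for target in links[v]:
                new_rank[target] += damping * rank[v] / len(links[v])
        rank = new_rank
    return rank


def rank_papers(papers, citations, model, topic):
    """Stage 1: keep papers classified as `topic`.
    Stage 2: order them by PageRank over citations among the kept papers."""
    relevant = [pid for pid, text in papers.items() if classify(model, text) == topic]
    if not relevant:
        return []
    keep = set(relevant)
    subgraph = {
        pid: [c for c in citations.get(pid, []) if c in keep] for pid in relevant
    }
    scores = pagerank(subgraph)
    return sorted(relevant, key=lambda pid: scores[pid], reverse=True)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    model = train_naive_bayes([
        ("web crawler search engine index", "ir"),
        ("link analysis ranking search", "ir"),
        ("protein gene sequence cell", "bio"),
        ("cell membrane protein folding", "bio"),
    ])
    papers = {
        "p1": "focused web crawler",
        "p2": "search ranking by link analysis",
        "p3": "gene sequence alignment",
        "p4": "search engine index",
    }
    citations = {"p1": ["p2"], "p2": [], "p3": ["p1"], "p4": ["p2", "p3"]}
    print(rank_papers(papers, citations, model, "ir"))
```

Filtering by content first means PageRank runs only on the citation subgraph of relevant papers, so a paper that is heavily cited but off-topic cannot dominate the final ordering.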
Keywords/Search Tags: Web Crawler, Citeseer, Quality Evaluation, Context Focused Graph, PageRank, Content, Link Structure, Bayes