Design And Implementation Of Topic Information System Based On Web

Posted on: 2012-04-29 | Degree: Master | Type: Thesis
Country: China | Candidate: X Q Jia | Full Text: PDF
GTID: 2248330395956624 | Subject: Software engineering
Abstract/Summary:
How to find the information a user needs on the Web quickly and accurately has become a serious problem. To address it, topic-oriented Web mining has emerged in the information field. Its basic idea can be summarized as follows: according to user-defined topics, a topic crawler traverses the network and collects the pages related to those topics; the collected pages are then intelligently analyzed; finally, the results are presented in a friendly way to meet the retrieval requirements of a specific topic.

This thesis analyzes the research content and the current research problems of topic Web mining, and focuses on three issues. First, a topic crawler algorithm is proposed; the main work is to strengthen its anti-spam ability and to improve the accuracy with which the crawler judges topic relevance. Second, the pages collected by the improved topic crawler algorithm are analyzed and filtered. To facilitate the research, text filtering is transformed into text classification. Because the Vector Space Model ignores the contextual information of the text, a feature selection algorithm based on community discovery is proposed to compensate for the text structure information that the Vector Space Model loses. Experimental results show that the classification method is effective and feasible in terms of precision and recall. Third, to achieve automatic acquisition of topic information, a Web-based topic information collection system model is given on the basis of the preceding algorithms.
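As a rough illustration of the Vector Space Model mentioned above, the following minimal sketch scores a page against a topic description by cosine similarity of term-frequency vectors. It is not the thesis's algorithm: the function names, the tokenizer, and the 0.2 threshold are all hypothetical choices made for the example, and the anti-spam and community-discovery steps are not modeled.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Map text to a bag-of-words term-frequency vector.

    Word order is discarded here, which is the context loss the
    abstract attributes to the Vector Space Model.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as unrelated
    return dot / (norm_a * norm_b)


def is_relevant(page_text, topic_text, threshold=0.2):
    """Hypothetical relevance test; the threshold is an assumed value."""
    score = cosine_similarity(term_vector(page_text), term_vector(topic_text))
    return score >= threshold
```

A crawler built along these lines might call `is_relevant(page_text, topic_text)` on each fetched page and follow links only from pages that pass. Because `term_vector` produces the same vector for any reordering of the same words, two texts with different meanings can score identically, which is the kind of defect a structure-aware feature selection step is meant to offset.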
Keywords/Search Tags: Vector Space Model, Topical Crawlers, Community Discovery, Similarity, Text Classification