
Research and Implementation of Intelligent Web Information Collection and Classification

Posted on: 2015-06-03
Degree: Master
Type: Thesis
Country: China
Candidate: F Liu
Full Text: PDF
GTID: 2298330452994257
Subject: Computer technology
Abstract/Summary:
With the rapid development of science and technology, we have entered the digital information age. The Internet, as the world's largest repository of information, has become the main means of accessing information. Information resources on the network are massive, dynamic, heterogeneous, and semi-structured, and they lack unified organization and management. How to find the required information quickly and accurately within this mass of resources has therefore become a major problem that Internet users urgently need solved, and Web-based information collection and classification has become an active research topic.

The goal of traditional Web information collection is to gather as many pages as possible, or even all the resources on the Web. In this process, little attention is paid to the order of acquisition or to the topics of the pages already collected. The collected page content is consequently disorganized, and the process consumes large amounts of system and network resources. An effective collection method is needed to reduce disorderly and repeated page collection. In addition, when the volume of information is large and users need to locate what they want accurately, relying on manual classification is unrealistic. Automatic webpage classification is therefore an effective means of organizing and managing information, and it is also an important part of this thesis.

This thesis first introduces the background of the topic, its research significance, and the current state of research. It then describes the main techniques and the theory of the related algorithms, and designs an intelligent webpage information collection and classification system consisting of two parts: information collection and information classification. The information collection part is based mainly on a web crawler that uses a breadth-first strategy, together with a topic-based webpage information extraction method and rule templates, which convert free-form or semi-structured data into structured data. The information classification part, according to the needs of users, combines the SVM algorithm with word segmentation and feature extraction techniques to classify information, providing a full range of information services for users.
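
The collection part described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the names `crawl_bfs`, `extract`, `TEMPLATE`, and `PAGES`, the regular-expression rules, and the `example.test` URLs are all assumptions made for illustration. Pages are visited breadth-first, a set of seen URLs avoids repeated collection, and a rule template turns each semi-structured page into a structured record. The classification step (word segmentation, feature extraction, SVM) is outside this sketch.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical rule template: field name -> regex with one capture group.
TEMPLATE = {
    "title": re.compile(r"<title>(.*?)</title>", re.S),
    "body": re.compile(r'<div class="content">(.*?)</div>', re.S),
}


def extract(html, template=TEMPLATE):
    """Apply each rule to the page and return one structured record."""
    record = {}
    for field, pattern in template.items():
        match = pattern.search(html)
        record[field] = match.group(1).strip() if match else None
    return record


def crawl_bfs(seed, fetch, max_pages=100):
    """Visit pages breadth-first from seed; fetch(url) returns HTML or None."""
    queue = deque([seed])
    seen = {seed}  # URLs already queued, so no page is collected twice
    records = []
    while queue and len(records) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        record = extract(html)
        record["url"] = url
        records.append(record)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return records


# Toy in-memory "site" so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<title>Home</title><a href="/a">A</a><a href="/b">B</a>',
    "http://example.test/a": '<title>A</title><div class="content">alpha</div><a href="/c">C</a>',
    "http://example.test/b": '<title>B</title><div class="content">beta</div><a href="/">Home</a>',
    "http://example.test/c": '<title>C</title><div class="content">gamma</div>',
}

records = crawl_bfs("http://example.test/", PAGES.get)
print([r["title"] for r in records])
```

In a real crawler the `fetch` argument would be replaced by a function that downloads pages over HTTP; passing it in as a parameter keeps the traversal logic separate from network access and easy to test.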
Keywords/Search Tags: Information collection, information extraction, information processing, SVM classification algorithm, information classification