
Research on Webpage Hot Topic Retrieval Technology Based on Data Stream Mining

Posted on: 2007-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: C L Zhang
Full Text: PDF
GTID: 2178360185485620
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the web, research on public opinion faces new problems and challenges. Online public opinion offers a more comprehensive and more concentrated reflection of public opinion as a whole. Because opinions on the web can be expressed anonymously, they may reflect most people's true views of an event. To study online public opinion, we need to collect the information published on the web and then mine it. This research therefore belongs to the field of web mining. Web mining aims to extract useful knowledge and builds on data mining, text mining and multimedia techniques. It integrates computer networks, databases and data warehouses, artificial intelligence, information retrieval, visualization, natural language processing and other technologies. Web mining is a relatively new subject that combines conventional data mining technology with the web. According to the type of object being mined, the conventional classification divides web mining into three categories: web content mining, web structure mining and web usage mining. In this paper, another classification, made from the application point of view, is introduced and completed; it is better suited to categorizing applications. This method also divides web mining into three categories: producer-based mining, consumer-based mining and additional-service-provider-based mining, each of which is described in detail in this paper. By analyzing the webpage topics that users access frequently, we can learn which events users are interested in during a given period and thus follow the development of public opinion.
To meet this need, we study how webpages are actually transmitted, summarize the characteristics of that transmission, and design a procedure for extracting each webpage's URL and topic based on these characteristics. The topic stream extracted from network traffic can be regarded as an unbounded data stream. Because memory is limited, the algorithm used for frequent item statistics must satisfy two requirements: it must find the frequent items in a single pass over the data, and it must have low space and time costs.
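The abstract does not name the frequent item algorithm used in the thesis. As an illustration only, the two requirements (a single pass and small memory) can be met by the classic Misra-Gries summary, sketched below in Python. The function name `misra_gries`, the parameter `k`, and the sample topic strings are assumptions made for this sketch and do not come from the thesis.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """One-pass frequent item summary using at most k - 1 counters.

    Every item that occurs more than n / k times in a stream of length n
    is guaranteed to be among the returned keys. The stored counts are
    lower bounds: each underestimates the true count by at most n / k.
    """
    if k < 2:
        raise ValueError("k must be at least 2")

    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            # Item is already tracked: count this occurrence.
            counters[item] += 1
        elif len(counters) < k - 1:
            # A free counter is available: start tracking the item.
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those
            # that reach zero. The new item itself is not stored.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    # Hypothetical topic stream; in the thesis setting each element
    # would be a topic extracted from observed HTTP traffic.
    topics = ["olympics", "weather", "olympics", "stocks",
              "olympics", "movies", "olympics"]
    print(misra_gries(topics, k=3))
```

Memory use is bounded by `k - 1` counters no matter how long the stream runs, which is why this family of algorithms suits an unbounded topic stream. The returned keys are only candidates: an infrequent item can survive in the summary, so exact counts would need a second pass or a different algorithm.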
Keywords/Search Tags:web mining, frequent item mining, HTTP protocol, data stream