
WebCM: Research on a Search Engine-Based Monitor System for Web Content

Posted on: 2003-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Miao
Full Text: PDF
GTID: 2168360062450130
Subject: Computer Science and Technology
Abstract/Summary:
The World Wide Web (WWW) has grown rapidly over the past ten years and has become an information center: a vast collection of large-volume, heterogeneous, and unstructured information. It has become an indispensable part of people's lives.

At the same time, network security has become a central topic in the development of the network. Various kinds of network security systems have emerged and are widely used, such as intrusion detection systems, software firewalls, and e-mail monitoring systems. However, the development of monitoring systems for the content of web information, which attracts the most attention, lags behind. There are many reasons for this. The most important are the distributed and open nature of the platform, which is built on the TCP/IP network protocols, and the variety of content that HTML allows. Furthermore, HTML does not provide enough support for machines to understand the semantic cues of web pages.

Because advanced monitoring systems for web documents are lacking, people have to check web content by hand. Manual checking is so inefficient that it cannot keep up with the development of the network. A computer-aided monitoring system for web content is needed.

This paper takes research on the "electronic policeman" of the network as its background and focuses on three key problems.

The first is how to identify, represent, and match the monitor pattern. In our research, machine learning is used to find the pattern. Two kinds of models are used to represent the pattern: a keyword-based conceptual model and an ontology-based conceptual model. The former provides a domain-independent monitor model and the latter a domain-specific one. We compute the conceptual degree between the pattern and the monitored document to produce the result.

The second is how to fetch, organize, and represent all required web documents. This paper describes a high-performance information collector. All documents are compressed and saved in the repository. We define a set of HTML tags. When the documents are processed in turn, their content is saved as attribute-value pairs.

The third is how to design a sound architecture for a monitoring system for web content. This paper describes the framework of a prototype system named Web Content Monitor, WebCM for short, which also provides an appropriate framework for image-based and voice-based monitoring systems in the future.
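
The storage and matching steps described in the abstract can be illustrated with a minimal Python sketch. It is not the WebCM implementation. The abstract does not list the tag set, the repository format, or the formula for the conceptual degree, so everything named here is an assumption: the tag set `MONITORED_TAGS`, the use of `zlib` for compression, and a weighted keyword-coverage score standing in for the keyword-based conceptual model.

```python
import zlib
from html.parser import HTMLParser

# Hypothetical tag set; the abstract does not say which tags WebCM defines.
MONITORED_TAGS = {"title", "h1", "p", "a"}


class TagValueExtractor(HTMLParser):
    """Collect text found inside monitored tags as (tag, text) pairs."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = []  # stack of currently open monitored tags

    def handle_starttag(self, tag, attrs):
        if tag in MONITORED_TAGS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._open:
            self.pairs.append((self._open[-1], text))


def extract_pairs(html):
    """Return the document content as a list of (tag, text) pairs."""
    parser = TagValueExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


def store(html):
    """Compress a fetched document before saving it (assumed: zlib)."""
    return zlib.compress(html.encode("utf-8"))


def load(blob):
    """Restore a document from its compressed form."""
    return zlib.decompress(blob).decode("utf-8")


def match_degree(pairs, pattern):
    """Stand-in score: weight of matched keywords divided by total weight.

    `pattern` maps keyword -> weight. This is an illustrative formula,
    not the conceptual degree defined in the thesis.
    """
    words = set(" ".join(text.lower() for _, text in pairs).split())
    total = sum(pattern.values())
    if total == 0:
        return 0.0
    hit = sum(w for kw, w in pattern.items() if kw.lower() in words)
    return hit / total
```

A document would pass through `store` when it is collected, then through `load` and `extract_pairs` when it is processed, and the resulting pairs would be scored against each monitor pattern with `match_degree`. An ontology-based model would replace the flat keyword dictionary with concepts and relations, which this sketch does not attempt.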
Keywords/Search Tags: Web Content Monitor, Search Engine, Web Document Analysis, Machine Learning, Pattern Recognition, Ontology