
Research on Web Data Mining Algorithms

Posted on: 2008-12-03
Degree: Master
Type: Thesis
Country: China
Candidate: K Qiu
Full Text: PDF
GTID: 2178360212495344
Subject: Communication and Information System
Abstract/Summary:
Text mining systems based on text semantics have achieved very good results. With the development of the Internet, the amount of information on the web has grown tremendously. Besides text, the Internet carries many kinds of media, such as images, video, and audio, which have become increasingly important web data. This presents both challenges and opportunities for data mining.

Images are among the most important multimedia types on the Internet and the easiest to obtain from it. Image features exist at every layer of an image; the semantic features, which sit at the top layer, are the most important and the most effective. This paper therefore studies multimedia information mining using top-layer image semantics together with text semantics.

First, this paper examines web page information extraction, image semantics, text semantics, and representation models, and on that basis proposes a multimedia information mining framework. The whole system consists of seven modules: page parser, main content extraction, text/image information extraction, feature selection, fusion model, and semantic condensation. The most important modules are the fusion model and semantic condensation. These two modules draw on natural language processing (NLP) techniques, such as word segmentation and named entity recognition, and on data mining theory.

Second, this paper uses an association matrix to fuse text semantics with image semantics, and works on sentence similarity calculation to condense the text information so that it expresses the image information more accurately. These functions were implemented in Java under Windows 2000. The experiments show that the framework is effective.
Keywords/Search Tags: Data mining, Web content mining, Semantic, Fusion model, Heuristic rules, Natural Language Processing (NLP)
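
The abstract names two techniques without giving their details: an association matrix that fuses text and image semantics, and sentence similarity used to condense the text. The following Java sketch illustrates one possible reading of those ideas and is not the thesis's actual method. Everything in it is an assumption made for illustration: the class name `SemanticFusionSketch`, the helper names, the use of simple label counts as matrix entries, cosine similarity over term frequencies as the similarity measure, and the threshold parameter. Whitespace and punctuation splitting stands in for real word segmentation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and measures are assumptions, not taken from the thesis.
final class SemanticFusionSketch {

    // Lowercase a sentence and split it into word tokens (stand-in for real word segmentation).
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String t : sentence.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Term-frequency vector of a sentence.
    static Map<String, Integer> termFrequencies(String sentence) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokenize(sentence)) {
            tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    // Cosine similarity between the term-frequency vectors of two sentences.
    static double cosineSimilarity(String a, String b) {
        Map<String, Integer> ta = termFrequencies(a);
        Map<String, Integer> tb = termFrequencies(b);
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : ta.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = tb.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (int v : tb.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Association matrix: rows are sentences, columns are image semantic labels,
    // and each entry counts how often the label occurs in the sentence.
    static int[][] associationMatrix(List<String> sentences, List<String> imageLabels) {
        int[][] matrix = new int[sentences.size()][imageLabels.size()];
        for (int i = 0; i < sentences.size(); i++) {
            List<String> tokens = tokenize(sentences.get(i));
            for (int j = 0; j < imageLabels.size(); j++) {
                matrix[i][j] = Collections.frequency(tokens, imageLabels.get(j).toLowerCase());
            }
        }
        return matrix;
    }

    // Condensation: keep sentences associated with at least one image label,
    // skipping any sentence too similar to one that was already kept.
    static List<String> condense(List<String> sentences, List<String> imageLabels, double threshold) {
        int[][] matrix = associationMatrix(sentences, imageLabels);
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            int rowSum = 0;
            for (int v : matrix[i]) {
                rowSum += v;
            }
            if (rowSum == 0) {
                continue; // no association with the image
            }
            boolean redundant = false;
            for (String k : kept) {
                if (cosineSimilarity(k, sentences.get(i)) >= threshold) {
                    redundant = true;
                    break;
                }
            }
            if (!redundant) {
                kept.add(sentences.get(i));
            }
        }
        return kept;
    }
}
```

Calling `SemanticFusionSketch.condense(sentences, imageLabels, 0.8)` with the sentences surrounding an image and the labels assigned to that image returns the non-redundant sentences that mention at least one label. The threshold is a free parameter in this sketch; a lower value condenses more aggressively.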