
XML-Based Web Content Mining Technology

Posted on: 2009-05-31 | Degree: Master | Type: Thesis
Country: China | Candidate: X X Liu
GTID: 2208360248952860 | Subject: Computer technology
Abstract/Summary:
Since the Internet came into being, the information on it has grown exponentially. Finding and retrieving useful information has become a research hotspot in data mining, and Web data mining emerged in response. The Web is a huge, widely distributed, highly heterogeneous repository of hypertext and hypermedia, rich in dynamic hyperlink information and in access and usage information about Web pages. Because today's Web is built on HTML, which describes how content is presented rather than what it means, semi-structured or unstructured data and heterogeneous data sources make Web data mining difficult. XML (Extensible Markup Language), developed by the W3C, supports rich data structures and, in particular, emphasizes the relationship between elements and their semantics, so XML-based Web data mining can take full advantage of XML's characteristics and opens up new opportunities.
First, the thesis elaborates on methods of XML-based Web data mining: it introduces Web data mining technology, analyzes the advantages of XML for Web data mining, presents the design approach, and builds a system model.
Second, the thesis designs an experimental XML-based Web Content Mining System (WCMS) that performs both Web text preprocessing and Web text mining. Its advantage is that it reduces the amount of data step by step, refining the data that text mining has to process: it focuses on authority pages, applies XML techniques, uses feature selection to obtain a term set that represents the text accurately, and uses a support vector machine (SVM) to reduce the dimensionality of high-dimensional data. The thesis concentrates on the process and techniques of Web text preprocessing: structuring the information in Web pages with XML, representing the texts in a format a computer can process, extracting the information useful for text mining, reducing the amount of data, and building a text feature database for text mining.
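The preprocessing steps described above (structuring page content with XML, then reducing it to a text feature database) can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the WCMS implementation: the XML element names (`page`, `title`, `heading`, `para`, `link`), the stop-word list, the threshold `min_df`, and the helper names `html_to_xml` and `build_feature_db` are all hypothetical, and document-frequency thresholding with TF-IDF weighting stands in for whatever feature-selection method the thesis actually uses. The SVM step is not shown.

```python
"""Hypothetical sketch: structure HTML pages as XML, then build a small
text feature database (selected terms with TF-IDF weights).

Assumptions (not from the thesis): the XML schema, the stop-word list,
document-frequency feature selection, and the smoothed IDF formula.
"""
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter
from html.parser import HTMLParser


class PageToXML(HTMLParser):
    """Copy title, headings, paragraphs and link targets into an XML tree."""

    TAGS = {"title": "title", "h1": "heading", "h2": "heading",
            "h3": "heading", "p": "para"}

    def __init__(self):
        super().__init__()
        self.root = ET.Element("page")
        self.current = None  # XML element currently collecting text

    def handle_starttag(self, tag, attrs):
        if tag in self.TAGS:
            self.current = ET.SubElement(self.root, self.TAGS[tag])
            self.current.text = ""
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                ET.SubElement(self.root, "link", href=href)

    def handle_endtag(self, tag):
        if tag in self.TAGS:
            self.current = None

    def handle_data(self, data):
        # Text outside the mapped tags (navigation, bare link text) is dropped.
        if self.current is not None:
            self.current.text += data


def html_to_xml(html):
    """Parse one HTML page and return a whitespace-normalized <page> element."""
    parser = PageToXML()
    parser.feed(html)
    parser.close()
    for el in parser.root:
        if el.text is not None:
            el.text = " ".join(el.text.split())
    return parser.root


STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def terms(page):
    """Lower-cased word tokens from the text-bearing elements of a <page>."""
    text = " ".join(el.text for el in page if el.tag != "link" and el.text)
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_feature_db(pages, min_df=2):
    """Keep terms found in at least min_df pages and weight them by TF-IDF."""
    docs = [Counter(terms(p)) for p in pages]
    df = Counter(t for d in docs for t in d)  # each page counts a term once
    vocab = sorted(t for t, n in df.items() if n >= min_df)
    n_docs = len(docs)
    db = []
    for d in docs:
        total = sum(d.values()) or 1
        db.append({t: (d[t] / total) * math.log(1 + n_docs / df[t])
                   for t in vocab if d[t]})
    return vocab, db


pages = [html_to_xml(h) for h in [
    "<html><head><title>XML mining</title></head><body>"
    "<h1>Web mining</h1><p>XML gives web pages structure.</p>"
    "<a href='/a'>more</a></body></html>",
    "<html><head><title>Text mining</title></head><body>"
    "<p>Feature selection reduces web text data.</p></body></html>",
]]
print(ET.tostring(pages[0], encoding="unicode"))
vocab, db = build_feature_db(pages, min_df=2)
print(vocab)  # ['mining', 'web']
```

Document-frequency thresholding is used here only because it is the simplest unsupervised filter; a supervised criterion such as chi-square or information gain would require labeled pages.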
Because the results of Web text preprocessing affect the quality and efficiency of Web text mining, preprocessing is essential to Web text mining and calls for dedicated, comprehensive study. Finally, the thesis summarizes its overall research and design work and suggests next steps for further developing the architecture based on this design method.
Keywords/Search Tags: Web Content Mining, XML, Web data preprocessing, feature selection, SVM