XML-based Web Text Data Mining Research

Posted on: 2008-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: L G Wang
GTID: 2208360215466098
Subject: Agricultural mechanization project
Abstract/Summary:
Network technology has penetrated every part of society. With the rapid worldwide development of the Internet, more and more databases and information systems have emerged, making the Internet the largest database in terms of both variety and size. The growth of computing capability and the development of large-scale data storage technology have left people facing a difficult situation. On the one hand, users want to obtain the information they need quickly and accurately. On the other hand, the sheer volume and complexity of that information make it difficult to handle. Web Data Mining is one effective way to address this problem. At present, research on Web Data Mining is still at the exploratory stage and needs deeper study in theory, practice and technique.
Traditional Data Mining technology focuses on structured data, especially relational databases and data warehouses. The Web, which can expand without limit, is a loosely distributed information system. It has no centralized control, no unified structure, no integrity constraints, no transaction management, and no standard language or data model. Data Mining on the Web therefore faces many problems. The emergence of XML provides a new opportunity to solve them. This thesis focuses on solving Web Data Mining problems with XML technology.
The study concentrates on Web text Data Mining. First, it introduces the relevant theory of Data Mining and the current state of Data Mining research in China and abroad. It then presents the application of Web Data Mining, including the development and features of XML and the related technical specifications.
To deal with semi-structured data, XML technology can be used to convert semi-structured data into structured data and to build a Web text Data Mining model that helps users acquire information efficiently.
The result of Web text preprocessing has a great effect on the quality and speed of Web text Data Mining. Web text preprocessing is therefore crucial and needs further study. The study emphasizes the process and methods of Web text preprocessing. It proposes using XML technology to give Web page information a structure, turning Web text into a form the computer can handle, then extracting the information useful for text mining and reducing the data to form a text feature database, which is the foundation of Web text mining. The Web text mining model in the study consists mainly of Web text preprocessing and the Web text Data Mining functions. Its first advantage is that it uses XML technology to extract information; its second is that it obtains a term set that properly expresses the text content, which improves the data the text mining works on. It also realizes the conversion from XML to the data model. Through a concrete example, the thesis describes in detail one method of extracting Web text data with XML.
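The preprocessing steps described above (turning a Web page into XML, extracting the text, and reducing it to a term set for a feature database) can be illustrated with a minimal sketch. It is not the method of the thesis, which the abstract does not specify. It assumes Python's standard library only, and the names `HtmlToXml`, `extract_text`, `term_counts`, `page_to_features`, the `<page>` root element, and the small stop-word list are all hypothetical choices made for illustration.

```python
from collections import Counter
from html.parser import HTMLParser
import re
import xml.etree.ElementTree as ET

# Illustrative assumptions, not taken from the thesis.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}
SKIP_TAGS = {"script", "style"}
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}


class HtmlToXml(HTMLParser):
    """Build an element tree from loosely structured HTML.

    Attributes are ignored to keep the sketch short.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("page")  # hypothetical wrapper element
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        elem = ET.SubElement(self.stack[-1], tag)
        if tag not in VOID_TAGS:  # void elements never get an end tag
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not data.strip():
            return
        parent = self.stack[-1]
        if len(parent):  # text after a child element is that child's tail
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def extract_text(elem):
    """Collect text fragments in document order, skipping script and style."""
    if elem.tag in SKIP_TAGS:
        return []
    parts = [elem.text or ""]
    for child in elem:
        parts.extend(extract_text(child))
        parts.append(child.tail or "")
    return parts


def term_counts(text):
    """Reduce text to a term-frequency table (ASCII words, crude stop list)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if len(w) > 2 and w not in STOP_WORDS)


def page_to_features(html):
    """Return one feature record for a page: title, term counts, XML form."""
    parser = HtmlToXml()
    parser.feed(html)
    parser.close()
    text = " ".join(p.strip() for p in extract_text(parser.root) if p.strip())
    return {
        "title": parser.root.findtext(".//title", default="").strip(),
        "terms": term_counts(text),
        "xml": ET.tostring(parser.root, encoding="unicode"),
    }


if __name__ == "__main__":
    sample = "<html><head><title>Demo</title></head><body><p>Web text mining<br>with XML</p></body></html>"
    record = page_to_features(sample)
    print(record["title"], dict(record["terms"]))
```

Each returned record could serve as one row of a text feature database. A real system would also need proper word segmentation for Chinese text, attribute handling, and a fuller stop-word list, none of which this sketch attempts.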
Keywords/Search Tags:Data Mining, XML, web text, data extraction