Research on Web-Based Text Mining

Posted on: 2009-11-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Zhang
GTID: 2178360245994633
Subject: Computer software and theory
Abstract/Summary:
The World Wide Web is a distributed, global information resource containing a large amount of data relevant to essentially all domains of human activity. Given the rapid growth in the volume of data available on the WWW, finding useful information in it becomes more difficult every day. Data mining is the automated discovery of non-obvious, potentially useful and previously unknown information from large data sources. Obtaining valuable information from the WWW intelligently and automatically through data mining techniques, and thereby improving the efficiency with which the WWW is used, therefore has tremendous practical value.

Data mining offers a new way to address the problem that ever-growing volumes of data cannot be fully exploited. Today, the main data sources for data mining are historical databases, which mainly contain text and numerical data. Mining text data on the WWW, by contrast, has not been researched enough. This thesis discusses some basic problems of text data mining, including its definition, the theories and technologies it requires, and its system structure and model.

Aimed at the concrete problems of Web-based Chinese text mining, this thesis mainly studies methods and implementation techniques. It discusses Chinese word segmentation, feature extraction, feature representation and feature matching methods, and establishes Chinese text classification and clustering algorithms based on neural networks. In designing Web-based Chinese text mining, the thesis analyzes how web page information is represented, the structural features of web pages, and the control symbols (markup tags) of web pages and HTML, and builds an extraction flow for web page information. It then presents two concrete applications of Web-based Chinese text mining to practical problems.
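The steps named in the abstract (extracting text from a web page, splitting Chinese text into words, and comparing documents by their extracted terms) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give its algorithms, so forward maximum matching, term-frequency vectors and cosine similarity are stand-ins chosen for brevity (the thesis itself reports neural-network-based classification and clustering), and the names `TextExtractor`, `extract_text`, `segment`, `cosine` and the toy dictionary are all hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
import math


class TextExtractor(HTMLParser):
    """Collects visible text and skips <script>/<style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Strip HTML markup and return the remaining text as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def segment(text, dictionary, max_len=4):
    """Forward maximum matching (an assumed, simple segmentation method):
    at each position take the longest dictionary word, else one character."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Toy dictionary and page, invented for illustration only.
DICTIONARY = {"数据", "挖掘", "数据挖掘", "研究"}
PAGE = "<html><head><style>p{color:red}</style></head><body><p>数据挖掘研究</p></body></html>"

text = extract_text(PAGE)
features = Counter(segment(text, DICTIONARY))
```

Here `features` is a bag-of-words vector for the page; two pages could then be compared with `cosine(features_a, features_b)`. A real system would need a much larger dictionary and a better segmentation strategy than this greedy one.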
Keywords/Search Tags:WEB, data mining, text data mining, KDD, database, knowledge base