
Technology Of Information Intelligent Perception Based On Web And Its Applications

Posted on: 2005-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: C L Zhao
GTID: 2168360155472043
Subject: Systems analysis and integration
Abstract/Summary:
The Web contains a great deal of useful knowledge, and discovering useful knowledge or patterns in this huge information source has become an active research direction. The goal of Web mining is to discover access patterns and hidden information in a huge collection of documents together with their hyperlink, access, and usage information. Discovering useful resources and knowledge is a very complex problem for the following reasons: (1) the volume of Web data is enormous; (2) Web pages are complex because they are unstructured or semi-structured; (3) the Web is a highly dynamic information source.

Web-based information intelligent perception technology aims to provide an automatic and intelligent way to obtain information of interest from the huge volume of data on the Web and then to turn that information into knowledge that people can use directly. This research is important both in theory and in practice.

Drawing on techniques from Web mining and natural language processing, this paper builds a bridge between the two fields and applies natural language understanding to Web mining, so that Web mining can be studied at the semantic level. The paper also constructs a Web-based information intelligent perception model, which opens a new line of research on detecting changes in Web data promptly, extracting information of interest from the Web automatically, and then turning that information into knowledge for human decision-making. The main contributions of this paper are:
1) A method for eliminating noise in Web pages based on a style tree model is designed and implemented. Experimental results show that this noise elimination technique improves the mining results significantly.
2) A Chinese word segmentation model based on the N-shortest-paths method is constructed. In addition, a statistical model can easily be obtained by attaching frequencies to the edges of the word graphs.
3) An automatic text categorization algorithm based on grammar parsing, which imitates the way humans categorize texts, is presented. The method is designed specifically for text categorization in the financial domain and is effective in classifying financial texts. Experimental results show that the algorithm greatly improves the performance of a text classification system.
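The intuition behind style-tree noise elimination in contribution 1) can be illustrated with a heavily simplified sketch: blocks (DOM element nodes) that keep the same position, the same presentation style and the same text across most pages of a site, such as navigation bars and copyright notices, are treated as noise and dropped. The nested-tuple DOM, the approximation of "style" by the sequence of child tags, the `ratio` threshold and all function names are assumptions for illustration only; this is not the thesis's algorithm, which the abstract does not detail.

```python
from collections import Counter
from typing import List, Optional, Set, Tuple, Union

# Hypothetical minimal DOM: an element is (tag, [children]); a text node is a str.
Node = Union[str, Tuple[str, list]]
Key = Tuple[Tuple[str, ...], Tuple[str, ...], str]  # (path, style, text)


def text_of(node: Node) -> str:
    """Concatenate all text below a node."""
    if isinstance(node, str):
        return node
    return " ".join(text_of(child) for child in node[1])


def style_of(node: Tuple[str, list]) -> Tuple[str, ...]:
    """Approximate a block's presentation style by the sequence of its child tags."""
    return tuple("#text" if isinstance(c, str) else c[0] for c in node[1])


def blocks(node: Node, path: Tuple[str, ...] = (), index: int = 0):
    """Yield (path, style, text) for every element node; the path records tag and sibling index."""
    if isinstance(node, str):
        return
    here = path + (f"{node[0]}[{index}]",)
    yield here, style_of(node), text_of(node)
    for i, child in enumerate(node[1]):
        yield from blocks(child, here, i)


def find_noise(pages: List[Node], ratio: float = 0.8) -> Set[Key]:
    """Blocks repeated with identical path, style and text in at least `ratio` of the pages."""
    counts: Counter = Counter()
    for page in pages:
        counts.update(set(blocks(page)))  # count each block at most once per page
    return {key for key, seen in counts.items() if seen / len(pages) >= ratio}


def clean(node: Node, noisy: Set[Key], path: Tuple[str, ...] = (), index: int = 0) -> Optional[Node]:
    """Copy of the tree without noisy blocks (None if the node itself is noise)."""
    if isinstance(node, str):
        return node
    here = path + (f"{node[0]}[{index}]",)
    if (here, style_of(node), text_of(node)) in noisy:
        return None
    kept = [clean(child, noisy, here, i) for i, child in enumerate(node[1])]
    return (node[0], [c for c in kept if c is not None])


if __name__ == "__main__":
    # Three made-up pages sharing a navigation bar and a footer.
    nav = ("div", ["Home | News | About"])
    footer = ("div", ["Copyright notice"])
    pages = [("html", [nav, ("p", [body]), footer])
             for body in ["Stock prices rose today.", "Bank rates were cut.", "New fund launched."]]
    print(clean(pages[0], find_noise(pages)))
```

A real site style tree merges the DOM trees of many pages and weighs each node by how diverse its styles and contents are; the exact-repetition test above is only the simplest case of that idea.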
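The word-graph idea in contribution 2) can be sketched as follows, assuming a toy lexicon with made-up frequencies. Nodes are the gaps between characters, each dictionary word (and every single character) is an edge, and the candidate segmentations are the lowest-cost paths from the first node to the last. With unit edge weights this is the plain shortest-path version; passing `statistical=True` instead attaches a weight derived from word frequency to each edge, as the abstract describes. The names `build_word_graph`, `n_shortest_segmentations` and `EXAMPLE_LEXICON`, and the add-one smoothing, are hypothetical; the sketch keeps the n cheapest paths at each node, and the thesis's exact pruning and tie handling may differ.

```python
import math
from typing import Dict, List, Tuple

Lexicon = Dict[str, int]  # word -> corpus frequency


def build_word_graph(text: str, lexicon: Lexicon) -> Dict[int, List[Tuple[int, str]]]:
    """Edge i -> j for every lexicon word text[i:j]; single characters are
    always edges so that at least one path through the graph exists."""
    max_len = max((len(w) for w in lexicon), default=1)
    graph: Dict[int, List[Tuple[int, str]]] = {i: [] for i in range(len(text))}
    for i in range(len(text)):
        for j in range(i + 1, min(len(text), i + max_len) + 1):
            word = text[i:j]
            if j == i + 1 or word in lexicon:
                graph[i].append((j, word))
    return graph


def n_shortest_segmentations(text: str, lexicon: Lexicon, n: int = 2,
                             statistical: bool = False) -> List[Tuple[float, List[str]]]:
    """Return up to n lowest-cost segmentations as (cost, words) pairs."""
    graph = build_word_graph(text, lexicon)
    total = sum(lexicon.values())

    def weight(word: str) -> float:
        if not statistical:
            return 1.0  # plain version: every word costs 1, so fewer words is cheaper
        # statistical version: -log of an add-one-smoothed relative frequency
        return -math.log((lexicon.get(word, 0) + 1) / (total + 1))

    best: List[List[Tuple[float, List[str]]]] = [[] for _ in range(len(text) + 1)]
    best[0] = [(0.0, [])]
    for i in range(len(text)):
        # Every edge into node i starts at an earlier node, so best[i] is complete here.
        best[i] = sorted(best[i], key=lambda p: (p[0], p[1]))[:n]
        for j, word in graph[i]:
            for cost, words in best[i]:
                best[j].append((cost + weight(word), words + [word]))
    return sorted(best[len(text)], key=lambda p: (p[0], p[1]))[:n]


# Hypothetical toy lexicon with made-up frequencies, for illustration only.
EXAMPLE_LEXICON: Lexicon = {"研究": 50, "研究生": 10, "生命": 40, "生": 5, "命": 5, "研": 1, "究": 1}

if __name__ == "__main__":
    for cost, words in n_shortest_segmentations("研究生命", EXAMPLE_LEXICON, n=2):
        print(round(cost, 3), "/".join(words))
```

With unit weights, the two readings 研究/生命 ("study / life") and 研究生/命 ("graduate student / fate") tie at cost 2, which is why a coarse segmenter keeps several candidates for later disambiguation; with the frequency weights of this toy lexicon, 研究/生命 ranks first.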
Keywords/Search Tags: Web Mining, Natural Language Processing, Noise Elimination, Text Categorization