Intelligent Service Oriented Study And Application On Web Content Computing

Posted on: 2007-10-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y H Zhang
GTID: 1118360185951378
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
The Web is now the most important means of acquiring information and knowledge. But its size, diversity, dynamic nature, and semi-structured form make its data difficult for machines to process. This has drawn many researchers to study how to retrieve information of interest from the enormous number of Web pages, how to convert that information into knowledge, and how to obtain personalized services from the Web. Current research on web data can be roughly divided into three fields: web content mining, web usage mining, and web structure mining. Web content data is the main carrier of Internet information. It comprises content data, markup or tokens, and hyperlinks. Web content computing research focuses on the content data of web pages; its hot topics include information extraction (IE), information retrieval (IR), and intelligent web services. Building on a survey of web content computing, this dissertation focuses on the following issues:

1. An approach named Incremental FP-Growth (iFP-Growth) is proposed, which can be applied in a dynamic environment to mine association rules. Web page data is semi-structured, irregular, and dynamic, which makes web content computing and mining difficult and complex. After surveying the existing theories and approaches, we propose the iFP-Growth algorithm for mining association rules from web content data. In an application to the Chinese car market, our experiments show the efficiency of association rule mining for car consumption preferences across car types, models, and prices.

2. A model for text classification based on sentence correlation (TCSC) is proposed. To address the problems of text segmentation and polysemy in information retrieval research on the classification and clustering of Chinese web document sets, we present a method that uses Chinese sentences, with the help of a corpus, to express the characteristics of a Chinese text document. It incrementally updates the category corpus with the training documents, then calculates the sentence correlation matrix from position weights and corpus item weights to classify documents. This model avoids the word segmentation problem in Chinese documents and reduces the effect of polysemous words in the classification phase.
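
The idea in item 1 of mining association rules in a dynamic environment can be illustrated with a small sketch. This is not the iFP-Growth algorithm, whose details the abstract does not give; it builds no FP-tree and simply keeps running counts of single items and item pairs, so that a new batch of transactions updates the counts without rescanning earlier data. The class name `IncrementalRuleMiner`, the thresholds, and the car-preference records are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


class IncrementalRuleMiner:
    """Keeps running counts of single items and item pairs across batches.

    Illustrative only: brute-force pair counting, not FP-Growth.
    """

    def __init__(self):
        self.n_transactions = 0
        self.item_counts = Counter()
        self.pair_counts = Counter()

    def update(self, transactions):
        """Fold a new batch of transactions into the existing counts."""
        for transaction in transactions:
            items = sorted(set(transaction))  # dedupe, fix pair ordering
            self.n_transactions += 1
            self.item_counts.update(items)
            self.pair_counts.update(combinations(items, 2))

    def rules(self, min_support, min_confidence):
        """Return sorted (antecedent, consequent, support, confidence) tuples."""
        found = []
        for (a, b), count in self.pair_counts.items():
            support = count / self.n_transactions
            if support < min_support:
                continue
            # Try the rule in both directions: a -> b and b -> a.
            for lhs, rhs in ((a, b), (b, a)):
                confidence = count / self.item_counts[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, rhs, support, confidence))
        return sorted(found)


miner = IncrementalRuleMiner()
# Hypothetical first batch of car-preference records.
miner.update([
    ["sedan", "mid-price"],
    ["sedan", "mid-price"],
    ["suv", "high-price"],
])
# A later batch arrives; only the running counts are updated.
miner.update([
    ["sedan", "low-price"],
])
result = miner.rules(min_support=0.5, min_confidence=0.6)
for lhs, rhs, support, confidence in result:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Here support is the fraction of all transactions containing both items, and confidence is that joint count divided by the count of the antecedent. A tree-based method such as FP-Growth would reach the same frequent itemsets without enumerating every pair, which matters once transactions contain many items.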
Keywords/Search Tags: Web Content Computing, Web Mining, Web Information Extraction, Web Text Classification, Web Intelligent Service