Internet Quality Anomalies Mining

Posted on: 2012-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: T Z Hou
GTID: 2218330338967515
Subject: Computer application technology
Abstract/Summary:
Information on the Internet has exploded with the rapid development of computer technology, but its quality varies widely, and a minority of users exploit the Internet to commit fraud. This harms applications that rely on web information, so the quality of Internet information deserves closer attention. This thesis aims to mine existing information quality problems on the Internet and to find web sources in need of improvement, so that higher-quality information sources can be identified and Internet users can be offered a high-quality information environment.
An architecture for an Internet quality anomaly mining system is presented, and methods of network anomaly mining are briefly analyzed. Based on an analysis of Internet quality anomalies, a fuzzy measure of web source quality is proposed, which transforms the quality of an Internet source into numeric data that computers can process. Internet quality anomalies make up only a small part of the mass of Internet data, but they can carry information of great value. Following the idea of outlier detection, they can be defined as outliers, so Internet quality anomalies can be mined by mining outliers.
Internet data is high-dimensional, with many attributes, and classical outlier detection methods cannot meet the requirements of high-dimensional data. By studying outlier detection algorithms, some that perform well in high-dimensional settings are selected for mining web anomalies, and some traditional algorithms are improved. The first is an improved density-based LOF (local outlier factor) algorithm combined with the SOM (self-organizing map) algorithm. Two further algorithms are then implemented: an outlier detection algorithm based on Mahalanobis distance and one based on discriminant analysis.
Multiple data sets are used in the experiments, and the three algorithms are analyzed thoroughly to assess their detection performance. The core purpose of this thesis, mining Internet quality anomalies, is achieved by applying the three outlier detection algorithms to Internet data and analyzing the visualized experimental results.
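To make the Mahalanobis-distance idea concrete, the following is a minimal sketch, not the thesis's implementation. It scores each record by D(x) = sqrt((x - mu)^T S^-1 (x - mu)), where mu is the sample mean and S the sample covariance, and flags records whose distance exceeds a cutoff. The function names, the two-attribute sample data, and the cutoff of 2.0 are all hypothetical choices made for illustration; the abstract does not state which features or threshold were used.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length numeric rows."""
    return [sum(col) / len(col) for col in zip(*rows)]


def covariance_matrix(rows, mu):
    """Unbiased sample covariance matrix (divides by n - 1)."""
    n, d = len(rows), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [x - m for x, m in zip(row, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    return [[v / (n - 1) for v in line] for line in cov]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [line[:] + [rhs] for line, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def mahalanobis_distances(rows):
    """Mahalanobis distance of every row from the sample mean."""
    mu = mean_vector(rows)
    cov = covariance_matrix(rows, mu)
    distances = []
    for row in rows:
        diff = [x - m for x, m in zip(row, mu)]
        z = solve(cov, diff)  # z = S^-1 (x - mu), without forming S^-1
        q = sum(d * zi for d, zi in zip(diff, z))
        distances.append(max(q, 0.0) ** 0.5)
    return distances


def flag_outliers(rows, threshold):
    """Indices of rows whose distance exceeds the (assumed) threshold."""
    dists = mahalanobis_distances(rows)
    return [i for i, d in enumerate(dists) if d > threshold]


# Hypothetical two-attribute quality scores; the last record is anomalous.
samples = [
    (1.0, 2.0), (1.2, 1.9), (0.9, 2.1), (1.1, 2.2), (1.0, 1.8),
    (0.8, 2.0), (1.3, 2.1), (0.95, 1.95), (5.0, 8.0),
]
suspects = flag_outliers(samples, threshold=2.0)
```

Because the distance is scaled by the covariance, correlated attributes do not inflate it the way plain Euclidean distance would. Note that with n records the distance from the sample mean can never exceed (n - 1) / sqrt(n), so a cutoff has to be chosen with the sample size in mind.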
Keywords/Search Tags: Web quality metadata access, High dimensional outlier detection, Mahalanobis distance, Discriminant analysis