
Study on Methods of Network Information Processing Based on Web Mining

Posted on: 2010-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: P Liu
Full Text: PDF
GTID: 2178360278457000
Subject: Management Science
Abstract/Summary:
As networks penetrate almost every field of society, all kinds of information are shared online to an unprecedented extent, and the amount of network data is growing exponentially. The Internet has become a low-cost, resource-rich data source that conceals many forms of information. However, existing search engines cannot analyze Web content to help us obtain information and locate data. They return a large number of Web pages that merely contain words and expressions related to the topic; the rest is left to the user. This thesis puts forward a method for network information processing based on Web mining. The method is described as follows: (1) A study of an object-oriented corpus base and eigenvector sets. Constructing the corpus base shortens the vector representation process of Web text, reduces the dimensionality of the eigenvectors, and improves the execution efficiency of classification and clustering. In addition, the eigenvector simplifies the storage pattern, and classification becomes more precise. (2) An index-based MLDB is constructed to unify the storage standard for network data, which facilitates mining analysis, extraction of sensitive-information elements, and location of data sources. (3) The study of the mining module comprises two parts. The first is a classifier constructed on the network information base; it introduces the χ² (chi-square) statistic and points out the contribution of characteristic words to categories. The second is a clustering algorithm aimed at Web data sets, which introduces the TF-IDF algorithm. Finally, the clustering result is fed into the classifier to finalize the processing of Web data sets and to provide support for information extraction...
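The abstract names the χ² statistic and TF-IDF but does not give the formulas the thesis uses. As a rough illustration only, the sketch below assumes the standard textbook definitions: χ² computed from a 2x2 term/category contingency table, and TF-IDF as relative term frequency times the log of inverse document frequency. The function names `chi_square` and `tf_idf` and the toy documents are hypothetical and do not come from the thesis.

```python
import math
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square score of one term for one category (assumed textbook form).

    n11: documents in the category that contain the term
    n10: documents outside the category that contain the term
    n01: documents in the category that lack the term
    n00: documents outside the category that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        # A row or column of the table is empty, so the term carries no signal.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. The weight is
    (term count / document length) * log(number of documents / document frequency).
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


# Hypothetical toy corpus, for illustration only.
toy_docs = [["web", "mining"], ["web", "search"]]
toy_vectors = tf_idf(toy_docs)
```

A term that appears in every document, such as "web" in the toy corpus, receives a TF-IDF weight of zero, while a term that perfectly separates a category from the rest receives a high χ² score. This is the sense in which such scores can indicate the contribution of characteristic words to categories.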
Keywords/Search Tags: Web mining, Corpus base, Eigenvector, MLDB, Clustering