Public Information Collection Methods And Realization

Posted on: 2012-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Xu
Full Text: PDF
GTID: 2218330368997937
Subject: Software engineering
Abstract/Summary:
With the development of computer technology and the spread of the Internet, web data have become increasingly important. In the field of open source information searching, the Internet has likewise become a new search target. However, Internet data and applications have characteristics of their own, so traditional open source information searching methods cannot be applied to them directly. Web mining, which grew out of data mining, is a promising way to bridge this gap. The purpose of this dissertation is to introduce web mining methods into open source information searching.

On the basis of an analysis and summary of the characteristics of Internet information and of traditional open source information searching methods, this dissertation applies web content mining, web structure mining, and web usage mining to improve the quality and efficiency of searching. First, for web content mining, the mining procedure is studied in detail with the characteristics of web text in mind; the dissertation surveys feature extraction algorithms and analyzes in particular the advantages and shortcomings of the TFIDF algorithm. An improved weighted TFIDF algorithm is then proposed, and experiments show better precision and recall than the traditional method. For web usage mining, a pre-processing procedure is presented and experimentally validated for efficiency; an Apriori-like algorithm is also studied and used to find frequent patterns in web page views. For web structure mining, the principles of the PageRank and HITS algorithms are examined and their feasibility for open source information applications is discussed. Finally, the dissertation considers the prospects for web mining in the open source information searching domain and proposes directions for future research.
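
For readers unfamiliar with the baseline that the abstract refers to, the following is a minimal sketch of the classic TFIDF weighting (term frequency multiplied by the logarithm of inverse document frequency). It is not the improved weighted variant proposed in the dissertation, whose details the abstract does not give. The function name `tfidf` and the toy documents are assumptions made for illustration only.

```python
import math
from collections import Counter


def tfidf(docs):
    """Classic TFIDF (illustrative baseline, not the thesis's improved variant).

    docs: list of token lists.
    Returns one {term: weight} dict per document, where
    weight = (count of term in doc / doc length) * ln(N / document frequency).
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears at least once.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus of pre-tokenized web page texts.
docs = [
    ["web", "mining", "web", "text"],
    ["web", "usage", "log"],
    ["link", "structure", "web"],
]

for doc_weights in tfidf(docs):
    print(doc_weights)
```

In this sketch a term that occurs in every document, such as "web" here, receives weight zero because its inverse document frequency is ln(1). That behavior is one of the commonly cited limitations of the plain scheme.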
Keywords/Search Tags: open source information, web content mining, web structure mining, web usage mining, TFIDF algorithm, similar Apriori algorithm