
The Data Mining Research Based On Comment Website

Posted on: 2014-01-02  Degree: Master  Type: Thesis
Country: China  Candidate: Y Zhang  Full Text: PDF
GTID: 2248330398971584  Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, many comment websites have emerged that support rich interaction among users. Real-time updates and the rapid exchange of information are their most prominent features, and as a result these websites hold a great deal of valuable knowledge. Mining this latent content can provide important guidance for social development.

This thesis takes BBS websites, the most typical kind of comment website, as its research object. It uses a search engine to apply data mining to the comment content and extract information from it. It also proposes an improved OPIC algorithm (the P-OPIC algorithm) that strengthens the mining of web content, so that users can locate target websites quickly.

The thesis studies the content and architecture of search engines and the operating mechanism of the open-source search engine Nutch. The main research work is divided into the following areas:

(1) It studies Nutch's crawler framework and indexing framework, as well as the PageRank, HITS, and OPIC algorithms, and then proposes an optimized algorithm based on OPIC. The optimized algorithm incorporates each page's PageRank value and a BBS website adjustment factor, which together improve the stability of BBS page ranking results.
(2) It adds a new data plugin and a Chinese word segmentation plugin to Nutch, and modifies Nutch's source code so that these additions have less impact on the performance of the search engine system.
(3) It evaluates the performance of the algorithm by comparing experimental data from OPIC with experimental data from P-OPIC. The results show that P-OPIC extracts users' keywords better and ranks web pages significantly more accurately than OPIC. Finally, it analyzes the experimental results and summarizes the advantages and disadvantages of the algorithm.
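OPIC (On-line Page Importance Computation), the basis of Nutch's default scoring, gives every page some "cash". When a page is crawled, its cash is added to its "history" and split equally among its outlinks, and a page's importance is estimated from its accumulated history plus current cash. The abstract says P-OPIC adds a PageRank value and a BBS adjustment factor to OPIC but does not give the formula. The weighted blend, the multiplicative boost, and the names `p_opic_score`, `alpha`, `bbs_boost`, and `bbs_pages` below are therefore hypothetical, chosen only to illustrate the idea on a toy link graph. This is not the thesis's actual implementation.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # page -> list of outlinked pages (all must be keys)


def pagerank(graph: Graph, d: float = 0.85, iters: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank; dangling pages spread their rank uniformly."""
    nodes = list(graph)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        dangling = sum(rank[u] for u in nodes if not graph[u])
        new = {u: (1.0 - d) / n + d * dangling / n for u in nodes}
        for u in nodes:
            for v in graph[u]:
                new[v] += d * rank[u] / len(graph[u])
        rank = new
    return rank


def opic(graph: Graph, rounds: int = 200) -> Dict[str, float]:
    """OPIC cash/history computation with a round-robin (fair) visit order."""
    nodes = list(graph)
    n = len(nodes)
    cash = {u: 1.0 / n for u in nodes}   # total cash stays equal to 1
    history = {u: 0.0 for u in nodes}
    total = 0.0                          # G: all cash distributed so far
    for _ in range(rounds):
        for u in nodes:
            c = cash[u]
            history[u] += c
            total += c
            cash[u] = 0.0
            targets = graph[u] or nodes  # dangling page: give cash to every page
            for v in targets:
                cash[v] += c / len(targets)
    # Importance estimate X(u) = (H(u) + C(u)) / (G + 1); the values sum to 1.
    return {u: (history[u] + cash[u]) / (total + 1.0) for u in nodes}


def p_opic_score(graph: Graph, bbs_pages: Set[str],
                 alpha: float = 0.5, bbs_boost: float = 1.2) -> Dict[str, float]:
    """HYPOTHETICAL P-OPIC-style score: blend OPIC with PageRank, then
    multiply BBS pages by an assumed adjustment factor `bbs_boost`."""
    op = opic(graph)
    pr = pagerank(graph)
    return {u: (bbs_boost if u in bbs_pages else 1.0)
               * (alpha * op[u] + (1.0 - alpha) * pr[u])
            for u in graph}


if __name__ == "__main__":
    # Toy link graph: a BBS index page, two threads, and an ordinary page.
    web = {
        "bbs/index": ["bbs/thread1", "bbs/thread2"],
        "bbs/thread1": ["bbs/index"],
        "bbs/thread2": ["bbs/index", "news"],
        "news": [],
    }
    bbs = {"bbs/index", "bbs/thread1", "bbs/thread2"}
    for url, s in sorted(p_opic_score(web, bbs).items(), key=lambda kv: -kv[1]):
        print(f"{url:12s} {s:.4f}")
```

The round-robin visit order keeps the sketch deterministic. OPIC only requires that every page is visited infinitely often, so a crawler could instead visit the page holding the most cash first. With `bbs_boost=1.0` the blended scores still sum to 1, so the boost is the only term that shifts weight toward BBS pages.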
Keywords/Search Tags: data mining, search engine, PageRank, OPIC, Nutch, BBS