
Http Communication Analysis In Network Penetration

Posted on: 2016-04-03  Degree: Master  Type: Thesis
Country: China  Candidate: M F Liu  Full Text: PDF
GTID: 2308330473962458  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, the amount of information on the web keeps growing, and the web has become a primary way for people to obtain information. As an important tool for information retrieval, the search engine brings convenience to people; however, it also exposes users to potential danger. Web spam is the practice of misleading search engines through improper means, so that low-quality information is served to users. Web spam seriously degrades the user experience, puts users at risk, and also hurts the performance of the search engine. Detecting spam pages and providing high-quality search results has become one of the main challenges for modern search engines.

The main content and research results of this paper include:

(1) Research on the principles and techniques of web spam. First, the paper analyzes how search engines work and the algorithms used to rank search results, such as the TF/IDF model and PageRank. Second, building on these principles, the paper analyzes the characteristics of spam pages. Finally, the paper presents common spam techniques and the corresponding anti-spam methods.

(2) A spam detection method based on topic and semantics is proposed. By analyzing the principles of topic models and semantic analysis, the paper studies the topical and semantic characteristics of spam pages and proposes several features based on them. Topic modeling is performed over the contents of a page, followed by semantic analysis and calculation according to the distribution of topics. Finally, the topic and semantic features are extracted and used to classify and detect web pages.

(3) A spam detection system based on topic and semantics is designed and developed. A web crawler is built to fetch web pages from the Internet. After the contents of a page are processed, topic modeling is performed and the relevant features are calculated and extracted. Finally, the detection samples and a machine learning classifier are constructed to detect spam pages and display the results.

(4) The performance of the spam detection method based on topic and semantics is verified by experiments. The experimental results show that the proposed method can effectively identify spam pages. Compared with the traditional statistics-based method, the proposed method achieves better recall, precision, and F-measure.
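The three evaluation measures named in item (4) can be made concrete with a short sketch. The function name `precision_recall_f`, the label convention (1 = spam, 0 = normal page), and the sample labels below are illustrative assumptions, not data or code from the thesis; the formulas are the standard definitions of precision, recall, and the F-measure.

```python
def precision_recall_f(y_true, y_pred, beta=1.0):
    """Return (precision, recall, F-beta), treating label 1 (spam) as positive.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam flagged as spam
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal flagged as spam
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam that was missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # F-beta combines both; beta = 1 gives the usual harmonic mean (F1).
    denom = beta ** 2 * precision + recall
    f_measure = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Made-up labels for eight pages (assumption, for demonstration only).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f(y_true, y_pred)
print(precision, recall, f1)
```

In this toy input three of the four flagged pages are truly spam and three of the four spam pages are caught, so precision and recall coincide; on real data they usually trade off, which is why the F-measure is reported alongside them.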
Keywords/Search Tags: Web Spam, Search Engine, Topic Model, Semantic Analysis