Research On Web Information Selection Based On Credibility And Semantic Similarity

Posted on: 2017-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: X C Zheng
Full Text: PDF
GTID: 2308330488961134
Subject: Library and Information Science
Abstract/Summary:
With the rapid development of Internet technology, the Internet has become a huge, global information service center and the primary source from which people obtain information and knowledge. However, because the Internet is open and unbounded, the quality of information on it is uneven, and much of it is false, incorrect, or useless. Faced with this vast amount of low-quality information, people usually rely on the major search engines to find what they need. However, mainstream search engines are commercial tools, and their results do not fully satisfy users: on the one hand, they cannot guarantee that the top-ranked pages are of reliable quality; on the other hand, the results may contain a large number of duplicate and reposted pages. This greatly reduces the efficiency with which users obtain information and costs them time and effort in filtering it. Therefore, this paper proposes a web information selection method based on credibility and semantic similarity, which aims to reduce the burden on people seeking high-quality, highly reliable information on the Internet and to improve the efficiency of web page information selection.
First, on the basis of a comprehensive investigation and systematic analysis of existing related research in China and abroad, the paper summarizes the relevant theoretical results and technical methods. Second, it focuses on constructing a Web information credibility evaluation system, which is divided into three levels: authority of sources, significance of content, and web page relevance. Each level is given more specific evaluation indexes, the weight of each index is determined through the expert scoring method and the analytic hierarchy process, and a formula for calculating web page information credibility is given.
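As a rough illustration of the weighting step described above, the following Python sketch derives weights from a pairwise comparison matrix using the row geometric-mean approximation of the analytic hierarchy process, checks the consistency of the judgments, and combines per-level scores into one credibility value. The matrix values, the example scores, the function names, and the weighted-sum form of the credibility formula are all assumptions made for illustration; the abstract does not state the actual indexes, expert judgments, or formula.

```python
import math

# Saaty's random consistency index (RI) for small matrix sizes.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Return CR = CI / RI; values below about 0.1 are usually accepted."""
    n = len(matrix)
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    return ci / ri if ri else 0.0


def credibility(scores, weights):
    """Assumed formula: weighted sum of per-level scores in [0, 1]."""
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical expert judgments for the three levels named in the abstract:
# source authority, content significance, web page relevance.
PAIRWISE = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]

WEIGHTS = ahp_weights(PAIRWISE)
CR = consistency_ratio(PAIRWISE, WEIGHTS)
# Hypothetical per-level scores for one web page.
PAGE_SCORE = credibility([0.9, 0.6, 0.8], WEIGHTS)
```

The geometric-mean method is used here only because it needs no linear algebra library; a principal-eigenvector computation would serve the same purpose.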
Third, on the basis of a detailed analysis of the content and structure of web pages, the paper analyzes a text extraction method based on the DOM tree structure of the web page and its implementation process. It then applies the LDA topic model to web page semantic similarity calculation, proposes a method for calculating web page semantic similarity based on the LDA topic model, and analyzes its implementation in detail. Finally, the paper designs and implements a web information selection system based on credibility and semantic similarity. The function of each module is analyzed in detail, and the validity and practicability of the proposed method are verified through experiments and analysis of the results.
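To make the similarity step more concrete, the sketch below compares two pages by their topic distributions and drops near-duplicates. It assumes each page's topic distribution has already been inferred by some LDA implementation, and it uses Jensen-Shannon divergence as the distance between distributions. The abstract does not say which measure the thesis uses, so that choice, the 0.9 threshold, and all function and page names are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def topic_similarity(p, q):
    """Map the divergence (in [0, 1] for base 2) to a similarity in [0, 1]."""
    return 1.0 - js_divergence(p, q)


def drop_near_duplicates(pages, threshold=0.9):
    """Keep pages in order, skipping any too similar to a page already kept."""
    kept = []
    for name, dist in pages:
        if all(topic_similarity(dist, kept_dist) < threshold
               for _, kept_dist in kept):
            kept.append((name, dist))
    return kept


# Hypothetical topic distributions over three topics.
PAGES = [
    ("page_a", [0.7, 0.2, 0.1]),
    ("page_b", [0.7, 0.2, 0.1]),   # same topic mix as page_a (e.g. a repost)
    ("page_c", [0.1, 0.1, 0.8]),
]

KEPT = [name for name, _ in drop_near_duplicates(PAGES)]
```

In a selection system of the kind described, pages could be visited in descending credibility order so that the most credible copy of duplicated content is the one retained; that ordering is also an assumption, not something the abstract specifies.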
Keywords/Search Tags: Web information credibility, information selection, semantic similarity, DOM, LDA topic model