A BBS Search Ranking Strategy Based On C4.5 Algorithm

Posted on: 2010-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: J Song
Full Text: PDF
GTID: 2178360278966404
Subject: Control theory and control engineering
Abstract/Summary:
With the rapid development of the Internet, the volume of online information has grown geometrically, and obtaining valid information has become particularly important. At present, information is accessed mainly through the major search engines, particularly for general websites. At the same time, many large-scale communities and BBS forums have sprung up, many of them high-quality communities aimed at resolving users' day-to-day problems. For technical reasons, the mainstream search engines cannot retrieve this content well, so these communities and BBS forums have developed their own search functions. However, because pages in large communities and BBS forums are user-editable, all kinds of information can be posted or modified there, including advertisements and irrelevant information. As a result, search results are of low quality, and users are exposed to advertisements, irrelevant information, and even spam. This thesis addresses that problem in the current complex network environment, specifically for large communities and BBS forums: the ranking of results retrieved from a BBS should reflect the importance of the replies, which can also be used to detect spam on a page. An importance ranking formula is given that returns the corresponding importance score. The decision method is mainly the C4.5 decision tree algorithm, which is used to identify spam and useless information. First, the web pages are pre-processed, which includes identifying the reply pages among all crawled pages; then the attributes to be extracted are analyzed. Attribute selection draws on a theoretical analysis of candidate properties, on English language habits, and on an analysis of the habits of spam producers. Once all attributes are selected, a score is calculated for each reply.
From this value, an importance ranking formula yields the importance score, which is used to rank BBS search results more reasonably and so improve the user's search experience.
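The abstract names C4.5 as the decision method but does not list the attributes that were extracted. As a rough illustration only, the sketch below computes the gain ratio, the criterion C4.5 uses to choose which attribute to split on, for a handful of made-up replies. The attribute names `has_link` and `is_short`, the labels, and the sample data are all assumptions for illustration and do not come from the thesis; continuous attributes and pruning, which full C4.5 also handles, are left out.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(rows, labels, attr):
    """Gain ratio of splitting `rows` on the discrete attribute `attr`.

    gain ratio = information gain / split information
    """
    total = len(rows)
    # Group the class labels by the value each row has for `attr`.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Expected entropy of the labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy([row[attr] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical reply attributes and labels (not taken from the thesis).
rows = [
    {"has_link": True, "is_short": True},
    {"has_link": True, "is_short": False},
    {"has_link": False, "is_short": True},
    {"has_link": False, "is_short": False},
]
labels = ["spam", "spam", "ok", "ok"]

# C4.5 splits on the attribute with the highest gain ratio.
best = max(["has_link", "is_short"],
           key=lambda a: gain_ratio(rows, labels, a))
print(best, gain_ratio(rows, labels, best))
```

In this toy data `has_link` separates the two classes perfectly while `is_short` carries no information, so the first split would fall on `has_link`. How the resulting classification feeds into the importance score depends on the thesis's ranking formula, which the abstract does not state.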
Keywords/Search Tags: BBS (Bulletin Board System), Spam, C4.5 Algorithm, Search Ranking