
Research And Implementation Of A Public Forum Information In Real-time Retrieval

Posted on: 2013-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: W Zhang
GTID: 2218330371459728
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
As a new phenomenon in contemporary society, the Internet plays an increasingly important role. The BBS (bulletin board system), a product of the Internet's development, is an open platform. Unlike an ordinary website, a BBS lets people not only obtain information but also publish it, which makes communication very convenient. Over time, however, its negative and dangerous side has gradually emerged: some criminals spread illegal information on BBSs. Because such information spreads quickly, it can cause serious consequences within a short time, so these messages must be found very quickly. This thesis designs a vertical search engine for BBS information. It can perform deep data mining on specified BBSs and can also monitor newly posted information around the clock.

The vertical search engine designed in this thesis consists of three modules: an information acquisition module, an information analysis module, and an information indexing and searching module. The information acquisition module combines a meta search engine, built on the existing interfaces of general-purpose search engines, with a web crawler developed by the author. The information analysis module extracts structured information from common file formats such as HTML, Word, Excel and PDF, using templates and web page denoising. The information indexing and searching module is built on Lucene, an open-source information retrieval library, and provides users with a convenient and efficient query interface.

User feedback shows that the vertical search engine performs very well in deep data mining and real-time monitoring.
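The abstract does not say how the meta search engine combines results returned by different general-purpose search engines. A minimal sketch, assuming each engine's results have already been fetched as a ranked list of URLs, is to normalize URLs so duplicates collapse and then fuse the lists with reciprocal rank fusion (RRF). RRF is a swapped-in, commonly used technique, not necessarily the method used in the thesis; `normalize` and `merge_results` are hypothetical names.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Canonicalize a URL so the same page from different engines compares equal.

    Lowercases scheme and host, drops a trailing slash and the #fragment.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def merge_results(result_lists, k=60):
    """Fuse ranked URL lists (best first, one list per engine) with reciprocal rank fusion.

    Each URL scores sum(1 / (k + rank)) over the engines that returned it;
    k = 60 is a commonly used damping constant.
    """
    scores = defaultdict(float)
    for ranked in result_lists:
        seen = set()
        for rank, url in enumerate(ranked, start=1):
            key = normalize(url)
            if key in seen:
                continue  # count each page at most once per engine
            seen.add(key)
            scores[key] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by URL for a stable order
    return sorted(scores, key=lambda u: (-scores[u], u))
```

RRF uses only ranks, not raw relevance scores, which suits a meta search engine whose sources report scores on incomparable scales.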
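Round-the-clock monitoring implies an incremental loop that repeatedly crawls a board and reports only posts it has not seen before. A sketch under stated assumptions: `fetch` is a hypothetical callable that crawls one board and returns its current posts, a post counts as new when a hash of its URL and body has not been seen, and simple keyword matching stands in for whatever detection rules the thesis actually applies.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Post:
    url: str
    title: str
    body: str


@dataclass
class BoardMonitor:
    # Hypothetical crawler hook: returns the posts currently on one board
    fetch: Callable[[], Iterable[Post]]
    watch_terms: list[str]
    seen: set[str] = field(default_factory=set)

    def poll(self) -> list[Post]:
        """Return posts that are new since the last poll and contain a watch term."""
        hits = []
        for post in self.fetch():
            # Key on URL + body, so an edited post is examined again
            digest = hashlib.sha1((post.url + "\x00" + post.body).encode("utf-8")).hexdigest()
            if digest in self.seen:
                continue
            self.seen.add(digest)
            text = (post.title + " " + post.body).lower()
            if any(term.lower() in text for term in self.watch_terms):
                hits.append(post)
        return hits
```

In a deployment, `poll()` would run on a timer (for example inside a loop with `time.sleep(interval)`), so the reporting delay is bounded by the polling interval plus crawl time.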
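The analysis module extracts structured fields with per-site templates and strips page noise. One way to do this for HTML, sketched here with the standard-library `html.parser`, assumes a template is a hypothetical mapping from a site's CSS class names to output field names; text outside template elements (navigation bars, ads) and inside `<script>`/`<style>` is discarded as noise. Word, Excel and PDF files are not covered; they would first need conversion to text.

```python
from html.parser import HTMLParser


class TemplateExtractor(HTMLParser):
    """Collect text inside elements whose class names a template field."""

    VOID = {"br", "img", "hr", "meta", "link", "input"}  # elements with no end tag

    def __init__(self, template):
        super().__init__()
        self.template = template  # hypothetical: {css_class: field_name}
        self.stack = []           # open elements as (tag, field_name or None)
        self.skip = 0             # nesting depth inside <script>/<style>
        self.fields = {name: [] for name in template.values()}

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        if tag in ("script", "style"):
            self.skip += 1
        classes = (dict(attrs).get("class") or "").split()
        target = next((self.template[c] for c in classes if c in self.template), None)
        self.stack.append((tag, target))

    def handle_endtag(self, tag):
        if tag in self.VOID:
            return
        if tag in ("script", "style") and self.skip:
            self.skip -= 1
        # Pop back to the matching open tag, tolerating unclosed children
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.skip:
            return  # script/style content is noise
        # The innermost enclosing template element receives the text
        for _, target in reversed(self.stack):
            if target:
                self.fields[target].append(data)
                break


def extract(html, template):
    parser = TemplateExtractor(template)
    parser.feed(html)
    parser.close()
    # Join fragments and collapse runs of whitespace
    return {name: " ".join(" ".join(parts).split()) for name, parts in parser.fields.items()}
```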
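Lucene answers queries from an inverted index, which maps each term to the documents that contain it. The toy version below illustrates that data structure with a simple TF-IDF-style score. It is not Lucene's API or its default scoring (recent Lucene versions default to BM25), and its regex tokenizer would not segment Chinese BBS text, for which Lucene provides Chinese-aware analyzers; `InvertedIndex` and `tokenize` are hypothetical names.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"\w+")


def tokenize(text):
    """Lowercased word tokens (a stand-in for a real Lucene analyzer)."""
    return [t.lower() for t in TOKEN.findall(text)]


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.docs = {}                     # doc_id -> original text

    def add(self, doc_id, text):
        """Index a document; assumes each doc_id is added only once."""
        self.docs[doc_id] = text
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query, top_k=10):
        """Rank documents by summed (1 + log tf) * log(1 + N / df) over query terms."""
        n = len(self.docs)
        scores = defaultdict(float)
        for term in set(tokenize(query)):
            plist = self.postings.get(term)
            if not plist:
                continue
            idf = math.log(1 + n / len(plist))
            for doc_id, tf in plist.items():
                scores[doc_id] += (1 + math.log(tf)) * idf
        return sorted(scores, key=lambda d: (-scores[d], d))[:top_k]
```

Only documents sharing at least one term with the query are touched, which is why an inverted index stays fast as the collection of crawled posts grows.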
Keywords/Search Tags: BBS, meta search, web crawler, real-time search, Lucene