Design and Implementation of an Emergency Warning Platform Based on Public Opinion Analysis and Judgment

Posted on: 2012-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhu
GTID: 2218330368497937
Subject: Software engineering
Abstract/Summary:
The amount of information on the Internet is increasing rapidly, and search engines have become one of the most effective and fastest ways to obtain it. However, traditional search engines often return a large number of results with low accuracy, so users spend a great deal of time looking for the information they need. Two factors affect search engine performance. First, from the viewpoint of the data source, webpage noise reduces the accuracy with which a page's theme is indexed, which in turn reduces search accuracy. Second, the note attached to each hyperlink in the search results plays an important role in helping users choose links, so low-quality notes inevitably mislead users. This thesis therefore studies webpage denoising and automatic summarization as ways to improve search engine performance.
An advanced denoising technique based on clipping the DOM tree is studied and implemented in this thesis. A simplified DOM tree is constructed through statistics and analysis of the source information of popular news websites. A strategy for clipping the DOM tree is then built on the differences between noise and useful information. The strategy was improved continually by processing millions of webpages, which also extended it to many more styles of webpage. The resulting technique can process nearly all kinds of webpages with high accuracy and efficiency.
The denoising technique has three characteristics. First, it uses a strategy called "dual-judgement of webpage type", whose judging accuracy is 95.20%. Second, it uses a mechanism called "dual-localization of text of webpage", for which the accuracy of text recall is 95.048%. Third, the thesis proposes 8 evaluation guidelines, which make the evaluation more exact.
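The DOM-tree clipping idea described above can be illustrated with a minimal Python sketch. The abstract does not say which features the thesis uses to separate noise from useful information, so everything here is an assumption made for illustration: the noise tag list, the container tag list, the link-density threshold, and the names `Node`, `TreeBuilder`, `clip`, and `denoise` are all hypothetical. The sketch builds a simplified tree (tags and text only), removes subtrees that are known noise tags or that consist mostly of link text, and returns the remaining text.

```python
from html.parser import HTMLParser

# Assumed heuristics, not taken from the thesis.
NOISE_TAGS = {"script", "style", "nav", "footer", "aside", "form", "iframe"}
CONTAINER_TAGS = {"div", "ul", "ol", "li", "table", "td", "section"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """One node of the simplified DOM tree; text nodes use the tag '#text'."""
    def __init__(self, tag, parent=None, data=""):
        self.tag = tag
        self.parent = parent
        self.data = data
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM tree that keeps tags and text, not attributes."""
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:  # void elements never get children
            return
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Close the nearest matching open tag; ignore stray closing tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(Node("#text", self.current, data.strip()))


def text_len(node, links_only=False, inside_link=False):
    """Characters of text under node; with links_only, count only text inside <a>."""
    inside_link = inside_link or node.tag == "a"
    if node.tag == "#text":
        return len(node.data) if (inside_link or not links_only) else 0
    return sum(text_len(c, links_only, inside_link) for c in node.children)


def clip(node, max_link_density=0.5):
    """Remove noise subtrees in place: noise tags and link-heavy containers."""
    kept = []
    for child in node.children:
        if child.tag in NOISE_TAGS:
            continue
        if child.tag in CONTAINER_TAGS:
            total = text_len(child)
            if total and text_len(child, links_only=True) / total > max_link_density:
                continue
        clip(child, max_link_density)
        kept.append(child)
    node.children = kept


def extract_text(node):
    """Concatenate the text that survives clipping, in document order."""
    if node.tag == "#text":
        return node.data
    parts = [extract_text(c) for c in node.children]
    return " ".join(p for p in parts if p)


def denoise(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    clip(builder.root)
    return extract_text(builder.root)
```

For example, `denoise('<div><a href="/a">Home</a></div><div><p>Main story text.</p></div>')` drops the first `div`, because all of its text sits inside a link, and returns `Main story text.`. A real system would need richer features than link density, and nothing here reproduces the "dual-judgement" or "dual-localization" mechanisms named in the abstract.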
Building on webpage denoising, an automatic summarization technique that combines automatic extraction with article structure is studied and implemented. The technique is not restricted to any field and is highly efficient. Using an approach similar to that for extracting a full-text summary of a webpage, a multi-theme summary of the webpage is produced, which can serve as the note for hyperlinks in search results. Observation in practice shows that such notes give users better guidance in choosing hyperlinks than traditional ones.
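As a rough illustration of combining automatic extraction with article structure, the Python sketch below scores each sentence by the average frequency of its words and then boosts sentences that open a paragraph. The abstract does not give the thesis's scoring method, so the stopword list, the `lead_bonus` weight, and the function names `split_sentences` and `summarize` are assumptions chosen only to show the general shape of such a summarizer.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the thesis).
STOPWORDS = {"a", "an", "the", "in", "of", "to", "and", "is", "was", "it",
             "on", "for", "that", "this", "with"}


def split_sentences(paragraph):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def content_words(sentence):
    """Lowercased alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOPWORDS]


def summarize(paragraphs, max_sentences=2, lead_bonus=1.5):
    """Pick the highest-scoring sentences and return them in original order.

    Extraction part: a sentence scores the average document frequency of its words.
    Structure part: the first sentence of each paragraph is multiplied by lead_bonus.
    """
    sentences = []  # (position in document, opens a paragraph?, text)
    for paragraph in paragraphs:
        for i, text in enumerate(split_sentences(paragraph)):
            sentences.append((len(sentences), i == 0, text))

    freq = Counter(w for _, _, text in sentences for w in content_words(text))

    def score(item):
        _, is_lead, text = item
        words = content_words(text)
        if not words:
            return 0.0
        base = sum(freq[w] for w in words) / len(words)
        return base * (lead_bonus if is_lead else 1.0)

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    return " ".join(text for _, _, text in sorted(top))
```

Because the scoring relies only on word counts and paragraph position, a sketch like this is not tied to any subject field, which matches the property the abstract claims. Producing a multi-theme summary would additionally require grouping sentences by theme before selection, which is not shown here.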
Keywords/Search Tags: Search engine, Webpage denoising and reconstructing, Automatic text summary