Study On Information Retrieval Of Quality Internet Public Opinion Monitoring System

Posted on: 2012-09-17
Degree: Master
Type: Thesis
Country: China
Candidate: J B Li
Full Text: PDF
GTID: 2178330335460428
Subject: Computer Science and Technology
Abstract/Summary:
This thesis explains why monitoring Internet public opinion on product quality and food safety is significant and necessary. It describes several supporting technologies: vertical search, information extraction, Chinese word segmentation, text similarity, text clustering, and information retrieval. It then studies the optimization of information retrieval and the design and implementation of the information retrieval interface. The main work is as follows:
(1) Designed a general framework for the system and its main function modules: text similarity calculation, duplicate detection, cluster optimization, information retrieval, statistical reports, and the database.
(2) Improved the text similarity algorithm by introducing word co-occurrence, and applied the improved algorithm to the optimization of information retrieval.
(3) Used the MD5 algorithm to detect fully duplicated pages, and used text similarity over an inverted index to detect partially duplicated pages.
(4) Implemented text clustering with an improved k-means algorithm, and used the clusters to improve the information retrieval results.
The system implements information retrieval user interfaces, including query and statistical reports, that present results to users efficiently and intuitively. In summary, the thesis optimizes information retrieval through text similarity, duplicate detection, and text clustering. It gives users timely access to the information collected by the quality Internet public opinion monitoring system, and it helps the relevant departments supervise and manage Internet public opinion on product quality and food safety.
Keywords/Search Tags:information retrieval, text similarity, duplicated detection, text clustering
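
The two-stage duplicate detection described in task (3) of the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes that page text has already been segmented into whitespace-separated terms, and it substitutes plain Jaccard similarity for the thesis's co-occurrence-based similarity measure, which the abstract does not specify. The function names and the 0.8 threshold are hypothetical.

```python
import hashlib
from collections import defaultdict


def md5_fingerprint(text: str) -> str:
    """MD5 hex digest of the page text after collapsing whitespace."""
    normalized = " ".join(text.split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two term sets (stand-in similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_duplicates(pages: dict, threshold: float = 0.8):
    """Return (exact, partial) lists of (earlier_id, later_id) pairs.

    pages maps a page id to its text. The threshold is an assumed value.
    """
    exact, partial = [], []
    seen = {}                  # fingerprint -> id of first page with it
    index = defaultdict(set)   # inverted index: term -> ids of kept pages
    terms_of = {}              # page id -> term set of kept pages

    for pid, text in pages.items():
        # Stage 1: identical fingerprints mean fully duplicated pages.
        fp = md5_fingerprint(text)
        if fp in seen:
            exact.append((seen[fp], pid))
            continue
        seen[fp] = pid

        # Stage 2: the inverted index yields only pages sharing a term,
        # so similarity is computed against candidates, not all pages.
        terms = set(text.lower().split())
        candidates = set()
        for term in terms:
            candidates |= index.get(term, set())
        for other in sorted(candidates):
            if jaccard(terms, terms_of[other]) >= threshold:
                partial.append((other, pid))

        for term in terms:
            index[term].add(pid)
        terms_of[pid] = terms

    return exact, partial
```

Hashing first keeps the cheap exact check ahead of the more expensive similarity comparison, and the inverted index limits that comparison to pages that share at least one term with the new page.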