Design And Implementation Of Data Capture System For Public Opinion Based On Meta Search

Posted on: 2017-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: J N Zhi
GTID: 2308330485459803
Subject: Software engineering
Abstract/Summary:
With the rapid development of information technology, the Internet has increasingly become an important channel for people to obtain information and to express their thoughts and views. The attitudes, beliefs, and values people express online about public figures and issues have become a non-negligible part of public opinion, and Internet public opinion monitoring systems have emerged in response. Because people are the key element of social events, capturing person-related social events and public opinion to give the relevant departments a basis for decisions is an important direction in the development of public opinion monitoring systems. During my internship at Weipukechuang Technology Co. Ltd., Beijing, I participated in the design and development of a public opinion monitoring system, in which I was responsible for designing and implementing the information collection subsystem. Public opinion information collection is built on Internet search technology and has much in common with search engines in both design and technical implementation, so studying search engine technology offers valuable experience for building a collection system. At the start of the project, to achieve better capture breadth and accuracy, I studied search engine principles and compared existing search engines, focusing on the key techniques of meta search engines, to settle the project's final technical architecture; I then incorporated the strengths of full-text indexing search engines to implement the public opinion information collection system.
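The core idea of a meta search engine, dispatching one query to several underlying engines and merging their ranked lists, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: each engine is a hypothetical callable that returns a ranked list of URLs, and the merge uses reciprocal rank fusion (parameter `k`), a standard technique swapped in here for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical engine adapter: takes a query string, returns ranked URLs.
Engine = Callable[[str], List[str]]


def meta_search(query: str, engines: Dict[str, Engine], k: int = 60) -> List[str]:
    """Fan the query out to every engine in parallel, then merge the ranked
    lists with reciprocal rank fusion (an assumed merging rule)."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        results = {name: f.result() for name, f in futures.items()}

    # A URL returned by several engines accumulates score from each list.
    scores: Dict[str, float] = {}
    for ranked in results.values():
        for rank, url in enumerate(ranked, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / (k + rank)

    # Highest fused score first; ties broken by URL for determinism.
    return sorted(scores, key=lambda u: (-scores[u], u))
```

With stub engines `{"a": lambda q: ["u1", "u2", "u3"], "b": lambda q: ["u2", "u4"]}`, the URL found by both engines (`u2`) is promoted to the top, which reflects the breadth-plus-agreement benefit that motivates meta search.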
Improvements of this system include:
(1) Query transformation: I analyzed the query rules and page structures of both non-directional collection sources and directional (targeted) websites to ensure precise data acquisition.
(2) Storage: given the particular storage requirements of a public opinion information system, I built a meta search engine web page database as an improvement over conventional meta search engines. After experimentally comparing inverted index schemes, including Lucene's, I set up an inverted index database in MySQL, which better supports in-site full-text search.
(3) Deduplication: based on a comparative analysis of existing meta-search deduplication schemes, I optimized a vector space deduplication algorithm that combines title and body-text keywords, arriving at the best-performing deduplication policy.
(4) Ranking: to meet the monitoring system presentation layer's requirement for interest-based sorting, and drawing on the ranking techniques of meta search engines, I studied and improved the HITS algorithm so that ranked results better meet users' needs, and experiments validated the soundness of the design. In addition, by incorporating ideas from vertical search, I solved the topic drift problem of the HITS algorithm, and system testing demonstrated the superiority of the improved ranking algorithm.
(5) Directional collection: by analyzing the structure of Internet forums, I established a conceptual model of post bars (Tieba-style discussion boards). Combining this with the requirements of public opinion collection, I created the forum concept model and implemented the corresponding physical model in the system.
The system was ultimately delivered through project development. I designed tests based on the system's operating data and analyzed the results, which validated the soundness of the design:
(1) News and information coverage exceeded that of traditional search engines by 17%, demonstrating the advantage of a collection system built on a meta search engine.
(2) After the HITS improvement, the homepage click-through rate (CTR) increased by 13% and the combined CTR of the top three results rose from 67% to 83%, showing that taking user interest into account yields better search results, lowers users' search cost, and improves user experience.
(3) The improved HITS algorithm suppresses topic drift.
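Query transformation (item (1)) means rewriting one user query into the request format that each source expects. The sketch below assumes each source's query rule can be captured as a base URL, a keyword parameter name, and fixed parameters; `SOURCE_RULES`, the `example.com` URLs, and the parameter names are all hypothetical, not the rules analyzed in the thesis.

```python
from typing import Dict, List
from urllib.parse import urlencode

# Hypothetical per-source query rules: base URL, keyword parameter name,
# and any fixed parameters the source expects.
SOURCE_RULES: Dict[str, dict] = {
    "news_engine": {"base": "https://news.example.com/search", "kw": "q",
                    "fixed": {"sort": "date"}},
    "forum_site": {"base": "https://forum.example.com/f/search", "kw": "qw",
                   "fixed": {"ie": "utf-8"}},
}


def transform_query(keywords: List[str], source: str) -> str:
    """Build the source-specific search URL for a list of keywords."""
    rule = SOURCE_RULES[source]
    params = dict(rule["fixed"])            # copy so the rule is not mutated
    params[rule["kw"]] = " ".join(keywords)
    # urlencode percent-encodes non-ASCII text as UTF-8 and spaces as '+'.
    return rule["base"] + "?" + urlencode(params)
```

For example, `transform_query(["zhang", "san"], "news_engine")` yields `https://news.example.com/search?sort=date&q=zhang+san`. Keeping the rules as data means a new source needs only a new entry, not new code.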
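Item (2) describes an inverted index stored in a relational database. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a server; the `postings(term, doc_id, tf)` schema, the whitespace tokenizer, and the term-frequency scoring are assumptions for illustration. Real Chinese text would need a word segmenter in place of `split()`.

```python
import sqlite3
from collections import Counter
from typing import Dict, List


def build_index(conn: sqlite3.Connection, docs: Dict[int, str]) -> None:
    """Store one postings row per (term, document) with its term frequency."""
    conn.execute("CREATE TABLE IF NOT EXISTS postings ("
                 "term TEXT, doc_id INTEGER, tf INTEGER, "
                 "PRIMARY KEY (term, doc_id))")
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())  # whitespace tokenizer (assumption)
        conn.executemany("INSERT OR REPLACE INTO postings VALUES (?, ?, ?)",
                         [(term, doc_id, n) for term, n in counts.items()])
    conn.commit()


def search(conn: sqlite3.Connection, query: str) -> List[int]:
    """Return documents containing ALL query terms, ranked by summed tf."""
    terms = query.lower().split()
    if not terms:
        return []
    placeholders = ",".join("?" * len(terms))
    rows = conn.execute(
        f"SELECT doc_id, SUM(tf) AS score FROM postings "
        f"WHERE term IN ({placeholders}) GROUP BY doc_id "
        f"HAVING COUNT(DISTINCT term) = ? ORDER BY score DESC, doc_id",
        (*terms, len(set(terms)))).fetchall()
    return [doc_id for doc_id, _ in rows]
```

The composite primary key doubles as the lookup index on `term`, which is what makes the table behave as an inverted index rather than a plain document store.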
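Item (3) combines title and body-text keywords in a vector space model for deduplication. A minimal sketch, assuming cosine similarity on term-frequency vectors, "keywords" approximated by the most frequent body tokens, and hypothetical weights `w_title`/`w_body` and `threshold`; the thesis's actual weights and keyword extraction are not given here.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def keywords(text: str, top_n: int = 10) -> Counter:
    # Stand-in keyword extraction: the top_n most frequent tokens (assumption).
    return Counter(dict(Counter(text.lower().split()).most_common(top_n)))


def similarity(a: Dict[str, str], b: Dict[str, str],
               w_title: float = 0.4, w_body: float = 0.6) -> float:
    """Weighted combination of title similarity and body-keyword similarity."""
    t = cosine(Counter(a["title"].lower().split()), Counter(b["title"].lower().split()))
    k = cosine(keywords(a["body"]), keywords(b["body"]))
    return w_title * t + w_body * k


def deduplicate(docs: List[Dict[str, str]], threshold: float = 0.8) -> List[Dict[str, str]]:
    """Keep a document only if it is not too similar to any already kept."""
    kept: List[Dict[str, str]] = []
    for doc in docs:
        if all(similarity(doc, k) < threshold for k in kept):
            kept.append(doc)
    return kept
```

Combining both fields guards against two failure modes: reposted articles with rewritten titles (caught by body keywords) and different stories with boilerplate bodies (separated by titles).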
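Item (4) improves HITS with user interest. The sketch shows standard HITS power iteration, followed by one possible interest-aware variant: a linear blend of normalized authority with a per-page interest score in [0, 1]. The blend, the `interest` mapping, and `alpha` are hypothetical illustrations of "interest sorting", not the thesis's exact formula.

```python
import math
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]  # page -> pages it links to


def hits(graph: Graph, iterations: int = 50) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Standard HITS: authority = sum of in-linking hubs, hub = sum of linked authorities."""
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        auth = {n: 0.0 for n in nodes}
        for u, vs in graph.items():
            for v in vs:
                auth[v] += hub[u]
        norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        auth = {n: x / norm for n, x in auth.items()}
        hub = {n: sum(auth[v] for v in graph.get(n, [])) for n in nodes}
        norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        hub = {n: x / norm for n, x in hub.items()}
    return hub, auth


def interest_rank(graph: Graph, interest: Dict[str, float], alpha: float = 0.7) -> List[str]:
    """Hypothetical improvement: blend authority with user interest."""
    _, auth = hits(graph)
    top = max(auth.values(), default=0.0) or 1.0
    score = {n: alpha * auth[n] / top + (1 - alpha) * interest.get(n, 0.0) for n in auth}
    return sorted(score, key=lambda n: (-score[n], n))
```

With `graph = {"h1": ["a", "b"], "h2": ["a"], "h3": ["a", "b"]}`, plain HITS ranks `a` first; giving `b` an interest score of 1.0 with `alpha=0.5` lifts `b` above `a`. The weight `alpha` controls how far interest may override link structure.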
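Topic drift occurs when densely linked but off-topic pages absorb HITS authority. One way to apply a vertical-search idea, sketched here under assumptions, is to restrict the link graph to pages relevant to the monitored topic and weight each link by the target page's relevance. The relevance measure (fraction of topic terms present), `min_rel`, and `focused_hits` are hypothetical; the thesis's exact method may differ.

```python
import math
from typing import Dict, List


def relevance(text: str, topic_terms: List[str]) -> float:
    """Fraction of topic terms that appear in the page text (assumed measure)."""
    words = set(text.lower().split())
    return sum(t in words for t in topic_terms) / len(topic_terms)


def focused_hits(graph: Dict[str, List[str]], texts: Dict[str, str],
                 topic_terms: List[str], min_rel: float = 0.5,
                 iterations: int = 50) -> List[str]:
    # Vertical-search-style filter: drop off-topic pages before ranking.
    rel = {n: relevance(texts.get(n, ""), topic_terms) for n in texts}
    keep = {n for n, r in rel.items() if r >= min_rel}
    edges = [(u, v) for u, vs in graph.items() for v in vs if u in keep and v in keep]
    hub = {n: 1.0 for n in keep}
    auth = {n: 1.0 for n in keep}
    for _ in range(iterations):
        # Each link is weighted by the relevance of the page it points to.
        auth = {n: 0.0 for n in keep}
        for u, v in edges:
            auth[v] += rel[v] * hub[u]
        norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        auth = {n: x / norm for n, x in auth.items()}
        hub = {n: 0.0 for n in keep}
        for u, v in edges:
            hub[u] += rel[v] * auth[v]
        norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        hub = {n: x / norm for n, x in hub.items()}
    return sorted(keep, key=lambda n: (-auth[n], n))
```

If two hub pages link both to a flood-relief page and to an unrelated gossip page, plain HITS would give the two targets equal authority; here the gossip page is filtered out and the flood-relief page ranks first.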
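Item (5) builds a conceptual model of forums and post bars for directional collection. A minimal sketch of such a model as Python dataclasses follows, assuming a three-level hierarchy of board, thread, and post; the class names, fields, and the `threads_mentioning` helper are hypothetical and do not reproduce the thesis's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Post:
    post_id: str
    author: str
    content: str
    posted_at: datetime
    reply_to: Optional[str] = None  # parent post within the thread, if any


@dataclass
class Thread:
    thread_id: str
    title: str
    author: str
    posts: List[Post] = field(default_factory=list)

    def reply_count(self) -> int:
        # The first post is the opening post; the rest are replies.
        return max(0, len(self.posts) - 1)


@dataclass
class Board:  # one "post bar": a topic-centred discussion board
    board_id: str
    name: str
    url: str
    threads: List[Thread] = field(default_factory=list)

    def threads_mentioning(self, keyword: str) -> List[Thread]:
        """Threads whose title or any post mentions the keyword (case-insensitive)."""
        kw = keyword.lower()
        return [t for t in self.threads
                if kw in t.title.lower() or any(kw in p.content.lower() for p in t.posts)]
```

Each class maps naturally onto one table in a physical model, with `thread_id` and `board_id` becoming foreign keys; `reply_to` preserves the conversation structure that matters when tracing how an opinion spreads within a thread.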
Keywords/Search Tags: Public opinion collection, Meta search, Vertical search, HITS algorithm, Interest sorting, Topic drift