Research On The Retrieve Sorting Strategies In BBS Based On Hot Topics Discovery

Posted on: 2014-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: K W Qu
Full Text: PDF
GTID: 2248330398971586
Subject: Computer technology
Abstract/Summary:
Search engines are an important tool for Internet users to find information in the vast amount of data on the Web. Search engine applications are currently diversifying, and the maturing of information retrieval technology makes such diverse applications feasible. As an Internet tool, the forum (BBS) carries network information and, like the news media, serves as a channel for releasing information; online discussion of a sudden event can start there quickly. As the number of participating users grows, some of these discussions gradually evolve into hot topics.

Currently, the search function of most BBS websites is relatively simple: it generally supports only keyword search over thread titles within a sub-forum, and results are sorted by time alone, so it cannot give users a high-quality query service. General-purpose search engines also perform only moderately when retrieving BBS content.

This thesis builds a lightweight search engine for BBS. It applies hot topic discovery technology from the field of public opinion analysis, draws on the ranking strategies of search engines, and takes the "heat" of a post into account as an important ranking indicator, yielding a retrieval sorting method based on hot topic discovery.

First, a BBS retrieval system is set up quickly on top of Lucene full-text search technology. The work studies the key search engine technologies involved: BBS Web information collection and extraction, index file creation, and query processing, together with the design of a search engine user interface based on JSP/Servlet technology.

Then, an agglomerative hierarchical clustering algorithm is used to identify the topics on the BBS. The influence of the thread-starting post, attention (number of replies), reply contribution rate, and activity level are selected as the impact factors for assessing topic heat. The weight of each impact factor is determined through several experiments. The heat of each post is then scored, and the impact factor weights are adjusted according to the sorting results obtained under the heat effect.

Finally, a new sorting method is proposed that builds on the Lucene sorting mechanism and brings the heat value of an article (post) into the calculation. The basic idea is that the final score of an article (post) matching a query is the product of its query similarity score and its heat value.
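The scoring idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The abstract does not say how the four impact factors are combined, so the weighted sum is an assumption. The names Post, WEIGHTS, heat, final_score and rank are hypothetical, as are the equal weights and the assumption that each factor is already normalized to the range [0, 1]. Only the last step, multiplying the query similarity score by the heat value, comes directly from the abstract.

```python
from dataclasses import dataclass

# Hypothetical weights: the abstract says the weights were determined
# experimentally but does not report their values.
WEIGHTS = {
    "influence": 0.25,     # influence of the thread-starting post
    "attention": 0.25,     # attention (number of replies)
    "contribution": 0.25,  # reply contribution rate
    "activity": 0.25,      # activity level
}


@dataclass
class Post:
    title: str
    similarity: float    # query similarity score, e.g. as returned by Lucene
    influence: float     # each factor is assumed pre-normalized to [0, 1]
    attention: float
    contribution: float
    activity: float


def heat(post, weights=WEIGHTS):
    """Heat as a weighted sum of the impact factors (assumed combination rule)."""
    return sum(w * getattr(post, name) for name, w in weights.items())


def final_score(post, weights=WEIGHTS):
    """Final score = query similarity score multiplied by the heat value."""
    return post.similarity * heat(post, weights)


def rank(posts, weights=WEIGHTS):
    """Return posts ordered by final score, highest first."""
    return sorted(posts, key=lambda p: final_score(p, weights), reverse=True)


posts = [
    Post("cold", similarity=0.8, influence=0.1, attention=0.1,
         contribution=0.1, activity=0.1),
    Post("hot", similarity=0.6, influence=0.9, attention=0.9,
         contribution=0.9, activity=0.9),
]
ranked_titles = [p.title for p in rank(posts)]
```

In this sketch the post titled "hot" is ranked above "cold" even though its similarity score is lower, because its heat is much higher. One consequence of a purely multiplicative rule is that a post with zero heat scores zero however well it matches the query; a real system might add a floor to the heat value, but the abstract does not say whether the thesis does so.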
Keywords/Search Tags: BBS, Retrieval, Sorting Strategy, Hot Topic, Lucene