Research On Micro-blog Retrieval Optimization Based On Internal And External Knowledge Expansion

Posted on: 2018-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Li
GTID: 2348330563452758
Subject: Computer Science and Technology
Abstract/Summary:
The rise of social media has not only reduced the cost of communication but also changed how people consume information. People are no longer satisfied with passively consuming information and have instead become its main producers and disseminators. This era, in which everyone is a self-publisher, has given rise to a more severe "information overload" problem. On the one hand, the short-text language style of microblogs (length restrictions, special characters, colloquial expression and so on) degrades the performance of traditional long-text retrieval methods on microblog retrieval, or makes them unusable altogether. On the other hand, mainstream social media platforms such as microblogging services, Twitter and Facebook are eager to build fast, intelligent microblog filtering systems that give users more efficient information push services. Both points call for an in-depth study of retrieval methods suited to short microblog text.

Among existing methods for improving short-text retrieval, query expansion is widely used by researchers because it is simple and performs well. As research on query expansion has deepened, however, two problems have emerged that still need to be solved urgently: (1) The dilemma of understanding the user's search intent. The user's explicit query is often only a simple abstraction of the user's information need, from which it is difficult to infer the actual search intent. (2) Risk management for multi-source expansion. Introducing multi-source information for query expansion has proven effective in improving retrieval performance, but how to use multi-source data and how to manage the expansion risk remain to be studied.

To solve these problems, this thesis proposes an internal and external knowledge expansion method (IEKE). It combines internal and external multi-source information and introduces an iterative risk-minimization model to obtain the best query expansion, narrowing the distance between the original query and the user's actual search intent. The main contributions are summarized as follows: (1) To address the dilemma of understanding the user's intent, we use the internal knowledge of the retrieved documents and multi-source external feedback knowledge to expand the user's original query. To manage the risk that multi-source external information brings to the expansion process, IEKE builds a regularization constraint operator on top of non-negative matrix factorization (NMF) to minimize the risk of query expansion, thereby narrowing the distance between the original query and the user's actual search intent. (2) With data growing explosively, how to process massive data quickly is an active research topic. We discuss fast iterative computation of the IEKE query expansion method using parallelization techniques such as CUDA and Spark on distributed data computing platforms.

Experiments on the Microblog corpus provided by TREC show that microblog retrieval optimization based on joint internal-external knowledge expansion can greatly improve microblog retrieval performance. The parallelization experiments also show that CUDA parallelization offers a very large speed advantage while sacrificing a little computational performance. However, because GPU memory is limited, Spark parallelization provides the hardware basis for computation on large data sets.
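The abstract does not give the IEKE objective or its update rules, so the following is only a minimal sketch of the general idea it names: NMF with a regularization term, followed by picking expansion terms from the learned factors. It assumes a non-negative term-document matrix `V`, a plain Frobenius-norm penalty weighted by `lam`, and standard multiplicative updates. The function names `regularized_nmf`, `objective` and `expand_query`, the toy matrix, and the scoring rule are all illustrative assumptions, not the thesis's actual operator.

```python
import numpy as np

def regularized_nmf(V, rank, lam=0.1, n_iter=200, seed=0, eps=1e-9):
    """Factor V (terms x docs) as W @ H with W, H >= 0.

    Minimizes ||V - W H||_F^2 + lam * (||W||_F^2 + ||H||_F^2) using
    multiplicative updates. The penalty is an assumed stand-in for the
    regularization constraint mentioned in the abstract.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = V.shape
    W = rng.random((n_terms, rank)) + eps
    H = rng.random((rank, n_docs)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay >= 0.
        H *= (W.T @ V) / (W.T @ W @ H + lam * H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + lam * W + eps)
    return W, H

def objective(V, W, H, lam):
    """Regularized reconstruction error that the updates above reduce."""
    fit = np.linalg.norm(V - W @ H) ** 2
    penalty = lam * (np.linalg.norm(W) ** 2 + np.linalg.norm(H) ** 2)
    return fit + penalty

def expand_query(q, W, k=2):
    """Return indices of k candidate expansion terms (hypothetical scoring).

    The query is projected onto the latent topics (W.T @ q) and mapped back
    to term space; terms already in the query are excluded.
    """
    scores = W @ (W.T @ q)
    scores[q > 0] = -np.inf
    return np.argsort(scores)[::-1][:k]

# Toy term-document counts: 6 terms x 5 documents (made-up data).
V = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 2, 1, 0],
    [0, 0, 1, 2, 0],
    [0, 0, 1, 1, 1],
], dtype=float)

W, H = regularized_nmf(V, rank=2)
query = np.array([1, 0, 0, 0, 0, 0], dtype=float)  # query contains term 0 only
expansion_terms = expand_query(query, W, k=2)
```

In this sketch a larger `lam` shrinks the factors and damps the influence of any single source of evidence, which is one simple way to read "managing expansion risk"; the thesis's own constraint may differ.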
Keywords/Search Tags: micro-blog search, query expansion, non-negative matrix factorization, regularization