Research On Document Modeling And Query Expansion For Short Messages

Posted on: 2017-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
GTID: 2348330488958699
Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer networks and the Internet, retrieving accurate information from massive information resources is becoming increasingly difficult. A significant share of this information takes the form of short messages, which have become an essential kind of data in people's daily lives. Short messages include blog comments, tweets, mobile text messages, and chat records. They are characterized by few words, non-standard expression, large scale, time sensitivity, and frequent updates. Traditional search engines do not take these characteristics into account, so they cannot satisfy users' need for accurate information. Designing and implementing a document modeling method better suited to short messages therefore has important theoretical and practical value. The main contributions of this thesis are as follows:

(1) We propose a document modeling method based on an energy model to improve the accuracy of short-message retrieval. Using a three-layer deep Boltzmann machine together with word vector information, the model captures both the linear and the non-linear information of short messages in the form of document vectors. Because it incorporates word vectors, the proposed model adds semantic information to each short message and models it more accurately. We apply the linear and non-linear document vectors to document classification and retrieval tasks on public datasets, achieving satisfactory results and improved accuracy.

(2) We propose a query expansion method that uses word vector tools. Inspired by the linear relationships among word vectors trained by deep learning, and combined with the word weights trained by the three-layer deep Boltzmann machine, we propose a query expansion method from a global perspective.
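The idea of expanding a query from global word vectors can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the toy vectors, the function names `cosine` and `expand_query`, and the optional `weights` argument (a stand-in for the word weights that the thesis obtains from the deep Boltzmann machine) are all assumptions made for illustration.

```python
import math

# Toy 3-dimensional word vectors; hypothetical values for illustration only.
TOY_VECTORS = {
    "phone": [0.9, 0.1, 0.0],
    "mobile": [0.85, 0.2, 0.05],
    "handset": [0.8, 0.15, 0.1],
    "weather": [0.0, 0.9, 0.3],
    "rain": [0.05, 0.85, 0.4],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(query_terms, vectors, weights=None, top_k=2):
    """Return the top_k vocabulary terms closest to the weighted query centroid.

    `weights` is a hypothetical per-term weight map; terms without an entry
    default to 1.0. Query terms missing from `vectors` are ignored.
    """
    weights = weights or {}
    known = [t for t in query_terms if t in vectors]
    if not known:
        return []
    dim = len(vectors[known[0]])
    centroid = [0.0] * dim
    for term in known:
        w = weights.get(term, 1.0)
        for i, x in enumerate(vectors[term]):
            centroid[i] += w * x
    # Score every vocabulary term that is not already in the query.
    scored = [
        (cosine(centroid, vec), term)
        for term, vec in vectors.items()
        if term not in query_terms
    ]
    scored.sort(reverse=True)
    return [term for _, term in scored[:top_k]]


if __name__ == "__main__":
    print(expand_query(["phone"], TOY_VECTORS))
```

Because the candidates come from the whole vocabulary rather than from the top-ranked documents for a particular query, this kind of expansion is "global" in the sense used above.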
In addition, we analyze the differences between pseudo-feedback query expansion and word vector query expansion, including their respective advantages and limitations. Experiments on a Sina microblog dataset show that combining word vector query expansion based on the global corpus with pseudo-feedback query expansion based on the local document set effectively removes noise words, improves the quality of query expansion, and raises the NDCG of the retrieval system.
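For readers unfamiliar with the evaluation metric, NDCG (normalized discounted cumulative gain) can be sketched as follows. The abstract does not state which gain function or cutoff the thesis uses, so this sketch assumes the common form with linear gain and a log2 rank discount; the function names `dcg` and `ndcg` are illustrative. As a simplification, the ideal ranking is computed from the same list of relevance grades rather than from all judged documents.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain of graded relevances in ranked order."""
    rels = relevances[:k] if k is not None else relevances
    # Rank positions start at 1, so the discount is log2(position + 1).
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels))


def ndcg(relevances, k=None):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Relevance grades of the returned documents, in the order they were ranked.
    print(ndcg([3, 2, 3, 0, 1, 2], k=5))
```

A ranking that already lists documents in descending order of relevance scores 1.0; placing relevant documents lower in the list reduces the score.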
Keywords/Search Tags: energy model, query expansion, word vector, Boltzmann machine