Research on Full-Text Information Retrieval Technology for WeChat Content

Posted on: 2019-04-03
Degree: Master
Type: Thesis
Country: China
Candidate: J W Zhang
Full Text: PDF
GTID: 2428330566960775
Subject: Software engineering
Abstract/Summary:
With the rapid development of the mobile Internet, nearly 10 million users have registered WeChat official accounts, and these accounts have pushed hundreds of millions of WeChat articles. However, there has been little research on how to make use of this large-scale WeChat article data. With full-text information retrieval technology, users can find the information they want among a large number of articles. This thesis studies how to use full-text information retrieval technology to provide users with high-quality WeChat content.

First, the key technologies of full-text information retrieval are described. To provide users with high-quality WeChat articles, we devise a set of features, including the influence of the WeChat official account and the popularity of the WeChat article. The influence of an official account is defined by combining its number of followers with several statistics, such as the maximum read count among all articles the account has pushed to its subscribers. The popularity of an article is calculated from its read count, its number of likes, and its headline status. Our ranking approach then uses the weighted sum of the account influence, the article popularity, and the Lucene ranking score as a composite ranking score that measures the relevance of an article to a query. We conducted an experiment comparing our ranking approach with Lucene's ranking method and the state-of-the-art BM25 algorithm, and the results demonstrate that our method is more efficient and feasible for searching WeChat articles.

We also systematically compare and analyze five commonly used query expansion methods: query expansion using global document analysis, query expansion using local document analysis, query expansion based on association rules, query expansion based on user logs, and query expansion based on semantic concepts, and we summarize their advantages and disadvantages. We then use Word2Vec and a document topic model algorithm to design several query expansion methods. An experiment comparing the query expansion methods proposed in this thesis found that LDA+Word2Vec performs best.

Finally, we design and implement a full-text information retrieval system for searching WeChat content, based on the Lucene search engine. The system lets users upload files, index documents, and search WeChat articles and WeChat official accounts. We conclude by summarizing the implementation methods of full-text information retrieval technology for WeChat content and by looking ahead to further research.
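The composite ranking score described above can be illustrated with a minimal Python sketch. The abstract does not give the weights, the normalization, or the exact form of the influence and popularity features, so everything below is an assumption made for illustration only: the `Article` fields, the log-scale normalization with fixed caps, the inner feature weights, and the default weights `w_inf`, `w_pop`, and `w_rel` are all hypothetical and are not taken from the thesis.

```python
import math
from dataclasses import dataclass


@dataclass
class Article:
    lucene_score: float      # text-relevance score returned by the search engine
    reads: int               # read count of the article
    likes: int               # number of likes
    is_headline: bool        # assumed meaning of "headline": first article in a push
    account_followers: int   # followers of the publishing official account
    account_max_reads: int   # maximum read count among the account's articles


def log_scale(value: int, cap: int) -> float:
    """Map a non-negative count to [0, 1] on a log scale (assumed normalization)."""
    return min(1.0, math.log1p(value) / math.log1p(cap))


def account_influence(a: Article) -> float:
    # Assumed form: equal mix of follower count and maximum read count.
    return (0.5 * log_scale(a.account_followers, 1_000_000)
            + 0.5 * log_scale(a.account_max_reads, 100_000))


def article_popularity(a: Article) -> float:
    # Assumed form: reads, likes, and headline status with illustrative weights.
    return (0.6 * log_scale(a.reads, 100_000)
            + 0.3 * log_scale(a.likes, 10_000)
            + 0.1 * (1.0 if a.is_headline else 0.0))


def composite_score(a: Article, max_lucene: float,
                    w_inf: float = 0.2, w_pop: float = 0.3,
                    w_rel: float = 0.5) -> float:
    """Weighted sum of account influence, article popularity, and relevance."""
    # Scale the raw engine score by the best score in the result set.
    relevance = a.lucene_score / max_lucene if max_lucene > 0 else 0.0
    return (w_inf * account_influence(a)
            + w_pop * article_popularity(a)
            + w_rel * relevance)


def rank(articles: list[Article]) -> list[Article]:
    """Sort one result set by composite score, highest first."""
    max_lucene = max((a.lucene_score for a in articles), default=0.0)
    return sorted(articles,
                  key=lambda a: composite_score(a, max_lucene),
                  reverse=True)
```

With these illustrative weights, an article from a large account with many reads can outrank an article whose text-relevance score is slightly higher but which has few reads, which is the behavior a weighted-sum approach is meant to allow. In practice the weights would need to be tuned against relevance judgments.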
Keywords/Search Tags: WeChat content, Full-text information retrieval, Query expansion, Lucene search engine, Document relevance sort