Research On Retrieval Methods In Social Networks Based On User-Generated Content

Posted on: 2020-04-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B W Zhang
Full Text: PDF
GTID: 1368330572954811
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, content retrieval in social networks has rapidly become one of the most significant search applications. With the spread of the internet and mobile devices, search scenarios and user requests have grown increasingly varied, which makes information retrieval (IR) research more challenging. User-generated content, a vital research object in social networks, is produced by users directly or indirectly. It comprises a large amount of structured, semi-structured, and unstructured text or data, such as ratings, reviews, and user tags. This information can enrich the representation of documents or products in the search process, and it also reflects users' preferences for or views on those documents or products, which can improve personalized search results. However, user-generated content is scattered, fragmented, noisy, and varied in structure, which makes it difficult for conventional methods to exploit it to improve search performance in social networks. Existing studies of user-generated content mostly target classification, clustering, and recommendation, and seldom address search. Most of them focus on user tags, while other content, especially unstructured content, is ignored. Consequently, useful information is lost and the relationships among different types of user-generated content are overlooked.

To address the search problem in social networks, this dissertation combines user-generated content with search tasks according to their characteristics and studies classical search applications in social networks. The research proceeds from the simple to the complex: it starts with structured content, then considers both structured and unstructured content, and finally turns to semantic vector representations of user-generated content. It introduces ideas from recommendation, conventional ranking models, and natural language processing techniques to construct three different frameworks based on user-generated content. The main work and contributions are as follows:

1) First, this dissertation proposes a Generated Content-based Filtering (GCF) algorithm, which introduces the idea of "recommending products similar to those the user likes" into search by returning documents similar to the top-ranked documents. The algorithm is combined with conventional ranking models to construct re-ranking models, which use structured content such as user tags and ratings to design different scoring functions for the re-ranking process. A generic search framework based on structured content is constructed, and learning-to-rank is used to merge the results of the different re-ranking models for searching books or other products in social networks. Experiments on the Social Book Search benchmark validate the effectiveness of the framework.

2) Second, this dissertation proposes a search framework based on a pseudo-relevance feedback model that uses the features of both structured and unstructured content. For structured content such as user tags and ratings, "semantic indivisibility" is assumed. Numeric information, such as ratings and initial ranking scores, and textual information, such as tags, are combined and used in the term selection process of the pseudo-relevance feedback model. Meanwhile, for unstructured content such as reviews and annotations, two term selection models based on different transformation approaches are designed to choose feedback terms and weight them. Through the different feedback models, queries are enriched and used for second-pass retrieval. A generic pseudo-relevance feedback framework based on structured and unstructured user-generated content is constructed, and learning-to-rank is applied to merge the results of the different pseudo-relevance feedback models. The framework is applied to three kinds of social networks: catalogue websites for sharing, networks for broadcasting brief real-time information, and e-commerce networks. The collected IMDb, Tweet, and Taobao product datasets and the Social Book Search benchmark are used to validate the effectiveness and robustness of the framework.

3) Finally, this dissertation addresses the semantic vector representation of user-generated content, combining deep learning models with natural language processing techniques. With this representation, vector representations of complicated queries and documents can be pre-trained through specially designed text classification tasks. Neural networks based on a partial ordering relation are designed to match documents and queries. For evaluation, the relevance between queries and documents, the quality of documents, the timeliness of documents, and the diversity of the returned list are all considered. Through vector representations of these metrics, a search framework is constructed based on the semantic vector representation of user-generated content. A benchmark for reading-list search is constructed to validate the effectiveness and flexibility of the framework.
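As a rough illustration of the term selection idea in contribution 2), the following minimal Python sketch picks expansion terms from the tags of the top-ranked documents, weighting each tag by the document's initial ranking score and its rating. It is not the dissertation's actual model: the function names, the dictionary keys (`score`, `rating`, `tags`), the 1 to 5 rating scale, and the multiplicative weighting formula are all assumptions made for this example.

```python
from collections import Counter


def select_feedback_terms(ranked_docs, k=2, n_terms=2):
    """Pick expansion terms from the tags of the top-k documents.

    ranked_docs is a list of dicts, best first, with hypothetical keys:
      'score'  - initial retrieval score
      'rating' - mean user rating on an assumed 1-5 scale
      'tags'   - dict mapping tag -> number of users who assigned it
    """
    weights = Counter()
    for doc in ranked_docs[:k]:
        # Assumed weighting: retrieval score scaled by the normalized rating.
        doc_weight = doc["score"] * (doc["rating"] / 5.0)
        for tag, count in doc["tags"].items():
            weights[tag] += doc_weight * count
    return weights.most_common(n_terms)


def expand_query(query_terms, feedback_terms):
    """Append feedback terms that are not already in the query."""
    expanded = list(query_terms)
    for term, _weight in feedback_terms:
        if term not in expanded:
            expanded.append(term)
    return expanded


# Toy first-pass result list (made-up data for illustration only).
docs = [
    {"score": 2.0, "rating": 5.0, "tags": {"fantasy": 3, "dragons": 1}},
    {"score": 1.0, "rating": 2.5, "tags": {"fantasy": 1, "romance": 3}},
    {"score": 0.5, "rating": 5.0, "tags": {"cooking": 10}},
]
terms = select_feedback_terms(docs, k=2, n_terms=2)
print(expand_query(["epic", "novel"], terms))
# prints ['epic', 'novel', 'fantasy', 'dragons']
```

The expanded query would then be submitted for the second-pass retrieval described above. In this toy weighting, a highly rated, highly ranked document contributes more to a tag's weight than a poorly rated one, and documents outside the top k contribute nothing.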
Keywords/Search Tags: User-Generated Content, Document Retrieval, Vector Representation for Text