
Research On Tag Semantic Retrieval Based On LSA In Social Tagging System

Posted on: 2012-03-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y G Xuan
GTID: 1118330332974373
Subject: Management Science and Engineering
Abstract/Summary:
In the Web 2.0 era, social tagging systems have gradually developed into a key platform for organizing and sharing resources, and they have become a principal direction of Web development. Tags are vague and unregulated, and tag resources are very numerous. As a result, users must filter through a large number of search results, and retrieval efficiency is low. The tagging behavior of most users in a social tagging system follows a commonly shared social understanding. An essential, latent semantic structure therefore exists that governs which tags appear and how the semantics of resources are composed. This paper accordingly seeks an approach based on Latent Semantic Analysis (LSA), an algebraic model of information retrieval. The approach automatically acquires the semantic relations among tags, assigns semantics to resources, and represents and stores tags and resources in a highly computable and operable form.

This paper is organized into the following four parts:

(1) After a review of previous research, the development history, definitions, and system models of social tagging and of LSA are introduced. The three core elements of a social tagging system and the shortcomings of its tag retrieval are analyzed, and the mathematical basis of the LSA approach is introduced. On the basis of this review and analysis, a tag semantic retrieval model based on LSA is proposed, and the applicability of the approach to social tagging is analyzed.

(2) Semantic tagging in social tagging systems is studied in order to improve the resource model. A weighting algorithm for the tag-resource matrix is proposed that improves on conventional TF-IDF by combining a local weight, a global tag weight, and a global resource weight. The global tag weight measures how important a tag is and how well it identifies resources. The global resource weight describes the amount of information provided by a resource's tag set. The original tag-resource matrix weights are refined through row and column computations, so that the new matrix better reflects the social tagging system as a whole.

(3) Similarity and ranking algorithms for semantic retrieval in social tagging systems are studied. Six similarity measures are first introduced. An improved cosine similarity formula is then adopted both as the retrieval measure and as the algorithm for computing similarity between resource sets. Drawing on popularity-based ranking, the "long-tail" phenomenon of tags and resources serves as the starting point for analyzing how tags form and how they are distributed. The ranking algorithm is then improved using the ordering of similar users and resources, so that resources with greater similarity are ranked higher.

(4) The proposed algorithms are tested experimentally to verify the feasibility and advantages of the approach. The representative site "delicious.com" is used as an example. Data were crawled and cleaned to obtain approximately 200,000 raw entries, and the relations among resources, users, and tags were analyzed to build the corresponding network. Two groups of retrieval tests were run on these data in Matlab, one with the algorithm proposed in this paper and one with a traditional algorithm, and the two produced different rankings. The results are analyzed and evaluated using recall, precision curves, recall-precision curves, recall-precision histograms, MAP, and other measures. The evaluation leads to the conclusion that the improved tag semantic retrieval approach proposed here outperforms the traditional vector space model method.

Finally, the conclusions are summarized, and the limitations and directions for future work are presented.
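Part (2) describes building a weighted tag-resource matrix from a local weight, a row-wise global tag weight, and a column-wise global resource weight. The abstract does not give the exact formulas, so the following Python sketch only shows the shape of such a scheme, under loudly stated assumptions. It assumes the following choices:

- The raw data are (user, tag, resource) triples.
- The count F[i][j] is the number of distinct users who assigned tag i to resource j.
- The local weight is log2(1 + F[i][j]).
- The tag weight is the standard log-entropy global weight used in LSA.
- The resource weight is 1 plus the normalized entropy of the resource's tag distribution.

The function names `build_tag_resource_matrix` and `weight_matrix` are hypothetical.

```python
import math
from collections import defaultdict


def build_tag_resource_matrix(triples):
    """Aggregate (user, tag, resource) triples into F[i][j] = number of distinct
    users who assigned tag i to resource j. Returns (F, tags, resources)."""
    users_per_cell = defaultdict(set)
    for user, tag, resource in triples:
        users_per_cell[(tag, resource)].add(user)  # repeated triples count once
    tags = sorted({t for t, _ in users_per_cell})
    resources = sorted({r for _, r in users_per_cell})
    F = [[len(users_per_cell.get((t, r), ())) for r in resources] for t in tags]
    return F, tags, resources


def entropy(counts):
    """Shannon entropy (natural log) of a list of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0) if total else 0.0


def weight_matrix(F):
    """Return (W, g, h) with W[i][j] = log2(1 + F[i][j]) * g[i] * h[j].

    HYPOTHETICAL weighting, not the dissertation's published formulas:
      g[i] (row computation):    log-entropy global weight of tag i
      h[j] (column computation): 1 + normalized entropy of resource j's tag set
    """
    m, n = len(F), len(F[0])
    # g[i] is 1 for a tag used on a single resource and 0 for a tag
    # spread evenly over every resource (it then identifies nothing).
    g = [1.0 - entropy(row) / math.log(n) if n > 1 else 1.0 for row in F]
    # h[j] rises from 1 (a single tag) toward 2 (all tags used equally),
    # i.e. it grows with the information carried by the resource's tag set.
    h = []
    for j in range(n):
        col = [F[i][j] for i in range(m)]
        h.append(1.0 + entropy(col) / math.log(m) if m > 1 else 1.0)
    W = [[math.log2(1 + F[i][j]) * g[i] * h[j] for j in range(n)] for i in range(m)]
    return W, g, h
```

For example, suppose "web" is applied once to every resource while "python" is applied by three users to a single resource. The first call yields F, and `weight_matrix(F)` then gives "web" a global weight near 0 and "python" a weight of 1. Swapping in the dissertation's own weight functions would only change the bodies that compute `g` and `h`.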
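The LSA retrieval model in parts (1) and (3) rests on a truncated singular value decomposition A ≈ U_k S_k V_k^T of the tag-resource matrix. Queries are compared with resources in the reduced k-dimensional space. The following Python sketch, written for illustration rather than taken from the dissertation (which used Matlab), has four steps:

- It computes the top-k singular triplets by power iteration on A^T A.
- It folds a query tag vector q into the latent space as q_hat = S_k^-1 U_k^T q.
- It ranks resources (rows of V_k) by plain cosine similarity.

The dissertation's "improved cosine similarity" is not specified in the abstract, so plain cosine stands in for it here. The helper names are hypothetical. The hand-rolled SVD is only meant for small toy matrices.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(M, x):
    return [dot(row, x) for row in M]


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na > 0 and nb > 0 else 0.0


def truncated_svd(A, k, iters=1000, seed=0):
    """Top-k singular triplets (U, S, V) of A (tags x resources), A ~ U_k S_k V_k^T.

    Power iteration on A^T A with re-orthogonalization against the vectors
    already found; adequate for small examples, not a substitute for a
    library SVD such as numpy.linalg.svd.
    """
    n = len(A[0])
    cols = list(zip(*A))                                   # columns of A
    AtA = [[dot(cols[i], cols[j]) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    U, S, V = [], [], []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = matvec(AtA, v)
            for prev in V:                                 # project out found vectors
                p = dot(w, prev)
                w = [wi - p * pi for wi, pi in zip(w, prev)]
            lam = math.sqrt(dot(w, w))                     # converges to an eigenvalue of A^T A
            if lam < 1e-12:
                break
            v = [wi / lam for wi in w]
        if lam < 1e-12:                                    # rank of A exhausted
            break
        sigma = math.sqrt(lam)                             # singular value of A
        U.append([x / sigma for x in matvec(A, v)])        # u = A v / sigma
        S.append(sigma)
        V.append(v)
    return U, S, V


def lsa_rank(A, query, k):
    """Rank resources (columns of A) against a tag-space query vector.
    Returns (score, resource_index) pairs, best first."""
    U, S, V = truncated_svd(A, k)
    q_hat = [dot(u, query) / s for u, s in zip(U, S)]      # q_hat = S_k^-1 U_k^T q
    scores = []
    for j in range(len(A[0])):
        d_hat = [v[j] for v in V]                          # row j of V_k
        scores.append((cosine(q_hat, d_hat), j))
    return sorted(scores, key=lambda t: (-t[0], t[1]))
```

Consider a toy matrix with tags python, programming, code, recipe, and cooking (rows) and four resources (columns). Resource 0 is tagged python and programming, resource 1 is tagged programming and code, resource 2 is tagged recipe and cooking, and resource 3 is tagged cooking. Querying "code" with k = 2 scores resource 0 near 1, even though it never received that tag, because it shares "programming" with resource 1. A plain vector space cosine would score it 0.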
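Part (4) evaluates rankings by recall, precision, recall-precision curves, and MAP. The standard definitions can be sketched in Python as follows. This is an illustration, not the dissertation's Matlab evaluation code, and the function names are hypothetical. Each run is a ranked list of resource identifiers paired with the set of resources judged relevant for that query.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall of the top-k results of one ranked list."""
    hits = sum(1 for r in ranked[:k] if r in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def pr_curve(ranked, relevant):
    """(recall, precision) points after each rank position."""
    return [precision_recall_at_k(ranked, relevant, k)[::-1]
            for k in range(1, len(ranked) + 1)]


def average_precision(ranked, relevant):
    """Sum of precision at each rank holding a relevant item, divided by
    the total number of relevant items."""
    hits, total = 0, 0.0
    for i, r in enumerate(ranked, start=1):
        if r in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs) if runs else 0.0
```

For a ranking ["a", "b", "c", "d"] with relevant items {"a", "c"}, average precision is (1/1 + 2/3) / 2 = 5/6. Averaging that with a second query whose only relevant item sits at rank 2 (AP = 1/2) gives MAP = 2/3.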
Keywords: tag, social tagging, semantic tagging, semantic retrieval, LSA