Research On Tag Generation Method And Its Application In Information Retrieval

Posted on: 2022-06-21 | Degree: Master | Type: Thesis
Country: China | Candidate: D Y Jing
GTID: 2518306557478004 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of modern information technology, the Internet has become an indispensable part of daily life. While the Internet brings great convenience, the rapid growth of online information resources also leaves users increasingly lost in an era of “information explosion”. Quickly and accurately locating the information one needs within large volumes of unstructured text has therefore become an urgent need for many Internet users, and it has attracted wide attention from researchers. As an important means of describing resources, tags have been widely used in news information services, information retrieval and text classification. However, most text resources on the Internet currently carry no tags. In recent years, researchers have proposed a series of tag generation methods, but existing methods generally suffer from low tag generation accuracy and consider tags from only a single perspective. To address this, this thesis improves existing tag generation methods for different scenarios and raises the accuracy of tag generation to a certain extent. The work of this thesis is as follows:

(1) An improved tag generation method based on the traditional TextRank algorithm. Topic information, external knowledge and statistical characteristics all have an important influence on tag generation. Taking this as the starting point and building on the classic TextRank algorithm, the method introduces topic information to alleviate the problem of words being ignored because of their low frequency. At the same time, it integrates external word vectors to enrich the semantic information of words, and it jointly considers the influence of part of speech and word length on tag generation. Experimental results on real data sets verify the effectiveness of the proposed method.

(2) An improved tag generation method based on collaborative filtering. As the tag system of a resource collection stabilizes, many articles have highly correlated content; the tags of these articles also overlap, so the tag reuse rate is high. Taking this as the starting point, the thesis improves the existing similarity measure and uses the similarity between text resources to generate tags. Finally, a more broadly applicable tag generation method is designed by combining the above two methods. Experimental results on real data sets verify the effectiveness of the proposed method.

(3) Based on the above tag generation methods, this thesis designs and implements a tag-based text information retrieval system. The system provides two functions, text tag generation and tag-based retrieval, and by using the automatically generated tags as a text index it can improve the efficiency of information retrieval.
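Item (1) describes TextRank extended with topic information so that low-frequency but topic-relevant words are not ignored. The abstract does not give the exact formula, so the following is only a minimal sketch under stated assumptions: the topic information is supplied as a hypothetical `topic_weight` map (for example from an LDA model), it biases the random-jump vector of TextRank (a topic-biased PageRank), and word length enters through a hypothetical capped bonus. Part-of-speech filtering is assumed to have been done beforehand, and external word vectors are omitted. The function name `topic_textrank` is illustrative, not the thesis's.

```python
from collections import defaultdict


def topic_textrank(tokens, topic_weight, window=3, d=0.85, iters=50):
    """Rank words with TextRank whose random jump is biased toward topic-relevant words.

    tokens: candidate words of one document, assumed already filtered by part of speech.
    topic_weight: hypothetical word -> topic relevance map (e.g. from an LDA model).
    """
    # Undirected co-occurrence graph: words fewer than `window` positions apart share an edge.
    graph = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            u = tokens[j]
            if u != w:
                graph[w][u] += 1.0
                graph[u][w] += 1.0
    nodes = list(graph)
    if not nodes:
        return []
    # Hypothetical prior: topic relevance scaled by a capped word-length bonus.
    prior = {w: topic_weight.get(w, 0.0) * min(len(w), 4) for w in nodes}
    total = sum(prior.values())
    if total == 0:
        prior = {w: 1.0 / len(nodes) for w in nodes}  # no topic signal: plain TextRank
    else:
        prior = {w: p / total for w, p in prior.items()}
    out_weight = {w: sum(graph[w].values()) for w in nodes}
    score = dict(prior)
    for _ in range(iters):
        # Jump to a word with probability proportional to its prior, else follow an edge.
        score = {
            w: (1 - d) * prior[w]
            + d * sum(graph[u][w] / out_weight[u] * score[u] for u in graph[w])
            for w in nodes
        }
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)
```

Because the topic prior replaces the uniform jump vector, a word that appears only once can still rank highly if the topic model marks it as relevant, which is the effect item (1) aims for.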
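Item (2) generates tags for a new text by reusing the tags of similar, already-tagged texts. The abstract says the thesis improves the similarity measure but does not say how, so this sketch substitutes plain cosine similarity over bag-of-words counts as an explicit assumption. The hypothetical `cf_tags` function takes the top-`k` most similar documents and lets each vote for its tags with a weight equal to its similarity.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-count vectors (stand-in for the thesis's measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cf_tags(new_doc, corpus, k=2, n_tags=3):
    """Suggest tags for `new_doc` from its k nearest tagged documents.

    corpus: list of (tokens, tags) pairs for documents that already carry tags.
    """
    query = Counter(new_doc)
    neighbours = sorted(
        ((cosine(query, Counter(tokens)), tags) for tokens, tags in corpus),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = Counter()
    for sim, tags in neighbours:
        for tag in tags:
            votes[tag] += sim  # similar documents contribute more weight
    return [tag for tag, v in votes.most_common(n_tags) if v > 0]
```

For example, a text about "tag", "keyword" and "textrank" would inherit "keyword extraction" first if two of its nearest neighbours both carry that tag, reflecting the high tag reuse rate described in item (2).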
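Item (2) also mentions combining the TextRank-based and collaborative-filtering-based methods, without saying how. One simple possibility, assumed here purely for illustration, is to min-max normalize each method's tag scores and interpolate them with a hypothetical weight `lam`. The function `fuse_tag_scores` is not from the thesis.

```python
def fuse_tag_scores(textrank_scores, cf_scores, lam=0.5, n_tags=5):
    """Linearly interpolate two {tag: score} maps after min-max normalization (assumed scheme)."""

    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        if hi == lo:
            return {t: 1.0 for t in scores}
        return {t: (s - lo) / (hi - lo) for t, s in scores.items()}

    a, b = normalize(textrank_scores), normalize(cf_scores)
    # A tag missing from one method simply gets 0 from that method.
    combined = {t: lam * a.get(t, 0.0) + (1 - lam) * b.get(t, 0.0) for t in a.keys() | b.keys()}
    # Highest combined score first; ties broken alphabetically for a stable order.
    return sorted(combined, key=lambda t: (-combined[t], t))[:n_tags]
```

Normalizing first keeps one method's score scale from dominating; `lam` would in practice be tuned on held-out data.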
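Item (3) uses the generated tags as a text index. A common way to realize this, sketched here as an assumption rather than the thesis's actual system, is an inverted index from each tag to the documents carrying it, with results ranked by how many query tags each document matches. The class `TagIndex` and its methods are hypothetical.

```python
from collections import defaultdict


class TagIndex:
    """Minimal inverted index from (lower-cased) tags to document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, tags):
        for tag in tags:
            self.postings[tag.lower()].add(doc_id)

    def search(self, query_tags):
        """Return ids of documents sharing a query tag, most matched tags first."""
        hits = defaultdict(int)
        for tag in query_tags:
            for doc_id in self.postings.get(tag.lower(), ()):
                hits[doc_id] += 1
        return sorted(hits, key=lambda d: (-hits[d], d))
```

Lookup touches only the posting lists of the query's tags instead of scanning every document, which is the kind of efficiency gain item (3) attributes to tag-based indexing.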
Keywords/Search Tags: Tag generation, Topic model, TextRank, Collaborative filtering, Information retrieval