Research On Text Material Recommendation Method Combining Label Classification And Semantic Query Expansion

Posted on: 2022-12-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Meng
GTID: 2518306767462754
Subject: Journalism and Media
Abstract/Summary:
In preparing planning and research reports, document preparers often need to collect and read a large number of text materials according to a proposed catalogue or title, sort them, and then select the ones to use. This is a heavy workload, and the quality of the result cannot be guaranteed. With the advent of the big data era, reference materials increasingly come from multiple sources and fall into multiple categories, so sorting and classifying them has become a major challenge that is time-consuming and tedious. Text material recommendation methods address this problem: using information retrieval and related technologies, they automatically and accurately retrieve and recommend suitable text materials from a large collection according to the proposed catalogue. However, existing text material recommendation methods face challenges in the field of digital government planning documentation, such as poor recommendation quality. This thesis therefore studies text material recommendation combining label classification and semantic query expansion in the field of digital government planning documentation, and proposes a multi-factor weighted fusion text material recommendation method, which combines three proposed methods through a differential evolution algorithm. Experiments on 10 datasets show that the method significantly improves the performance of text material recommendation, which can greatly reduce the workload of manual material selection and classification and lower the difficulty of document preparation. The main contents are as follows:

(1) To address the problem that the title of a text material in the original references is often ignored in text material recommendation, a recommendation method based on the similarity between the catalogue title and the text material title is proposed. The method uses a word vector model trained on a digital government domain corpus and represents both the catalogue title and the text material title as the average of their word vectors. Cosine similarity is then computed between the catalogue title and each text material title, and the Top N text materials are recommended in descending order of similarity. Experimental results show that this method performs better than baseline methods such as edit distance and the vector space model (VSM), and that the text material recommendation method integrating it achieves better results.

(2) To address the poor short-to-long text matching that arises when text material content (long text) is retrieved by catalogue title (short text), a recommendation method based on the similarity between the catalogue title and the text material content is proposed. The method also uses the word vector model trained on the digital government domain corpus, and semantically expands the catalogue title at the vector-representation level using a knowledge base of text material titles and contents. The word vectors of the original catalogue title and of the expansion terms are joined and averaged to obtain the representation of the expanded catalogue title. Cosine similarity is then computed between the catalogue title and the text material content, and the Top N text materials are recommended in descending order of similarity. Experimental results show that this method outperforms the baseline method.

(3) To address the fact that text material recommendation does not consider the classification information that assigns text materials to catalogue titles, a recommendation method based on label classification is proposed. Trained on a text material label classification training set, the method uses a linear support vector machine model to predict the label of each text material and normalizes the prediction probabilities. These are used to calculate the matching degree between the catalogue title label and the text material label, which supports classifying the text material under the catalogue. The Top N text materials are then recommended in descending order of matching degree. Experimental results show that the text material recommendation method integrating this method achieves better results.

(4) The three methods above separately consider text similarity, text similarity under semantic expansion, and label classification, and do not consider how to integrate these three dimensions of information. To address this, a multi-factor weighted fusion text material recommendation method is proposed. The method uses a differential evolution algorithm to automatically learn the best linear combination parameters for the three kinds of similarity (similarity between catalogue title and text material title, similarity between catalogue title and text material content, and matching degree between catalogue title label and text material label), yielding a linearly weighted fused similarity between the catalogue title and the text material. The results show that recommendation considering all three kinds of similarity performs best.
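
The title-matching step in (1) can be illustrated with a minimal sketch. The abstract does not give the implementation, so everything here is an assumption for illustration: the function names, the whitespace tokenization (Chinese titles would need a word segmenter), and the two-dimensional toy embeddings standing in for the domain-trained word vector model.

```python
import math

def average_vector(tokens, embeddings):
    """Mean of the vectors of the tokens present in the embedding table."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None  # no known token: the title cannot be represented
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is missing or all zeros."""
    if u is None or v is None:
        return 0.0
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def recommend_top_n(catalogue_title, material_titles, embeddings, n):
    """Rank material titles by similarity to the catalogue title, highest first."""
    query = average_vector(catalogue_title.split(), embeddings)
    scored = [
        (cosine(query, average_vector(title.split(), embeddings)), title)
        for title in material_titles
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]

# Hypothetical 2-dimensional embeddings, for illustration only.
TOY_EMBEDDINGS = {
    "data": [1.0, 0.0],
    "platform": [0.8, 0.2],
    "budget": [0.0, 1.0],
    "report": [0.1, 0.9],
}

results = recommend_top_n(
    "data platform",
    ["platform data", "budget report", "data report"],
    TOY_EMBEDDINGS,
    n=2,
)
```

Averaging discards word order, so "platform data" and "data platform" receive the same vector; this is a property of the average vector representation itself, not of this sketch.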
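
The weighted fusion in (4) can be sketched as follows. The abstract does not state which differential evolution variant, objective, or parameters the thesis uses, so this sketch assumes a basic DE/rand/1/bin scheme, precision at N as the fitness, non-negative weights that sum to 1, and a starting population seeded with the uniform and single-factor weight vectors. The function names and the toy scores are hypothetical.

```python
import random

def fuse(weights, scores):
    """Linear weighted fusion of the three per-material scores."""
    return sum(w * s for w, s in zip(weights, scores))

def top_n(weights, candidates, n):
    """Ids of the n candidates with the highest fused score."""
    ranked = sorted(candidates, key=lambda c: fuse(weights, c[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:n]]

def precision_at_n(weights, queries, n):
    """Fitness: share of recommended materials that are labelled relevant."""
    hits = 0
    for candidates, relevant in queries:
        hits += sum(1 for doc_id in top_n(weights, candidates, n) if doc_id in relevant)
    return hits / (n * len(queries))

def normalise(weights):
    """Clip negatives to zero and rescale so the weights sum to 1."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total == 0.0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in clipped]

def learn_weights(fitness, dim=3, pop_size=12, generations=30, f=0.5, cr=0.9, seed=0):
    """DE/rand/1/bin maximising `fitness` over weight vectors that sum to 1."""
    rng = random.Random(seed)
    # Assumption of this sketch: start from the uniform and single-factor
    # vectors, then fill the rest of the population with random vectors.
    population = [[1.0 / dim] * dim]
    population += [[1.0 if k == i else 0.0 for k in range(dim)] for i in range(dim)]
    while len(population) < pop_size:
        population.append(normalise([rng.random() for _ in range(dim)]))
    scores = [fitness(w) for w in population]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for k in range(dim):
                if k == forced or rng.random() < cr:
                    trial.append(population[a][k] + f * (population[b][k] - population[c][k]))
                else:
                    trial.append(population[i][k])
            trial = normalise(trial)
            trial_score = fitness(trial)
            if trial_score >= scores[i]:  # greedy selection never lowers a slot's score
                population[i], scores[i] = trial, trial_score
    best = max(range(pop_size), key=lambda i: scores[i])
    return population[best], scores[best]

# Hypothetical training data: (candidates, relevant ids) per catalogue title.
# Each candidate is (id, (title_sim, content_sim, label_match)).
QUERIES = [
    ([("A", (0.9, 0.2, 0.1)), ("B", (0.3, 0.8, 0.9)),
      ("C", (0.8, 0.1, 0.0)), ("D", (0.2, 0.7, 0.2))], {"A", "B"}),
    ([("E", (0.7, 0.3, 0.8)), ("F", (0.6, 0.9, 0.1)),
      ("G", (0.1, 0.4, 0.3)), ("H", (0.2, 0.2, 0.9))], {"E", "F"}),
]

best_weights, best_precision = learn_weights(lambda w: precision_at_n(w, QUERIES, n=2))
```

Because selection is greedy and the starting population includes the single-factor vectors, the learned weights can never score worse on the training data than any one similarity used alone. Whether that holds on unseen catalogue titles has to be checked on held-out data.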
Keywords/Search Tags: text material recommendation, information retrieval, digital government, query expansion, differential evolution algorithm