
Research on Sorting Method of Scientific Heterogeneous Network Based on Topic Model

Posted on: 2020-12-18 | Degree: Master | Type: Thesis
Country: China | Candidate: B Jin
GTID: 2428330602954329 | Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of the Internet and the huge volume of data it produces, users who search for information are no longer satisfied with simple matching of search terms and a single display of results; they expect richer forms of output and more satisfactory results. Information and data of many kinds are linked through latent relationships, forming complex social networks, and social network data is becoming ever more complex. Stripping away the other information and considering only relationships among objects of a single type simplifies processing, but it loses information. The heterogeneous information networks studied in this thesis, by contrast, capture associations among objects of different types.

This thesis focuses on heterogeneous scientific research networks, with the goal of ranking the papers, authors, journals, and related words returned for a given search term. It addresses three main problems:
(1) The heterogeneous research network is meant to be built with paper titles, the journals or conferences in which papers are published, authors, and abstracts as objects. However, titles and abstracts are in one-to-one correspondence, so the network cannot simply be built with abstracts as separate nodes.
(2) How to exploit the dependencies among objects in a heterogeneous network to mine the latent topics those objects share.
(3) How to assess the authority of objects in a heterogeneous network, using both their latent semantic topics and the links between objects, so as to rank the results.

The main research contributions are as follows; the resulting functionality is implemented in a working system.
(1) A topic model is used to extract the top-ranked topic keywords from each abstract, and these keywords become nodes of the heterogeneous network. Four types of objects, namely paper titles, the journals or conferences in which the papers are published, authors, and keywords, are used to construct the heterogeneous scientific research network. A comparison of Baidu's open-source SentenceLDA algorithm with the traditional LDA topic model shows that SentenceLDA is better suited to computing topic distributions for sentence-level text, whereas LDA is better suited to word-level text, so SentenceLDA is used to extract keywords from the abstracts.
(2) The heterogeneous network objects are modeled with the TCKA method, using Gibbs sampling for inference. The model exploits the inherent dependencies among heterogeneous objects to infer the latent topics they share. Applying a topic model to each node type separately would not make good use of the relationships between different types of objects and could bias the results.
(3) The ConNetClus algorithm clusters and ranks objects in heterogeneous networks. The TCKA model and ConNetClus are combined to rank network objects by authority, and the combination is applied to the retrieval of heterogeneous objects.

Experimental results show that this method, based on topic weights and authority, outperforms traditional keyword-matching retrieval and also traditional semantic modeling methods. The system implementing the method is built on the Elasticsearch (ES) search engine and the SSM framework. On the results page, retrieval results are presented as authoritative papers, publishing conferences or journals, experts, and keywords. Analyzing the topic relevance and authority of nodes is significant for network analysis and mining and can effectively improve traditional ranking algorithms.
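To make the topic-model step in the abstract more concrete, here is a minimal sketch, assuming pre-tokenized abstracts, of standard LDA fitted by collapsed Gibbs sampling, with each document's keywords chosen as the in-document words that score highest under sum_k theta[d][k] * phi[k][w]. This is plain LDA, not Baidu's SentenceLDA and not the thesis's TCKA model (whose details the abstract does not give); the function names, the hyperparameters `alpha`, `beta`, and `iters`, and the keyword-scoring rule are illustrative assumptions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit standard LDA by collapsed Gibbs sampling (illustrative sketch only).

    docs: list of token lists. Returns (vocab, phi, ndk), where phi[k][v] is the
    estimated probability of word v under topic k and ndk[d][k] counts the tokens
    of document d currently assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    widx = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]        # document-topic counts
    nkw = [[0] * V for _ in range(K)]    # topic-word counts
    nk = [0] * K                         # total tokens per topic
    z = []                               # current topic of every token

    def assign(d, v, k, delta):
        ndk[d][k] += delta
        nkw[k][v] += delta
        nk[k] += delta

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            assign(d, widx[w], k, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = widx[w]
                assign(d, v, z[d][i], -1)  # remove this token from the counts
                # Full conditional p(z = k | all other assignments), unnormalized.
                weights = [(ndk[d][k] + alpha) * (nkw[k][v] + beta) / (nk[k] + V * beta)
                           for k in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                assign(d, v, k, +1)

    # Posterior-mean estimate of the topic-word distributions.
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in range(K)]
    return vocab, phi, ndk


def doc_keywords(docs, vocab, phi, ndk, alpha=0.1, n=2):
    """Pick the n in-document words with the highest sum_k theta[d][k] * phi[k][w]."""
    widx = {w: i for i, w in enumerate(vocab)}
    K = len(phi)
    keywords = []
    for d, doc in enumerate(docs):
        total = sum(ndk[d]) + K * alpha
        theta = [(c + alpha) / total for c in ndk[d]]
        scores = {w: sum(theta[k] * phi[k][widx[w]] for k in range(K)) for w in set(doc)}
        keywords.append(sorted(scores, key=lambda w: (-scores[w], w))[:n])
    return keywords


if __name__ == "__main__":
    abstracts = [
        ["topic", "model", "lda", "gibbs", "sampling"],
        ["network", "ranking", "author", "venue"],
        ["topic", "lda", "sampling", "keyword"],
        ["network", "author", "ranking", "authority"],
    ]
    vocab, phi, ndk = lda_gibbs(abstracts, num_topics=2, iters=100)
    print(doc_keywords(abstracts, vocab, phi, ndk, n=2))
```

With a fixed seed the run is reproducible. On a real corpus one would also remove stop words and tune the number of topics, choices the abstract does not specify.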
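The authority-ranking idea described in the abstract can be illustrated with a simplified, HITS-style mutual-reinforcement scheme between authors and venues, in the spirit of the authority ranking used by the RankClus/NetClus family: a venue ranks highly when authoritative authors publish there, and an author ranks highly when publishing in authoritative venues. This sketch is not ConNetClus and performs no clustering. Each paper carries a hypothetical `weight` standing in for its topic relevance to the query (for example, a topic probability from a topic model), so links contributed by on-topic papers count for more; the field names and the paper-scoring formula are assumptions.

```python
from collections import defaultdict


def authority_rank(papers, iters=50):
    """HITS-style author/venue authority ranking on a topic-weighted network.

    papers: list of dicts with keys "title", "authors" (list), "venue", and a
    hypothetical "weight" in [0, 1] standing in for the paper's topic relevance.
    Returns (author_rank, venue_rank, paper_score) as dicts.
    """
    # Author-venue link strength: sum of topic weights of the papers linking them.
    links = defaultdict(float)
    for p in papers:
        for a in p["authors"]:
            links[(a, p["venue"])] += p["weight"]

    authors = sorted({a for p in papers for a in p["authors"]})
    venues = sorted({p["venue"] for p in papers})
    author_rank = {a: 1.0 / len(authors) for a in authors}
    venue_rank = {v: 1.0 / len(venues) for v in venues}

    def normalized(scores, keys):
        total = sum(scores[k] for k in keys) or 1.0  # guard against all-zero weights
        return {k: scores[k] / total for k in keys}

    for _ in range(iters):
        # A venue is authoritative if authoritative authors publish there ...
        new_venue = defaultdict(float)
        for (a, v), w in links.items():
            new_venue[v] += w * author_rank[a]
        venue_rank = normalized(new_venue, venues)
        # ... and an author is authoritative if they publish in authoritative venues.
        new_author = defaultdict(float)
        for (a, v), w in links.items():
            new_author[a] += w * venue_rank[v]
        author_rank = normalized(new_author, authors)

    # Hypothetical paper score: topic weight times (venue rank + mean author rank).
    paper_score = {}
    for p in papers:
        mean_author = sum(author_rank[a] for a in p["authors"]) / max(len(p["authors"]), 1)
        paper_score[p["title"]] = p["weight"] * (venue_rank[p["venue"]] + mean_author)
    return author_rank, venue_rank, paper_score


if __name__ == "__main__":
    papers = [
        {"title": "P1", "authors": ["A", "B"], "venue": "KDD", "weight": 0.9},
        {"title": "P2", "authors": ["A"], "venue": "KDD", "weight": 0.8},
        {"title": "P3", "authors": ["C"], "venue": "WWW", "weight": 0.1},
    ]
    authors, venues, scores = authority_rank(papers)
    print(sorted(scores, key=scores.get, reverse=True))
```

Because the two alternating updates are a power iteration on the author-venue link matrix, the ranks tend toward its leading singular vectors. A weakly weighted, disconnected component (WWW in the toy data) is driven toward zero, which is one reason a damping or smoothing term may be worth adding in practice.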
Keywords/Search Tags: Heterogeneous information network, Topic model, NetClus algorithm, Ranking method