
Research On Important Documents Detection Methods Based On Optimal-path In Academic Network

Posted on: 2016-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: J B Ning
Full Text: PDF
GTID: 2298330467495540
Subject: Computer application technology
Abstract/Summary:
In the age of big data, the scale of academic information grows rapidly and its network structure is complex. If the types and relations of academic information are not considered, the entire academic world is just a collection of independent academic islands. Without references between documents and cooperation between authors, the academic world is utterly dead. It is the different types of nodes and the diverse relations that make the academic world colorful. Faced with a multitude of academic information, it is of practical significance to construct an academic network, assess the importance of its paths, and mine the optimal path in that network. Some important documents and excellent authors appear during the long course of academic development. Data mining introduces learners to these documents and authors, which opens their eyes, stimulates their thinking, keeps them abreast of the latest developments in the academic field, and provides guidance and reference from various points of view.
Current path-based research on academic networks focuses primarily on the citation network, which is a homogeneous (isomorphic) network. Generally speaking, compared with research abroad, domestic research on path-based data mining of citation networks mainly introduces advanced foreign techniques and applies them in practice. At present, there are three main types of main path analysis: the local main path method, the global main path method, and the k-route main path method. There are four traversal count methods: Node Pair Projection Count (NPPC), Search Path Link Count (SPLC), Search Path Node Pair (SPNP), and Search Path Count (SPC).
Main path analysis has unparalleled advantages over other methods in tracing the development of a field, but it also has limitations.
On the one hand, main path analysis is based on connectivity. The documents on the main path are not necessarily highly cited, and highly cited documents are not necessarily on the main path.
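To make the traversal counts mentioned above more concrete, here is a minimal sketch of Search Path Count (SPC) and a greedy local main path on a toy citation graph. It is an illustration under stated assumptions, not the thesis's implementation: edges are assumed to point from the cited (older) document to the citing (newer) one, the graph is assumed to be acyclic, ties are broken alphabetically, and the function names and the example graph are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Return (successors, predecessors, nodes) for a list of (src, dst) edges."""
    succ, pred, nodes = defaultdict(list), defaultdict(list), set()
    for src, dst in edges:
        succ[src].append(dst)
        pred[dst].append(src)
        nodes.update((src, dst))
    return succ, pred, nodes


def topological_order(edges):
    """Kahn's algorithm; raises ValueError if the graph contains a cycle."""
    succ, pred, nodes = build_adjacency(edges)
    indegree = {n: len(pred.get(n, [])) for n in nodes}
    queue = sorted(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.pop(0)
        order.append(node)
        for nxt in succ.get(node, []):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("citation graph must be acyclic")
    return order


def search_path_count(edges):
    """SPC of an edge = number of source-to-sink paths that traverse it."""
    succ, pred, _ = build_adjacency(edges)
    order = topological_order(edges)
    from_source, to_sink = {}, {}
    for node in order:  # number of paths from any source to this node
        preds = pred.get(node, [])
        from_source[node] = sum(from_source[p] for p in preds) if preds else 1
    for node in reversed(order):  # number of paths from this node to any sink
        succs = succ.get(node, [])
        to_sink[node] = sum(to_sink[s] for s in succs) if succs else 1
    return {(u, v): from_source[u] * to_sink[v] for u, v in edges}


def local_main_path(edges):
    """Greedy variant: start at the best source edge, then always follow
    the outgoing edge with the highest SPC (assumed tie-break: alphabetical)."""
    spc = search_path_count(edges)
    succ, pred, nodes = build_adjacency(edges)
    sources = sorted(n for n in nodes if not pred.get(n))
    start_edges = [(u, v) for u in sources for v in sorted(succ.get(u, []))]
    path = list(max(start_edges, key=spc.get))
    while succ.get(path[-1]):
        here = path[-1]
        path.append(max(sorted(succ[here]), key=lambda v: spc[(here, v)]))
    return path


# Hypothetical toy graph: A and B are the oldest documents, E and F the newest.
edges = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("C", "E"), ("D", "F")]
print(search_path_count(edges)[("D", "F")])  # 3
print(local_main_path(edges))                # ['A', 'C', 'D', 'F']
```

In this toy graph, document C is cited by two later documents but the edge with the highest traversal count is D to F, which illustrates why a path chosen by connectivity need not coincide with the most highly cited documents.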
Meanwhile, the influence of the time factor on the importance of a path is ignored. Given these disadvantages of the main path method, further research into the citation network is necessary. To account for the influence of time on path importance, Price's aging theory is introduced into the Important Documents Detection Methods (IDDM) based on the optimal path. Two metrics for evaluating the importance of a path, P-index and I-index, are proposed; they take into account factors such as the number of citations, inter-citation, and time. In this paper, Optimal-Path (OP), a keyword-based optimal path algorithm for citation networks, is proposed. The approach is compared with the Search Path Count method, and the experimental results demonstrate the effectiveness of the proposed method in [...]
On the other hand, the real academic environment is often heterogeneous. The academic network contains different types of nodes, such as authors, documents, and conferences, and different types of relations. In the real academic environment, the optimal path is affected by this variety of nodes and relations. Mining the optimal path in a heterogeneous academic environment is an interesting and challenging task. The heterogeneous optimal path is studied on the basis of the homogeneous (isomorphic) citation network. For the heterogeneous academic network, two metrics for evaluating the importance of a path, HP-index and HI-index, and an algorithm named HOP are proposed. Finally, using Transfer Learning data sets, the paper shows how the optimal path in the heterogeneous academic network differs from the optimal path in the homogeneous network.
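As a rough illustration of what a heterogeneous academic network looks like as a data structure, the sketch below stores typed nodes (author, document, conference) and typed relations, and follows a sequence of relations from a starting node. The abstract does not define HP-index, HI-index, or HOP, so none of them is implemented here; the class names, relation labels ("writes", "cites", "published_in"), and sample data are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str  # assumed node types: "author", "document", "conference"
    name: str


class HeteroGraph:
    """Directed graph whose edges carry a relation label (hypothetical design)."""

    def __init__(self):
        self._out = defaultdict(list)  # node -> list of (relation, target)

    def add_edge(self, src, relation, dst):
        self._out[src].append((relation, dst))

    def neighbors(self, node, relation):
        """Targets reachable from node over edges with the given relation."""
        return [dst for rel, dst in self._out.get(node, []) if rel == relation]

    def follow(self, start, relations):
        """Nodes reached from start by following the relations in order."""
        frontier = [start]
        for rel in relations:
            frontier = [dst for n in frontier for dst in self.neighbors(n, rel)]
        return frontier


# Hypothetical sample data.
ann = Node("author", "Ann")
d1, d2 = Node("document", "d1"), Node("document", "d2")
conf = Node("conference", "ExampleConf")

g = HeteroGraph()
g.add_edge(ann, "writes", d1)
g.add_edge(ann, "writes", d2)
g.add_edge(d2, "cites", d1)
g.add_edge(d1, "published_in", conf)

# Documents cited by the documents Ann wrote.
print([n.name for n in g.follow(ann, ["writes", "cites"])])  # ['d1']
```

Keeping the relation label on every edge means a path can mix node types, for example author to document to conference, which a citation-only network cannot express.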
Keywords/Search Tags: main path, academic network, Optimal Path, important documents