Community Mining Based On The Theme Of The ACM Paper Library And Visualization Analysis

Posted on: 2017-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: L Z Peng
Full Text: PDF
GTID: 2308330509959475
Subject: Engineering / Computer Technology
Abstract/Summary:
The output of scientific research grows rapidly as research capability improves. Literature is one form of that output, and its growth follows the "exponential growth law". The law of scientific development also shows that knowledge is inherited and cumulative. Literature, as one kind of knowledge, records the generation and dissemination of knowledge through the citation relationships between citing and cited works. As the number of publications increases rapidly, these citation relationships form a large-scale complex network, known as a citation network.
A citation network is a kind of knowledge network and usually contains a great deal of useful information. To help people find related literature in this knowledge swamp, many retrieval tools have appeared. However, most of them use retrieval algorithms based on the textual similarity between a document and the query. As a result, the retrieved literature may mix different research directions, and some relevant documents may be ranked lower than they should be. Moreover, documents whose content is related to the topic but that contain few or none of the query terms cannot be found. This makes it hard for users to find the relevant literature and to learn the development history and current research status of a field.
There are many similarities between a citation network and the Web link network. We therefore apply Web community discovery algorithms to the citation network to mine literature communities, that is, collections of documents that address a common subject and cite one another frequently. This approach can help improve the situation described above.
In this thesis, the main research work is to design and implement a community discovery algorithm for a given research topic in the ACM paper library, and to analyze the resulting literature community.
The algorithm is built on the framework of HITS, a topic-based link analysis algorithm. Considering the characteristics of citation networks and the topic-drift problem of HITS, the research improves the algorithm in the following two respects: (1) In HITS, if the pages in the root set are not highly related to the topic, or the topic is broad, expanding the root set into the base set usually brings in many unrelated pages. We therefore use keyword query expansion to improve the precision of the root set. (2) HITS considers only the link relations between pages and ignores page content. It also treats all links equally, which is one of the causes of topic drift. In this thesis, we build a citation semantic relation matrix by calculating the semantic similarity between papers, which may reduce the topic drift that occurs during the iterative calculation.
In addition, we further study how to calculate the semantic similarity of papers when building the citation semantic relation matrix. For paper similarity, we choose a method based on the semantic similarity of words. We present an algorithm that computes word semantic similarity through community mining on Wikipedia. The method uses the huge Wikipedia page network with category labels rather than its textual content. To obtain the community of a word's page, we apply HITS, a topic-based community discovery algorithm, to the page network. We then measure the semantic similarity between two words from three aspects based on these communities: (1) the semantic relation between the two word pages; (2) the semantic relation between the communities of the two word pages; (3) the semantic relation between the categories that belong to the two communities.
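The second improvement above (weighting each citation link by semantic similarity instead of treating all links equally) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `weighted_hits` and `normalize`, the edge-dictionary layout, and the similarity values are all assumptions made for illustration, and the similarity scores are simply given rather than computed from Wikipedia communities.

```python
from math import sqrt


def normalize(scores):
    """Scale a score vector to unit L2 norm (left unchanged if all zero)."""
    norm = sqrt(sum(s * s for s in scores))
    return [s / norm for s in scores] if norm > 0 else scores


def weighted_hits(edges, n, iterations=50):
    """Run HITS on a citation graph with weighted links.

    edges: dict mapping (citing, cited) -> weight, where the weight is an
           assumed semantic similarity between the two papers.
    n:     number of papers, indexed 0..n-1.
    Returns (authority, hub) score lists.
    """
    auth = [1.0] * n
    hub = [1.0] * n
    for _ in range(iterations):
        # Authority update: sum of hub scores of citing papers,
        # each scaled by the weight of the citation link.
        new_auth = [0.0] * n
        for (src, dst), w in edges.items():
            new_auth[dst] += w * hub[src]
        # Hub update: sum of the new authority scores of cited papers,
        # again scaled by the link weight.
        new_hub = [0.0] * n
        for (src, dst), w in edges.items():
            new_hub[src] += w * new_auth[dst]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return auth, hub


# Hypothetical data: papers 0 and 1 both cite paper 2 (on topic, high
# similarity) and paper 3 (off topic, low similarity).
weighted = {(0, 2): 0.9, (1, 2): 0.8, (0, 3): 0.1, (1, 3): 0.1}
# Classic HITS corresponds to giving every link the same weight.
unweighted = {edge: 1.0 for edge in weighted}

if __name__ == "__main__":
    print("weighted authority:  ", weighted_hits(weighted, 4)[0])
    print("unweighted authority:", weighted_hits(unweighted, 4)[0])
```

With equal weights, papers 2 and 3 receive the same authority score because they are cited by the same papers. With similarity weights, the off-topic paper 3 is ranked below paper 2, which is the kind of effect the weighting is meant to have on topic drift.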
The experimental results show that the proposed method is feasible, performs better than some classic algorithms, and is closer to human judgment.
The analysis of the paper community includes drawing the community graph, analyzing high-quality papers, analyzing the main journals, and analyzing the overall development of the field over time. The experimental results show that the algorithm can obtain the relevant literature community for a user's query, help the user quickly learn how a research field has developed, and analyze the high-quality papers, publication times, and main journals of the field. In short, the research helps users understand the development of a subject more comprehensively and grasp its future direction more easily.
Keywords/Search Tags: Citation network, Paper community, HITS, Wikipedia, Semantic similarity