Topic Discovery And Trend Forecasting In The Science And Technology Literature

Posted on: 2014-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Xue
GTID: 2268330422951690
Subject: Computer technology
Abstract/Summary:
The rapid development of science and technology in recent decades has brought about booming growth in the related literature, and numerous digital libraries and document databases have emerged as a result. Faced with this tremendous and heterogeneous body of information, however, users often find it hard both to gain a comprehensive understanding of a field and to locate the specific pieces of information they want. To address this issue, this thesis studies a dataset of scientific literature in order to detect research topics and trends. The research proceeds as follows.

First, a crawler program collects the papers published by important conferences and journals in the field of NLP, and the PDFBox toolkit is used to extract each paper's title, abstract, and publication date. These records form the science and technology literature dataset.

Next, a maximal frequent itemset mining algorithm is applied to the dataset to discover research topics. This method gives the researcher rough topic information, but the results are mixed with useless terms that are hard to eliminate. A method based on the LDA model is then applied to the output of this first step to filter it further: the word probability distributions provided by LDA are used to assign a weight to each word, which allows useless terms to be rejected and so addresses the problem of the first step. However, the LDA model alone cannot fully recover the needed information, so the thesis then introduces an improved method that combines the LDA model with a controlled vocabulary to overcome the shortcomings of the second method.

Then, assuming that the volume of user retrievals for a topic reflects, to a certain extent, its research trend, the thesis forecasts topic trends by adding the time distribution of user retrieval volume into the trend forecasting model it establishes.
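The abstract does not specify the mining algorithm or the filtering rule in detail. The Python sketch below illustrates the two topic-discovery steps under stated assumptions: each document is treated as a set of words (a "transaction"), maximal frequent itemsets are found with a plain Apriori-style level-wise search, and the LDA output is assumed to be already available as a topic-to-word probability table. The word weight (the maximum of p(w|z) over all topics), the threshold, and the optional `vocabulary` argument standing in for the controlled vocabulary are hypothetical choices, not the thesis's method, and all function names are illustrative.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search; returns {frozenset: support_count}."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    current = [frozenset([i]) for i in items]
    k = 1
    while current:
        # Count how many documents contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def maximal_itemsets(frequent):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < t for t in frequent)]


def word_weights(topic_word):
    """Hypothetical weighting: a word's weight is its highest p(w|z) over topics."""
    weights = {}
    for dist in topic_word.values():
        for word, p in dist.items():
            weights[word] = max(weights.get(word, 0.0), p)
    return weights


def filter_itemsets(itemsets, weights, threshold, vocabulary=frozenset()):
    """Drop low-weight words from each itemset; words in the (hypothetical)
    controlled vocabulary are kept regardless of their LDA weight."""
    kept = set()
    for s in itemsets:
        core = frozenset(w for w in s
                         if weights.get(w, 0.0) >= threshold or w in vocabulary)
        if core:
            kept.add(core)
    return kept


if __name__ == "__main__":
    docs = [["topic", "model", "paper"], ["topic", "model", "lda"],
            ["topic", "model", "paper"], ["parsing", "paper"]]
    # Toy stand-in for a trained LDA model's topic-word distributions.
    topic_word = {0: {"topic": 0.30, "model": 0.25, "paper": 0.01},
                  1: {"parsing": 0.40, "paper": 0.02}}
    maximal = maximal_itemsets(frequent_itemsets(docs, min_support=2))
    print(maximal)
    print(filter_itemsets(maximal, word_weights(topic_word), threshold=0.05))
```

In this toy run the only maximal itemset is {topic, model, paper}, and the weight filter strips the generic word "paper", leaving {topic, model}. Brute-force support counting keeps the sketch short; a real corpus would call for a dedicated maximal-itemset miner such as FPMax and a trained LDA model rather than a hand-written table.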
The thesis then investigates topic trend forecasting further. Finally, it adds the time distribution of relative, rather than absolute, retrieval volume into the trend forecasting model, and finds that this greatly increases the contribution of user retrieval volume to trend forecasting.

In the last part of the thesis, the findings are applied to Tnet, a website containing information about teachers at colleges and universities. By linking the teachers' information with the literature dataset, the thesis develops a system that provides users with a package of information covering research topics, their trends, and the related teachers in a given scientific area.
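The abstract does not describe the forecasting model itself. As a stand-in, the Python sketch below fits an ordinary least-squares line to a topic's retrieval series and extrapolates it one period ahead, once on absolute retrieval counts and once on relative volume (the topic's retrievals divided by all retrievals in the same period). The linear model, the function names, and the numbers are hypothetical; they only illustrate how the absolute and relative inputs can point in opposite directions.

```python
def relative_volume(topic_counts, total_counts):
    """Share of all retrievals in each period that went to this topic."""
    return [t / total for t, total in zip(topic_counts, total_counts)]


def linear_forecast(series, steps_ahead=1):
    """Fit y = a + b*x by least squares over x = 0..n-1 and extrapolate."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, series))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + steps_ahead)


if __name__ == "__main__":
    # Hypothetical per-period retrieval counts for one topic and for all topics.
    topic = [10, 20, 30, 40]
    total = [100, 250, 500, 1000]
    print(linear_forecast(topic))                                    # 50.0
    print(round(linear_forecast(relative_volume(topic, total)), 4))  # 0.02
```

Here the topic's absolute retrievals keep rising, yet its share of all retrievals falls from 0.10 to 0.04, so the relative series forecasts a declining topic. Normalizing by total volume removes growth that merely reflects more users or more documents overall.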
Keywords/Search Tags: science and technology literature, topic discovery, trend forecasting, user retrieval volume