
Classification And Design Of Interdisciplinary Subjects In Cloud Environment

Posted on: 2019-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Fu
Full Text: PDF
GTID: 2348330542498713
Subject: Computer Science and Technology
Abstract/Summary:
The rapid development of the Internet has made information sharing timely and efficient, but it has also caused explosive growth in the volume of information. Channels such as Weibo, public accounts, and Douban give people ready access to information, yet they also make it hard to choose among so much of it. People therefore need automated tools that help them pinpoint the key information they want in this ocean of information. At the same time, the growth of interdisciplinary research has exposed several problems: new disciplines are classified inaccurately, feature words are extracted by a single rule, and the feature-word space has excessively high dimensionality. Traditional classification models depend on accurate word segmentation tailored to the original, established disciplines, and their accuracy is now stretched thin, so traditional classification methods are poorly suited to the current environment. Building on the LDA text topic extraction model, this thesis designs and implements a categorization model for massive volumes of interdisciplinary text. On this basis, it implements phrase combination, new word discovery, keyword extraction, and cross-discipline classification. The main work and innovations of this thesis are as follows. First, it addresses one difficulty of the LDA model, determining the number of topics, by selecting the number of topics adaptively based on density. Second, it addresses another difficulty of the LDA model: keyword extraction depends heavily on the accuracy of word segmentation, and the abundance of new and colloquial words makes accurate keyword extraction difficult. To handle this, the thesis extends the jieba package so that it can learn new words. Third, it further clarifies the keyword screening criteria. Because TF-IDF, which is based on word frequency statistics, cannot serve as the only criterion for screening keywords in the current environment, it is also necessary to consider a word's part of speech, its position in the text, and whether it appears in the title. Finally, using the keywords of each article, the thesis classifies interdisciplinary texts that traditional classification methods struggle to handle.
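The keyword screening criteria described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract names the criteria (TF-IDF, part of speech, word position, title occurrence) but gives no formula, so the multiplicative combination, the weight values, the "leading 20% of the text" position rule, and the names `tf_idf` and `score_keywords` are all assumptions made for illustration. The input is assumed to be already segmented and part-of-speech tagged, for example by jieba.

```python
import math
from collections import Counter

# Hypothetical weights: the abstract lists the criteria but not their values.
POS_WEIGHT = {"n": 1.0, "v": 0.6, "a": 0.4}  # noun, verb, adjective
DEFAULT_POS_WEIGHT = 0.1                     # any other part of speech
TITLE_BONUS = 1.5                            # word also appears in the title
POSITION_BONUS = 1.2                         # word first appears early in the text
LEAD_FRACTION = 0.2                          # "early" = first 20% of the tokens


def tf_idf(doc_tokens, corpus):
    """Return {word: tf-idf} for one tokenized document against a tokenized corpus."""
    counts = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for word, count in counts.items():
        doc_freq = sum(1 for other in corpus if word in other)
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0  # smoothed IDF
        scores[word] = (count / len(doc_tokens)) * idf
    return scores


def score_keywords(tagged_tokens, title_tokens, corpus, top_k=5):
    """Rank keywords by TF-IDF adjusted for part of speech, position, and title.

    tagged_tokens: list of (word, pos_tag) pairs in document order.
    title_tokens:  list of words in the title.
    corpus:        list of tokenized documents (lists of words), including this one.
    """
    words = [word for word, _ in tagged_tokens]
    base = tf_idf(words, corpus)

    first_index, pos_of = {}, {}
    for i, (word, pos) in enumerate(tagged_tokens):
        first_index.setdefault(word, i)  # keep the first occurrence only
        pos_of.setdefault(word, pos)

    title = set(title_tokens)
    lead = max(1, int(LEAD_FRACTION * len(words)))

    scored = {}
    for word, score in base.items():
        score *= POS_WEIGHT.get(pos_of[word], DEFAULT_POS_WEIGHT)
        if first_index[word] < lead:
            score *= POSITION_BONUS
        if word in title:
            score *= TITLE_BONUS
        scored[word] = score
    return sorted(scored, key=scored.get, reverse=True)[:top_k]
```

With weights like these, a frequent function word can outrank a content word under plain TF-IDF in a small corpus and still be pushed down by the part-of-speech factor, which is the kind of correction the additional criteria are meant to provide.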
Keywords/Search Tags: Discipline domain attribute extraction, interdisciplinary, keyword evaluation criteria