
Research On Breaking Topic Detection Technology For Food Safety Topics

Posted on: 2019-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: L Yang
GTID: 2359330563453940
Subject: Computer software and theory
Abstract/Summary:
Recently, food and drug safety incidents have emerged one after another, and in the Internet era the problem has become even more prominent. There is therefore an urgent need for technology that discovers food safety problems as early as possible and supports the supervision and control of food safety. Detecting emergent topics in a food safety corpus is one important technical approach to discovering food and drug safety problems. Because long and short texts have different characteristics, a detection method designed for long text does not necessarily apply to short text, and vice versa. Current emergent topic detection methods apply either to long texts, such as news, or to short texts, such as microblogs; there is no good way to analyze long and short text at the same time.

This research uses multi-source Web data as its data source, including long text represented by news and short text represented by microblogs. It proposes a topic correlation algorithm to collect data related to food and drug safety, and then processes that data to detect emergent topics. The work focuses on three techniques: topic keyword selection and text feature extraction, topic clustering, and emergent topic detection.

For topic keyword recognition, this paper uses a suffix array data structure to perform a second-pass combination screening of the output of the ICTCLAS word segmentation system of the Chinese Academy of Sciences, so that the segmented units are not too fine-grained and can express complete semantics. It then selects topic keywords through word segmentation and POS tagging, and calculates each feature word's WTF-IDF (Weighted Term Frequency–Inverse Document Frequency) or WTF (Weighted Term Frequency) value for dimensionality reduction and feature extraction.

For topic clustering, this paper proposes a word co-occurrence graph structure model and clusters topics by partitioning the graph with a modularity-based method. The model makes full use of word co-occurrence relations and supports both topic clustering of variable-length text and rapid clustering of large-scale text corpora.

For emergent topic detection, a calculation method is proposed for burst words and important words. Time information is added to the node structure, the graph model is pruned with a sliding window, and emergent topics are detected through clustering and screening of candidate clusters. The detection results reach an accuracy rate of up to 0.85 and a recall rate of up to 0.75; compared with the traditional method, the F value is increased by 14%.

The graph structure model proposed in this paper avoids a shortcoming of traditional methods, which need different clustering strategies for texts of different lengths, and unifies data from different text sources more effectively. Emergent topics can also be detected accurately in different corpora. However, the correlation between words within a detected topic is not strong. Future work hopes to establish semantic correlation on top of the graph model, so that the words in each clustered topic are more closely related and topic accuracy improves further.
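The word co-occurrence graph and modularity-based partition described in the abstract can be illustrated with a small Python sketch. The abstract does not specify the exact partitioning algorithm, so this uses a naive greedy agglomerative merge that maximizes Newman modularity, which is an assumption and not necessarily the thesis's method. The function names (`build_cooccurrence_graph`, `modularity`, `greedy_modularity_clusters`) and the sample documents are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(docs):
    """Return {word: {neighbor: weight}}; weight is the number of
    documents in which the two words appear together."""
    graph = defaultdict(lambda: defaultdict(int))
    for words in docs:
        for a, b in combinations(sorted(set(words)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def modularity(graph, communities):
    """Newman modularity Q of a partition of a weighted undirected graph."""
    two_m = sum(sum(nbrs.values()) for nbrs in graph.values())
    q = 0.0
    for community in communities:
        # Sum over ordered pairs, so each internal edge is counted twice.
        internal = sum(
            graph[u][v] for u in community for v in graph[u] if v in community
        )
        degree = sum(sum(graph[u].values()) for u in community)
        q += internal / two_m - (degree / two_m) ** 2
    return q


def greedy_modularity_clusters(graph):
    """Start with one community per word and repeatedly apply the merge
    that raises modularity the most; stop when no merge improves it.
    Naive and slow (recomputes Q for every pair), for illustration only."""
    communities = [{w} for w in graph]
    best_q = modularity(graph, communities)
    while len(communities) > 1:
        best_merge = None
        for i, j in combinations(range(len(communities)), 2):
            merged = [c for k, c in enumerate(communities) if k not in (i, j)]
            merged.append(communities[i] | communities[j])
            q = modularity(graph, merged)
            if q > best_q + 1e-12:
                best_q, best_merge = q, merged
        if best_merge is None:
            break
        communities = best_merge
    return communities, best_q


# Hypothetical mix of short "documents", each already reduced to keywords.
docs = [
    ["milk", "melamine", "recall"],
    ["milk", "melamine", "infant"],
    ["melamine", "recall", "infant"],
    ["vaccine", "expired", "clinic"],
    ["vaccine", "expired", "batch"],
    ["vaccine", "clinic", "batch"],
]

graph = build_cooccurrence_graph(docs)
clusters, q = greedy_modularity_clusters(graph)
for cluster in clusters:
    print(sorted(cluster))
```

Because the graph is built from word pairs rather than from whole-document vectors, a news article and a microblog post contribute edges in the same way, which is the property the abstract relies on for clustering variable-length text. A practical system would replace the naive merge loop with an optimized community detection routine.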
Keywords/Search Tags:Safety of food and drug, Topic feature extraction, Graph structure, Variable length text clustering, Burst topic detection