Research On Short Text Topic Information Mining Technology

Posted on: 2021-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: J X Wang
GTID: 2428330605455973
Subject: Engineering
Abstract/Summary:
The rapid development of online social media marks an era in which information changes and is shared quickly. Combining online social media with the traditional information industry has produced many new applications that are closely tied to daily life, increasing people's willingness to use them. Short text is one of the main forms of content in these applications, so mining effective topics from it is of considerable value. Topic model technology has already achieved considerable results and has become one of the important approaches to the intelligent processing of text. However, because document-word data in short text is relatively sparse, traditional models perform poorly when mining topic information from short text. Expanding the data with word co-occurrence information from the text collection to obtain topic distributions has become the mainstream approach to short-text topic mining, and many studies build on this idea.

This thesis proposes the Semantic Analysis Biterms Topic Model (SA-BTM), which takes into account the influence of semantic relations on the results when co-occurring word pairs (biterms) are used to infer topics. It also studies how to determine the topic dimension, which is closely related to the quality of topic mining. The main work is as follows:

1) The effect of the semantic relations of co-occurring words on topic mining is studied. Word embedding vectors that capture semantic relationships are trained on a large amount of text data, co-occurring words are represented with these vectors, and the semantic relationship between words is analyzed and compared through semantic similarity. The study compares how expanding the data with biterms from different semantic similarity intervals affects topic mining.

2) The Semantic Analysis Biterms Topic Model is proposed. By analyzing the semantic relationships of the words in a document, biterms are selected appropriately and used to infer topics. The model mines short-text topic information effectively; the topic mining results are reported and compared with those obtained by other models.

3) A method for adaptively determining the topic dimension is proposed. The topic dimension has a large influence on the mining results, yet it is currently determined mainly by experience. An adaptive search strategy for the topic dimension is proposed, and experiments show that it can quickly determine an appropriate topic dimension during topic information mining.

The experimental data sets are built by using web crawlers to collect a large number of texts of different types, such as problem sets. With the models and methods proposed in this thesis, topics can be mined effectively and the topic dimension can be determined quickly. Experiments show that the topics mined by the proposed model have a higher degree of aggregation.
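The idea of selecting biterms by semantic similarity can be illustrated with a minimal Python sketch. It is not the SA-BTM implementation: the abstract does not state the similarity measure or the interval bounds, so cosine similarity, the `low`/`high` thresholds, the function name `select_biterms`, and the toy three-dimensional embeddings are all assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_biterms(tokens, embeddings, low=0.3, high=1.0):
    """Keep co-occurring word pairs whose similarity lies in [low, high].

    `low` and `high` are hypothetical interval bounds; the abstract only
    says that different similarity intervals were compared.
    """
    kept = []
    for w1, w2 in combinations(tokens, 2):
        # Skip identical words and words without an embedding.
        if w1 == w2 or w1 not in embeddings or w2 not in embeddings:
            continue
        sim = cosine(embeddings[w1], embeddings[w2])
        if low <= sim <= high:
            kept.append((tuple(sorted((w1, w2))), sim))
    return kept


# Toy embeddings invented for this example; real vectors would come from
# a word embedding model trained on a large corpus.
TOY_EMBEDDINGS = {
    "apple": [0.9, 0.1, 0.0],
    "fruit": [0.8, 0.2, 0.1],
    "phone": [0.1, 0.9, 0.2],
}

doc = ["apple", "fruit", "phone"]
biterms = select_biterms(doc, TOY_EMBEDDINGS, low=0.5)
for pair, sim in biterms:
    print(pair, round(sim, 3))
```

With these toy vectors and `low=0.5`, only the pair ("apple", "fruit") passes the filter. In a biterm-style topic model the retained pairs would then serve as the observations from which topics are inferred.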
Keywords/Search Tags: Topic information mining, Word embedding vectors, Semantic similarity, Topic model, Topic dimension