
Trend Analysis Of Network Popular Words By Using Semantic Knowledge

Posted on: 2018-06-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Fu
GTID: 2348330515983295
Subject: Software engineering

Abstract:
Constructing word and document vectors is a fundamental task in computational linguistics. We propose a word semantic vector construction framework that incorporates prior knowledge from a heterogeneous network and improves the accuracy of word semantic similarity calculation. By taking word semantic similarity into account when comparing documents' topic vectors, the framework also improves the accuracy of LDA-based event detection. The two main parts of the thesis are as follows.

Improved word semantic similarity calculation framework: Constructing word vectors is the key to calculating word semantic similarity. We propose a word semantic vector construction framework that incorporates the relationship weights between words in a heterogeneous network. The framework builds on Word2Vec: the encoded vector of the current word is used to predict its adjacent words, or its edge weights in the heterogeneous network, and the prediction loss is used to adjust the encoder's parameters in each training epoch. When training is complete, the encoder acts as a projection from word IDs to word vectors, from which all word vectors are obtained. By incorporating the weight information of the heterogeneous network, the framework reduces the feature sparseness of the training documents and improves the accuracy of word semantic similarity calculation.

Improved LDA-based event detection: LDA-based event detection uses the LDA model to obtain each document's topic word vector and treats each cluster of documents as an event, where the clusters are produced by K-Means using cosine distance. We improve this method by incorporating word semantic similarity and word frequency information into the definition of the distance between documents' topic word vectors. This additional information helps distinguish documents whose topic word vectors are orthogonal (share no words), and improves the accuracy of the method.

Comparing the results of our methods with the baseline, we conclude that our methods achieve some improvement in accuracy, especially when features are sparse. Our method is also flexible enough to incorporate a variety of heterogeneous network knowledge.

The innovations of this thesis are as follows:
1. More entities are imported into the Word2Vec model, and the edge weights in the heterogeneous network are estimated from the dot product between entity vectors.
2. A word semantic similarity factor and a word frequency factor are introduced into the definition of the distance between documents' topic word vectors.
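The vector construction framework described above (a Word2Vec encoder that also predicts heterogeneous-network edge weights from the dot product of two entity vectors) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not give the loss, so the sketch assumes skip-gram with negative sampling for the corpus term and a squared error between each edge weight (assumed normalised to [0, 1]) and sigmoid(v_a · v_b) for the network term. The function `train`, its parameters such as `edge_lr`, the toy corpus and the entity `ENTITY:disaster` are all hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# (node_a, node_b, weight); weights are ASSUMED to be normalised to [0, 1].
Edge = Tuple[str, str, float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(u, v))


def train(sentences: List[List[str]], edges: List[Edge], dim: int = 8,
          window: int = 2, negatives: int = 3, epochs: int = 200,
          lr: float = 0.05, edge_lr: float = 0.05,
          seed: int = 0) -> Dict[str, List[float]]:
    """Hypothetical trainer: skip-gram loss plus an edge-weight loss."""
    rng = random.Random(seed)
    # Vocabulary: corpus words plus entities that only occur in the network.
    vocab = sorted({w for s in sentences for w in s}
                   | {n for a, b, _ in edges for n in (a, b)})
    # Encoder (input) vectors; these become the final word/entity vectors.
    emb = {w: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for w in vocab}
    # Output (context) vectors, used only by the skip-gram term.
    ctx = {w: [0.0] * dim for w in vocab}

    for _ in range(epochs):
        # Corpus term: skip-gram with negative sampling; the centre word's
        # vector predicts each neighbour (label 1) vs. random words (label 0).
        for s in sentences:
            for i, center in enumerate(s):
                for j in range(max(0, i - window), min(len(s), i + window + 1)):
                    if j == i:
                        continue
                    negs = (rng.choice(vocab) for _ in range(negatives))
                    samples = [(s[j], 1.0)] + [(w, 0.0) for w in negs if w != s[j]]
                    grad = [0.0] * dim
                    for word, label in samples:
                        g = lr * (label - sigmoid(dot(emb[center], ctx[word])))
                        for d in range(dim):
                            grad[d] += g * ctx[word][d]
                            ctx[word][d] += g * emb[center][d]
                    for d in range(dim):
                        emb[center][d] += grad[d]
        # Network term: predict weight w as p = sigmoid(v_a . v_b) and take a
        # gradient-descent step on the squared error (p - w)^2.
        for a, b, w in edges:
            p = sigmoid(dot(emb[a], emb[b]))
            g = edge_lr * 2.0 * (w - p) * p * (1.0 - p)
            for d in range(dim):
                va, vb = emb[a][d], emb[b][d]
                emb[a][d] += g * vb
                emb[b][d] += g * va
    return emb


# Toy corpus and network; "ENTITY:disaster" is a hypothetical non-word node.
sentences = [["earthquake", "rescue", "team"],
             ["quake", "rescue", "team"],
             ["football", "match", "score"]]
edges = [("earthquake", "quake", 1.0),
         ("earthquake", "football", 0.0),
         ("quake", "ENTITY:disaster", 0.8)]
vectors = train(sentences, edges)
```

In this sketch an entity that never occurs in the corpus, such as the hypothetical `ENTITY:disaster`, receives its encoder vector only through the network term, which is one way to read "importing more entities" into a Word2Vec-style model. Both losses update the same encoder vectors, so network edges supply extra training signal for words that are rare in the corpus.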
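The modified distance between documents' topic word vectors can be illustrated with a soft-cosine-style measure. The abstract does not state the exact formula, so this is a hedged sketch: it assumes word similarity is the cosine of word embeddings clipped at zero, and models the word frequency factor as a hypothetical per-word multiplier `freq_weight` (for example an IDF-like weight). Two documents with orthogonal topic word vectors always have plain cosine distance 1.0, but a nonzero similarity between, say, "earthquake" and "quake" pulls the semantic distance below 1.0. The embeddings, documents and the function `semantic_distance` are made up for illustration.

```python
import math
from typing import List, Mapping

# Sparse topic-word vector of one document: word -> non-negative weight.
TopicVector = Mapping[str, float]


def cosine_distance(a: TopicVector, b: TopicVector) -> float:
    """Baseline distance: 1 - cosine similarity; only shared words contribute."""
    dot = sum(x * b.get(w, 0.0) for w, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def word_similarity(u: str, v: str, emb: Mapping[str, List[float]]) -> float:
    """Cosine similarity of two word embeddings, clipped at 0 (assumption)."""
    if u == v:
        return 1.0
    if u not in emb or v not in emb:
        return 0.0
    eu, ev = emb[u], emb[v]
    num = sum(p * q for p, q in zip(eu, ev))
    norm = math.sqrt(sum(p * p for p in eu)) * math.sqrt(sum(q * q for q in ev))
    return max(0.0, num / norm) if norm > 0.0 else 0.0


def semantic_distance(a: TopicVector, b: TopicVector,
                      emb: Mapping[str, List[float]],
                      freq_weight: Mapping[str, float]) -> float:
    """Soft-cosine-style distance: every word pair (u, v) contributes in
    proportion to word_similarity(u, v); each entry is first scaled by a
    hypothetical non-negative frequency factor (default 1.0)."""
    wa = {w: x * freq_weight.get(w, 1.0) for w, x in a.items()}
    wb = {w: x * freq_weight.get(w, 1.0) for w, x in b.items()}

    def inner(x: TopicVector, y: TopicVector) -> float:
        return sum(xv * yv * word_similarity(u, v, emb)
                   for u, xv in x.items() for v, yv in y.items())

    denom = math.sqrt(inner(wa, wa)) * math.sqrt(inner(wb, wb))
    if denom == 0.0:
        return 1.0
    # Clipped similarities need not form a PSD matrix, so clamp at 0.
    return max(0.0, 1.0 - inner(wa, wb) / denom)


# Toy, made-up embeddings and two documents with no words in common.
emb = {
    "earthquake": [0.9, 0.1, 0.0],
    "quake": [0.85, 0.15, 0.05],
    "rescue": [0.1, 0.9, 0.1],
    "aid": [0.15, 0.85, 0.0],
}
doc1 = {"earthquake": 0.6, "rescue": 0.4}
doc2 = {"quake": 0.7, "aid": 0.3}
print(cosine_distance(doc1, doc2))             # 1.0: orthogonal vectors
print(semantic_distance(doc1, doc2, emb, {}))  # below 1.0
```

One design caveat: standard K-Means recomputes centroids as means, which is tied to squared Euclidean distance, so when a custom distance like this one is plugged into the clustering step, a medoid-based variant is a common alternative. Which variant the thesis uses is not stated in the abstract.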
Keywords: Word embedding, LDA, Semantic similarity, Event detection