Trend Analysis Of Network Popular Words By Using Semantic Knowledge

Posted on:2018-06-21

Degree:Master

Type:Thesis

Country:China

Candidate:Y Fu

GTID:2348330515983295

Subject:Software engineering

Abstract/Summary:

Constructing word and document vectors is a fundamental task in computational linguistics. We propose a word semantic vector construction framework that incorporates prior knowledge from a heterogeneous network to improve the accuracy of word semantic similarity calculation. By taking word semantic similarity into account in documents' topic vectors, the framework also improves the accuracy of LDA-based event detection. The paper has two main parts.

Improved word semantic similarity calculation framework: Constructing word vectors is the key to word semantic similarity calculation. We propose a word semantic vector construction framework that incorporates the word relationship weights of a heterogeneous network. The framework builds on the Word2Vec method: the encoded vector of the current word is used to predict either the adjacent word or the weights in the heterogeneous network, and the prediction loss is used to adjust the encoder's parameters in each training epoch. Once training is done, the encoder serves as a projection from word IDs to word vectors, which yields a vector for every word. By incorporating the weight information of the heterogeneous network, the framework reduces the feature sparseness of the training documents and improves the accuracy of word semantic similarity calculation.

Improved LDA-based event detection: LDA-based event detection uses the LDA model to obtain each document's topic word vector and treats each document cluster, produced by K-Means clustering under a cosine distance, as an event. We improve the method by incorporating word semantic similarity and word frequency information into the definition of the distance between document topic word vectors. This additional information helps distinguish documents whose topic word vectors are orthogonal and improves the accuracy of the method.

Comparing the results of our methods with the baseline, we conclude that our methods achieve some improvement in accuracy, especially when features are sparse. Our method is also flexible enough to combine a variety of heterogeneous network knowledge.

The innovations of this paper are as follows:
1. Introduce more entities into the Word2Vec model and estimate the weights in the heterogeneous network using the dot product between entity vectors.
2. Introduce a word semantic similarity factor and a word frequency factor into the definition of the distance between document topic word vectors.
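
The abstract does not state the modified distance formula, so the following is only an illustrative sketch of why word semantic similarity helps with orthogonal topic word vectors. It uses the soft cosine measure as a stand-in technique, which may differ from the thesis's actual definition, and it leaves out the word frequency factor. The vocabulary, the similarity matrix SIM, and the function names are all hypothetical.

```python
import math


def soft_cosine(a, b, sim):
    """Soft cosine similarity between two topic-word weight vectors.

    a, b : lists of word weights over the same vocabulary.
    sim  : square matrix where sim[i][j] is the semantic similarity of
           words i and j (assumed symmetric, positive semidefinite,
           with ones on the diagonal).
    """
    def bilinear(x, y):
        # x^T * sim * y
        return sum(x[i] * sim[i][j] * y[j]
                   for i in range(len(x)) for j in range(len(y)))

    denom = math.sqrt(bilinear(a, a)) * math.sqrt(bilinear(b, b))
    return bilinear(a, b) / denom if denom > 0 else 0.0


def plain_cosine(a, b):
    """Ordinary cosine similarity: soft cosine with an identity matrix."""
    n = len(a)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return soft_cosine(a, b, identity)


# Hypothetical 3-word vocabulary: ["earthquake", "quake", "concert"].
# In the thesis's setting these similarities would come from the learned
# word vectors; here they are made-up values for illustration.
SIM = [[1.0, 0.9, 0.0],
       [0.9, 1.0, 0.0],
       [0.0, 0.0, 1.0]]

doc_a = [1.0, 0.0, 0.0]  # topic word vector dominated by "earthquake"
doc_b = [0.0, 1.0, 0.0]  # topic word vector dominated by "quake"
doc_c = [0.0, 0.0, 1.0]  # topic word vector dominated by "concert"

if __name__ == "__main__":
    print(plain_cosine(doc_a, doc_b))      # orthogonal under plain cosine
    print(soft_cosine(doc_a, doc_b, SIM))  # close once synonyms are linked
    print(soft_cosine(doc_a, doc_c, SIM))  # unrelated words stay far apart
```

Under plain cosine, doc_a and doc_b look as unrelated as doc_a and doc_c because their vectors share no words. With the similarity matrix, the near-synonymous pair is pulled together while the unrelated pair is not, which is the kind of distinction the abstract describes.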

Keywords/Search Tags:

Word embedding, LDA, Semantic similarity, Event detection

Related items

1	Word Similarity Measurement Based On Word Embedding And WordNet
2	Research On WS-LDA Topic Model Based On Word Embedding And Semantic Similarity
3	Research On Chinese Word Semantic Similarity Computation
4	Chinese Semantic Similarity Dataset Construction And Word Embedding Fused Hownet
5	Research On Ontology Alignment Based On Word Embedding
6	Research On Lexical Semantic Similarity Measurement Based On Knowledge Integration
7	Research On Word Embedding Algorithm Using Count-based Models
8	Learning Event Expressions Via Semantically Equivalent Projection
9	Research On Key Technologies Of Open Domain Meta-event Extraction
10	Research On Semantic Expression Based On Knowledge Source Embedding And Multi-modal Data Fusion