Research On Short Text Similarity Measure Based On Semantic Coupling

Posted on:2020-03-24

Degree:Master

Type:Thesis

Country:China

Candidate:W Liu

Full Text:PDF

GTID:2428330572985934

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of Internet technology, the volume of online information has grown explosively. With the emergence of various social media, short texts such as microblogs, instant messages, chat-software messages and question-answering exchanges have become prevalent on today's websites. Text similarity measures play a vital role in text-related applications such as natural language processing (NLP), information retrieval, text classification, document clustering, text filtering, topic tracking, question answering, machine translation and text summarization. Measuring the similarity of short texts is complex and is influenced by many factors, including text representation, term weighting strategy, semantic relation modeling and the similarity algorithm itself. By analyzing the limitations of traditional short text similarity algorithms, this thesis presents approaches that measure the relationship between terms by capturing both the intra-relation (explicit) and the inter-relation (implicit), using modified definitions of both relations. In addition, we take into account how strongly terms discriminate and indicate categories, and design a corresponding similarity function based on strong classification features. Finally, the two kinds of similarity are combined to obtain the final short text similarity. The major contributions of the thesis are summarized as follows:

(1) We propose a novel short text similarity measure based on coupled semantic relations. First, the method uses the co-occurrence information of terms and the distance between them to obtain a co-occurrence correlation degree. The related weights of the terms are calculated from this correlation degree, and the intra-relation and inter-relation of the terms are then calculated from the related weights. The intra-relation is defined by combining the related weights with the general Jaccard measure. The inter-relation is defined as the shared entropy of the path formed between the two terms on the intra-path-graph. The greater the shared entropy, the stronger the inter-relation, and hence the stronger the relationship between the terms. The intra-relation and inter-relation between a pair of terms are combined to define their coupled semantic relation. Finally, an improved coupling-relation similarity is obtained from the coupled semantic relation of terms.

(2) We design a similarity function based on strong classification features. An improved expected cross entropy is used to extract the strong category features of each class from a labeled data set. Terms are sorted in descending order of expected cross entropy, and the top K features are selected to form a dictionary of strong classification features. We also propose a novel word sense disambiguation method that uses the similarity of term contexts. The basic idea of strong classification feature similarity is that the more similar two texts are, the more strong classification features they share.

(3) We design a similarity algorithm based on both the coupling relation and strong classification features. Building on the first two algorithms, a more efficient similarity algorithm is designed that considers both the coupling relation of terms and their strong category features. To verify the effectiveness of the proposed short text similarity, a clustering task is performed on the DBLP, 20 Newsgroups and Sogou corpus data sets. The experimental results show that the proposed method achieves better clustering performance than the benchmark methods.
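The feature-selection step in contribution (2) can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: it uses the standard expected cross entropy, ECE(t) = P(t) * sum over c of P(c|t) * log(P(c|t) / P(c)), rather than the thesis's improved variant; it selects one global top-K list instead of a list per class; and it scores two texts by the Jaccard overlap of their strong features. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def expected_cross_entropy(docs, labels):
    """Standard ECE per term (assumed form, not the thesis's improved variant):
    ECE(t) = P(t) * sum_c P(c|t) * log(P(c|t) / P(c)),
    with probabilities estimated from document frequencies."""
    n_docs = len(docs)
    class_count = Counter(labels)            # number of documents per class
    term_count = Counter()                   # number of documents containing the term
    term_class_count = defaultdict(Counter)  # same count, split by class
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            term_count[term] += 1
            term_class_count[term][label] += 1

    scores = {}
    for term, df in term_count.items():
        p_t = df / n_docs
        total = 0.0
        for c, n_tc in term_class_count[term].items():
            p_c_given_t = n_tc / df
            p_c = class_count[c] / n_docs
            total += p_c_given_t * math.log(p_c_given_t / p_c)
        scores[term] = p_t * total
    return scores


def strong_feature_dictionary(docs, labels, k):
    """Sort terms by descending ECE and keep the top k (ties broken alphabetically)."""
    scores = expected_cross_entropy(docs, labels)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


def strong_feature_similarity(tokens_a, tokens_b, dictionary):
    """Assumed scoring rule: Jaccard overlap of the strong features in each text."""
    a = set(tokens_a) & dictionary
    b = set(tokens_b) & dictionary
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy labeled corpus, invented purely for illustration.
docs = [
    ["neural", "network", "training"],
    ["neural", "model", "training"],
    ["stock", "market", "price"],
    ["stock", "price", "model"],
]
labels = ["cs", "cs", "fin", "fin"]
dictionary = strong_feature_dictionary(docs, labels, k=4)
similarity = strong_feature_similarity(
    ["neural", "training", "model"], ["neural", "network"], dictionary
)
```

In this toy corpus the term "model" appears equally often in both classes, so its score is zero and it is excluded from the dictionary, which matches the intuition that a strong classification feature should indicate a particular category.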

Related items

1	Research On WordNet Based Chinese-English Cross Language Text Similarity Measurement
2	Research On Chinese Word Sense Disambiguation Based On Semantic Analysis
3	Research Of Word Sense Disambiguation Based On Hybrid Features And Rules
4	Research On Feature Selection And Weighting Methods Based On Terms Distribution
5	The Field Of Term Extraction And The Relationship Between The Classification Study
6	Research On Semantic Similarity Calculation Of Chinese Short Text
7	Research On Word Sense Disambiguation Based On GCN Model
8	Study On HowNet Ontology Based Text Categorization Algorithm And Its Application
9	A Chinese Unsupervised Word Sense Disambiguation Method Based On Semantic Vector
10	Research On Word Sense Disambiguation And Syntactic Parsing Based On Computing Of Semantic Template