The Research Of HowNet Based Word Similarity Computation And Its Application

Posted on:2013-01-21

Degree:Master

Type:Thesis

Country:China

Candidate:Y Guo

GTID:2248330395984832

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of the Internet, text information in digital form has become an important resource for computer processing. Traditional information processing technology relies mainly on statistics and the surface features of text, and it lacks a semantic understanding of text content. The resulting bottleneck in processing performance has become one of the important problems to be resolved in current information science. Word similarity computation quantifies the complex semantic similarity relations between natural language words at the semantic level; it can therefore support semantic analysis in information processing technology and improve processing results. The study of word similarity computation is thus of considerable importance.

This thesis builds on the basic theory of word similarity computation and text classification, and focuses on HowNet-based Chinese word similarity computation and its application to text classification. The main work is summarized as follows:

First, the commonly used methods of word similarity computation are analyzed, with a focus on a typical HowNet-based method. The evaluation strategy used for word similarity computation in this thesis is also described. In addition, the classification process, commonly used classification methods, evaluation strategies, and other background on text classification techniques are studied in depth.

Second, an improved method of word similarity computation based on HowNet is proposed. This thesis studies the HowNet system in depth and analyzes the deficiencies of the typical method in sememe similarity computation and other aspects. To address these deficiencies, a sememe similarity computation with better discriminative power is proposed by combining HowNet with the similarity theory of things, and the set similarity computation and concept similarity computation are optimized. The experimental results show that the results of the proposed method are closer to human judgments.

Third, a novel method of text classification based on a semantic kernel is proposed. This thesis analyzes the problems of high-dimensional feature space, semantic relations between features, and text vector sparsity in traditional text classification methods. To address these problems, part-of-speech (POS) filtering and the semantic kernel method are introduced. Following the idea of the semantic kernel method, a semantic matrix is built using the word similarity computation proposed in this thesis, the kernel function is redefined, and a novel semantic kernel is constructed. The semantic kernel mapping embeds the semantic similarity relations between features into the text vector, enriching the semantic representation of documents. The experimental results show that the proposed method improves classification results to a certain extent.
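The abstract does not give the redefined kernel function, so the following is only a minimal sketch of the general semantic kernel idea it describes, under stated assumptions. A semantic matrix S holds pairwise word similarities, each document vector d is mapped to dS, and the kernel is the inner product of the mapped vectors, K(d1, d2) = (d1 S)·(d2 S). The vocabulary, the similarity value 0.9, and all function names are hypothetical placeholders; they do not come from the thesis or from HowNet.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical toy similarities standing in for a HowNet-based word similarity.
TOY_SIM = {frozenset(("car", "automobile")): 0.9}


def toy_word_sim(a: str, b: str) -> float:
    """Return an assumed similarity in [0, 1]; unknown pairs get 0.0."""
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def build_semantic_matrix(vocab: Sequence[str],
                          word_sim: Callable[[str, str], float]) -> Matrix:
    """S[i][j] is the similarity of word i and word j; the diagonal is 1."""
    return [[1.0 if a == b else word_sim(a, b) for b in vocab] for a in vocab]


def smooth(doc: Sequence[float], S: Matrix) -> Vector:
    """Map a document row vector d to d S, spreading weight to similar words."""
    return [sum(doc[i] * S[i][j] for i in range(len(doc)))
            for j in range(len(S[0]))]


def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain bag-of-words inner product, which ignores word similarity."""
    return sum(a * b for a, b in zip(x, y))


def semantic_kernel(d1: Sequence[float], d2: Sequence[float], S: Matrix) -> float:
    """K(d1, d2) = (d1 S) . (d2 S), an inner product in the mapped space."""
    return linear_kernel(smooth(d1, S), smooth(d2, S))


VOCAB = ["car", "automobile", "banana"]
S = build_semantic_matrix(VOCAB, toy_word_sim)

# One-word documents as term-count vectors over VOCAB.
d_car = [1.0, 0.0, 0.0]
d_auto = [0.0, 1.0, 0.0]
d_banana = [0.0, 0.0, 1.0]

if __name__ == "__main__":
    print("linear  (car, automobile):", linear_kernel(d_car, d_auto))
    print("semantic(car, automobile):", semantic_kernel(d_car, d_auto, S))
    print("semantic(car, banana):    ", semantic_kernel(d_car, d_banana, S))
```

In this toy setting the two documents share no word, so the plain inner product is zero, while the semantic kernel is positive because "car" and "automobile" are assumed to be similar. Because the kernel is an inner product of mapped vectors, it is positive semidefinite for any S, which is what a kernel-based classifier requires.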

Related items

1	Research Of Hownet Based Word Semantic Computation And Application
2	The Research Of Semantic Similarity Computing Algorithm Based On HowNet
3	Sentence Similarity Computing Combining Multi-features Based On HowNet
4	Research Of Sentence Similarity Computation Based On Semantic Analysis
5	An Algorithm For Optimizing Word Similarity In "Knowledge Network"
6	Research On Ontology-Based Semantic Text Categorization
7	Research On Chinese Phrase Structure Ambiguities Based On Semantic Analysis And Its Implementation
8	Research Of Chinese Word Sense Disambiguation Based On Hownet
9	Chinese Words Semantic Similarity Measure Research Based On Common Sense Knowledge Base
10	Semantic Similarity Based On Interval Intuitionistic Fuzzy Sets Is Studied