Research And Implementation Of Text Similarity Computing Based On Semantic Understanding

Posted on:2016-07-01

Degree:Master

Type:Thesis

Country:China

Candidate:R Z Sun

Full Text:PDF

GTID:2308330479976765

Subject:Computer Science and Technology

Abstract/Summary:

Text similarity computing measures how similar two or more texts are in content, syntax, and structure by means of an algorithmic model, and it is a key technology behind many important applications in text information processing. Most text similarity methods rely on word frequency statistics, and the most representative is the vector space model (VSM). VSM represents a text as a vector of feature items and expresses text similarity as the cosine of the angle between two such vectors. Other approaches include the GVSM algorithm based on the generalized vector space model, the latent semantic indexing (LSI) algorithm, string matching algorithms, and fingerprint-based algorithms. Text similarity computing based on semantic understanding draws on a knowledge base and takes word, sentence, and paragraph semantics, among other factors, into account, so its results are better suited to practical applications.

The traditional HowNet-based text similarity algorithm builds on VSM: text feature vectors are represented in the HowNet sememe vector space, which brings word semantics into the calculation. This thesis improves on the original algorithm in two ways. First, it improves the computation of HowNet sememe similarity by using the sememe hierarchy and adding depth and density as semantic factors, to make the results more accurate. Second, it adds paragraph similarity, which the original algorithm lacks, increasing its influence on the overall text similarity. A text clustering experiment is used to verify the effectiveness of the modified algorithm, and the results show that it achieves better performance.

Based on this theoretical research, the thesis implements a text similarity system using the J2EE platform and open source technology. According to function, the system is divided into four modules: a HowNet data processing module, a text preprocessing module, a text vector construction module, and a combined computing module. A design and implementation solution is provided for each module. The system implements text sememe vector representation and similarity computation with NLPIR, Lucene, SSH, and other open source software. Finally, the similarity system was applied to a real engineering project and performed well.
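
As an illustration of the VSM baseline described in the abstract, the following is a minimal sketch of cosine similarity over term-frequency vectors. It is not the thesis's implementation: the function names, the whitespace tokenizer, and the sample sentences are assumptions made for this example, and it covers only the word-frequency baseline, not the HowNet sememe weighting or the paragraph-level similarity that the thesis adds.

```python
import math
from collections import Counter


def term_frequency_vector(text):
    """Map a text to a sparse term-frequency vector (term -> count)."""
    # Assumption: lowercase whitespace tokenization stands in for a real
    # word segmenter; the thesis itself uses NLPIR for preprocessing.
    return Counter(text.lower().split())


def cosine_similarity(vec_a, vec_b):
    """Return the cosine of the angle between two sparse term vectors."""
    # Only terms present in both vectors contribute to the dot product.
    shared_terms = set(vec_a) & set(vec_b)
    dot = sum(vec_a[t] * vec_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical sample texts, chosen only to exercise the functions.
doc_a = term_frequency_vector("text similarity based on word frequency")
doc_b = term_frequency_vector("text similarity based on semantic understanding")
print(cosine_similarity(doc_a, doc_b))
```

Because this baseline compares surface terms only, two texts that express the same meaning with different words score low. That limitation is what motivates mapping terms into a sememe vector space before taking the cosine.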

Related items

1	Research On Semantic Similarity Measurement For Text
2	Research And Implementation Of Semantic Similarity Computing By Combining Knowledge-based And Corpus-based Methods
3	The Research And Application On Text Similarity Measurement Based On Semantic Analysis
4	Text Similarity Computing Theory And Applied Research
5	Research On Paper Similarity Based On Semantic Understanding
6	The Study Of Measures And Applications Of Short Text Semantic Similarity
7	Research On Short Text Similarity Measure Based On Semantic Coupling
8	Research On Chinese Text Similarity Computing Based On Semantic Weighted
9	Research On Similarity Computing Method For Domain Texts
10	Research On Text Similarity Measure Method Of Combining New Word Analysis And Semantic Analysis