Text Understanding Based On Semantic Relevance Under Internet Environment

Posted on:2017-08-23

Degree:Doctor

Type:Dissertation

Country:China

Candidate:H Chen

Full Text:PDF

GTID:1318330485950834

Subject:Computer system architecture

Abstract/Summary:

To process web text more effectively, this research focuses on the semantic relevance of text in the Internet environment. It addresses how to acquire semantic relevance and measure it quantitatively, and how to apply it to the various forms of text found on the Internet.

To acquire and quantitatively measure semantic relevance between phrases, we use Wikipedia as the source of semantic information. Previous studies in this area mostly retrieve only the concepts related to a given concept and cannot quantify the degree of relevance. To solve this problem, we exploit the semi-structured nature of Wikipedia pages: we construct a knowledge network from Wikipedia concepts, whose pages are connected to other concept pages by hyperlinks, and build a term-concept mapping structure from the concept pages. Applying a Markov random walk model to the concept network yields a quantitative measure of the semantic relevance between concepts. With the help of the term-concept mapping, the semantic relevance between concepts is then transferred to phrase terms. Experimental results show that incorporating semantic relevance improves the precision of topic extraction, text classification, and clustering.

For semantic understanding of short, domain-specific text where the data scale is small, we take semantics-based schema matching of query interfaces as an example. Previous studies in this area mostly use only the structure of the query interface and match form elements rigidly; the semantic information of the form, especially the label text of its elements, goes unused. To solve this problem, we propose an algorithm that matches query interfaces based on semantic relevance. Combined with a link-structure model of the interface, the algorithm handles schema matching well and speeds up interface integration.

For semantic understanding of long, interdisciplinary text where the data scale is large, we focus on automatic summarization of online news and blogs. Previous studies in this area mostly consider the similarity between sentences or the importance of individual words, but do not use the semantic relevance between phrases in the text. To solve this problem, we improve the graph model for text summarization using semantic relevance. In previous graph models, each sentence is a vertex and the similarity between two sentences is a weighted edge. The new model considers not only the similarity between sentences in a document but also the semantic relevance between its phrases. Based on this two-layer graph model, we propose an improved summarization algorithm, and its performance demonstrates its effectiveness.

For semantic understanding of massive online user-interaction data, which is mixed with new phrases produced by social media, we consider sentiment classification of Web comments. In a world where the web is pervasive, it records every aspect of our lives, and the resulting data contain a great deal of information about users' emotions. Whether for tracking mass incidents on microblogs or for shopping in online malls, the content of user comments, and especially their emotional tendency, is valuable, so sentiment analysis of Web data is necessary. Previous work is mostly based on traditional classification algorithms or on sentiment word dictionaries. These methods do not achieve results as good as those on topic classification tasks, because they do not take the web environment into account. Using a phrase vector model, with vectors trained by the Skip-gram model, we obtain the relevance between phrases. Using this semantic relevance, we then construct a sentence generation model. Based on this model, the sentiment of comments can be classified more accurately. The results show that the algorithm is effective.
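
The random-walk relevance measure described in the abstract can be illustrated with a small sketch. This is not the dissertation's implementation: the abstract only states that a Markov random walk is run on the hyperlink-based concept network and that relevance is then transferred to terms through a term-concept mapping. The sketch below assumes one common variant, a random walk with restart, and the function names (`random_walk_relevance`, `term_relevance`), the restart probability, the rule of taking the best concept pair, and the toy link data are all hypothetical.

```python
def random_walk_relevance(links, source, restart=0.15, iterations=50):
    """Score every concept by how often a walker starting at `source` visits it.

    `links` maps a concept page to the concept pages it links to.
    The restart variant is an assumption, not taken from the abstract.
    """
    # Collect all concepts, including pages that only appear as link targets.
    nodes = set(links)
    nodes.add(source)
    for targets in links.values():
        nodes.update(targets)

    score = {n: 0.0 for n in nodes}
    score[source] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for node, mass in score.items():
            targets = links.get(node, [])
            walk_mass = (1.0 - restart) * mass
            if targets:
                # Follow one outgoing hyperlink chosen uniformly at random.
                for t in targets:
                    nxt[t] += walk_mass / len(targets)
            else:
                # Page without outgoing links: send the walker back to the source.
                nxt[source] += walk_mass
            # With probability `restart` the walker jumps back to the source.
            nxt[source] += restart * mass
        score = nxt
    return score


def term_relevance(term_a, term_b, term_concepts, links):
    """Transfer concept relevance to terms via the term-concept mapping.

    Taking the best-scoring concept pair is an illustrative choice.
    """
    best = 0.0
    for concept_a in term_concepts.get(term_a, []):
        scores = random_walk_relevance(links, concept_a)
        for concept_b in term_concepts.get(term_b, []):
            best = max(best, scores.get(concept_b, 0.0))
    return best


# Toy hyperlink structure and term-concept mapping (hypothetical data).
toy_links = {
    "Apple Inc.": ["IPhone", "Steve Jobs"],
    "IPhone": ["Apple Inc."],
    "Steve Jobs": ["Apple Inc."],
    "Apple (fruit)": ["Fruit"],
    "Fruit": ["Apple (fruit)"],
}
toy_term_concepts = {
    "apple": ["Apple Inc.", "Apple (fruit)"],
    "iphone": ["IPhone"],
    "fruit": ["Fruit"],
}

if __name__ == "__main__":
    print(term_relevance("apple", "iphone", toy_term_concepts, toy_links))
    print(term_relevance("iphone", "fruit", toy_term_concepts, toy_links))
```

In this toy network the ambiguous term "apple" is related to both "iphone" and "fruit" through different concept pages, while "iphone" and "fruit" receive zero relevance because no hyperlink path connects their concepts.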

Keywords/Search Tags:

Text analysis, Semantic relevance, Wikipedia, Document summarization, Schema match, Word vector, Emotion classification

Related items

1	Research On Chinese Text Classification Based On Semantic Analysis
2	Research On Semantic Representation Of Text Based On Topic Model
3	Research On Web Text Sentiment Analysis Method
4	Research On Emotion Analysis For Chinese Product Reviews
5	Semantic Analysis for Improved Multi-document Summarization of Text
6	The Research Of Automatic Single Text Summarization Based On Latent Semantic Analysis
7	A semantic partition based text mining model for document classification
8	The Research Of Semantic Vector Representations And Modeling Approaches for Text
9	Research On Automatic Multi-document Summarization Based On Statistics And Semantic Analysis
10	Research On The Method Of Differential Summarization Of Bilingual News