Text Understanding Based On Semantic Relevance Under Internet Environment

Posted on: 2017-08-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Chen
GTID: 1318330485950834
Subject: Computer system architecture
Abstract/Summary:
To handle web text more effectively in the Internet environment, this research focuses on the semantic relevance of text. It addresses how to acquire semantic relevance and measure it quantitatively, and how to apply it to the various forms of text found on the Internet.

To acquire and quantitatively measure the semantic relevance between phrases, we use Wikipedia as the source of semantic information. Previous studies in this area mostly retrieve only the concepts related to a given concept and cannot quantify the degree of relevance. To solve this problem, we exploit the semi-structured nature of Wikipedia pages: we construct a knowledge network from Wikipedia concepts, whose pages are connected to other concept pages by hyperlinks, and build a term-concept mapping structure from the concept pages. Applying a Markov random walk model to the concept network yields a quantitative measure of the semantic relevance between concepts. With the help of the term-concept mapping, the semantic relevance of concepts can then be transferred to phrase terms. Experimental results show that incorporating semantic relevance improves the precision of topic extraction, text classification, and clustering.

For semantic understanding of short, domain-specific text where the data scale is small, we take schema matching of query interfaces based on semantic information as an example. Previous studies in this area mostly use only the structure of the query interface and match form elements rigidly. The semantic information of the form, especially the label text of its elements, goes unused. To solve this problem, we propose an algorithm that matches query interfaces based on semantic relevance.
Combined with a link-structure model of the interface, the algorithm handles schema matching well and speeds up interface integration.

For semantic understanding of long, interdisciplinary text where the data scale is large, we focus on automatic summarization of online news and blogs. Previous studies in this area mostly consider the similarity between sentences or the importance of words in the text, but do not use the semantic relevance of the phrases in the text. To solve this problem, we improve the graph model of text summarization using semantic relevance. In previous graph models, each sentence is a vertex and the similarity between sentences is a weighted edge. The new model considers not only the similarity between sentences in the document but also the semantic relevance of its phrases. Based on this two-layer graph model, we propose an improved summarization algorithm. The algorithm's performance demonstrates its effectiveness.

For semantic understanding of massive online user-interaction data, in which the text is mixed with new phrases coined on social media, we consider sentiment classification of Web comments. On the pervasive Web, every aspect of our lives is recorded, and the data contain a great deal of information about users' emotions. Whether tracking mass incidents on microblogs or shopping in online malls, the content of user comments, and especially their emotional tendency, is valuable, so emotion analysis of Web data is necessary. Previous work is mostly based on traditional classification algorithms or on sentiment word dictionaries. These methods do not achieve results as good as those of topic classification tasks because they do not take the Web environment into account. Using a phrase vector model, with vectors trained by the Skip-gram model, we obtain the relevance between phrases. Using this semantic relevance, we then construct a sentence generation model.
Based on this model, the sentiment of these comments can be classified better. The results show that the algorithm is effective.
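
The abstract's idea of measuring concept relevance by a random walk over the Wikipedia hyperlink network, and then transferring it to terms through a term-concept mapping, can be illustrated with a small sketch. This is not the dissertation's algorithm, which the abstract does not specify in detail. It assumes a random walk with restart as the Markov model, takes the best-scoring concept pair as the term-level score, and uses a made-up toy graph. The names `random_walk_relevance`, `term_relevance`, `links`, and `term_to_concepts` are hypothetical.

```python
def random_walk_relevance(links, source, restart=0.15, iterations=50):
    """Score every concept by how much probability mass a random walk
    with restart (an assumed model) places on it when started at `source`."""
    nodes = set(links)
    nodes.add(source)
    for targets in links.values():
        nodes.update(targets)

    # Start with all probability mass on the source concept.
    scores = {node: 0.0 for node in nodes}
    scores[source] = 1.0

    for _ in range(iterations):
        nxt = {node: 0.0 for node in nodes}
        for node, mass in scores.items():
            targets = links.get(node, [])
            if targets:
                # Follow one of the page's hyperlinks uniformly at random.
                share = (1.0 - restart) * mass / len(targets)
                for target in targets:
                    nxt[target] += share
            else:
                # Page without outgoing links: the walker returns to the source.
                nxt[source] += (1.0 - restart) * mass
            # With probability `restart`, jump back to the source concept.
            nxt[source] += restart * mass
        scores = nxt
    return scores


def term_relevance(term_a, term_b, term_to_concepts, links):
    """Transfer concept relevance to terms: take the best score over all
    pairs of concepts the two terms map to (an assumed aggregation rule)."""
    best = 0.0
    for concept_a in term_to_concepts.get(term_a, []):
        scores = random_walk_relevance(links, concept_a)
        for concept_b in term_to_concepts.get(term_b, []):
            best = max(best, scores.get(concept_b, 0.0))
    return best


# Toy hyperlink graph and term-concept mapping, invented for illustration.
links = {
    "Machine learning": ["Statistics", "Artificial intelligence"],
    "Artificial intelligence": ["Machine learning", "Computer science"],
    "Statistics": ["Mathematics"],
    "Computer science": ["Mathematics", "Artificial intelligence"],
    "Mathematics": [],
    "Cooking": ["Food"],
    "Food": ["Cooking"],
}
term_to_concepts = {
    "ML": ["Machine learning"],
    "AI": ["Artificial intelligence"],
    "recipe": ["Cooking"],
}

print(term_relevance("ML", "AI", term_to_concepts, links))
print(term_relevance("ML", "recipe", term_to_concepts, links))
```

In this toy graph, "ML" and "AI" receive a positive score because their concept pages link to each other, while "ML" and "recipe" score zero because no hyperlink path leads from one concept to the other. The scores are asymmetric, since the walk follows link direction. A real system would likely need a symmetric or normalized variant.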
Keywords/Search Tags: Text analysis, Semantic relevance, Wikipedia, Document summarization, Schema match, Word vector, Emotion classification