
A Novel Chinese Subjective Sentences Recognition Method Based On Word Co-occurrence Relationship Graphic Model

Posted on: 2016-10-17  Degree: Master  Type: Thesis
Country: China  Candidate: C Q Fu  Full Text: PDF
GTID: 2308330470464019  Subject: Computer Science and Technology
Abstract/Summary:
With the application and popularization of Web 2.0, the shift "from user-centered design to participatory design" has become the advocated concept of today's Internet. Nowadays, new media such as forums, post bars, blogs, and microblogs provide users with a freer communication platform. More and more users express, spread, or exchange their personal views and ideas via the Internet. This kind of user-generated content contains huge commercial and social value. Therefore, accurately extracting and recognizing subjective sentences from large quantities of text has both theoretical and practical importance.
Currently, the main methods for subjective sentence recognition adopt the vector space model (VSM) to represent documents. That is, each document is represented as a term vector or a feature vector. However, the feature vector representation rests on the strong assumption of term independence and does not consider the order of, or dependency between, any two terms. Based on this observation, in this paper we propose a novel graph model-based method, driven by term co-occurrence relationships, to recognize Chinese subjective sentences. Using a term co-occurrence relationship graph model together with semantic information, it describes how the distribution of terms differs between the subjective and non-subjective sentence sets. It can effectively capture the semantic information within Chinese subjective sentences.
Meanwhile, unlike the traditional VSM-based feature value calculation, it combines the in-degree-based term weighting scheme of the graph model with the complex feature value calculation method of information retrieval to effectively calculate the emotional value of the terms in the graph model. Experimental results on the corpus show that the method significantly improves the performance of Chinese subjective sentence recognition and outperforms state-of-the-art methods.
The main work of this paper consists of the following three parts:
1) Firstly, we build a directed term co-occurrence relationship graph for the subjective and the non-subjective sentence sets, respectively. Specifically, we describe the co-occurrence, the syntactic relationship, and the distribution difference of terms.
2) Secondly, we combine the in-degree-based term weighting scheme of the graph model with the complex feature value calculation method of information retrieval to effectively calculate the emotional value of the terms in the graph model. Meanwhile, we train an SVM classifier to identify Chinese subjective sentences based on the above method. To verify the effectiveness of our method, we also set up comparison experiments with current representative models.
3) Finally, we tune parameters of our graph model, such as the sliding window size and the direction of the directed graph, to further improve the performance of Chinese subjective sentence identification.
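The graph construction and in-degree weighting described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract gives no formulas, so the function names, the edge rule (an edge from each term to the terms that follow it within the window), and the plain normalised in-degree weight are all assumptions made for illustration. The input is assumed to be already word-segmented, and the combination with an information-retrieval weighting scheme is left out because it is not specified here.

```python
from collections import defaultdict


def build_cooccurrence_graph(sentences, window=2):
    """Build a directed term co-occurrence graph (illustrative assumption).

    An edge u -> v gains weight 1 each time term v appears after term u
    within `window` positions in the same sentence.
    `sentences` is a list of token lists (already word-segmented).
    """
    graph = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for i, u in enumerate(tokens):
            # Terms following u inside the sliding window.
            for v in tokens[i + 1 : i + 1 + window]:
                if u != v:
                    graph[u][v] += 1
    return graph


def indegree_weights(graph):
    """Return each term's weighted in-degree, normalised to sum to 1.

    This is a stand-in for the thesis's term weighting, which the
    abstract does not define in detail.
    """
    indeg = defaultdict(int)
    for u, neighbours in graph.items():
        indeg[u] += 0  # make sure terms with no incoming edges are listed
        for v, weight in neighbours.items():
            indeg[v] += weight
    total = sum(indeg.values())
    return {t: (d / total if total else 0.0) for t, d in indeg.items()}


if __name__ == "__main__":
    # Toy, hypothetical tokenised sentences standing in for one sentence set.
    subjective = [["我", "觉得", "电影", "很", "好看"], ["我", "很", "喜欢"]]
    g = build_cooccurrence_graph(subjective, window=2)
    for term, weight in sorted(indegree_weights(g).items(), key=lambda x: -x[1]):
        print(term, round(weight, 3))
```

Building one such graph for the subjective set and one for the non-subjective set would let the per-term weights be compared across the two sets; reversing the edge direction or changing `window` corresponds to the parameters the third part of the work tunes.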
Keywords/Search Tags: word co-occurrence relationship, graphic model, subjective sentence, recognition, machine learning