With the rapid development of e-commerce and social media in recent years, web users can publish reviews on a wide range of topics and objects, including online products, news events, public figures and personal experiences. These subjective reviews contain valuable opinions, sentiments, attitudes and feelings from users. As public engagement increases, the volume of subjective review text on the web has grown dramatically. Given such massive web review texts, how to effectively mine and analyze user opinions according to a specific information need, also known as web review text oriented opinion mining, has become a hot research topic in fields such as intelligent information processing, data mining and computational linguistics. Meanwhile, web review text oriented opinion mining has great research and application value, and can be widely used in information retrieval, business intelligence and public opinion analysis.

Although coarse-grained subjectivity and sentiment classification are relatively mature, fine-grained opinion mining of web review texts still faces problems such as large feature space, data sparseness, lack of effective features, limited automation and domain dependency. To address these critical problems, this paper conducts research on four aspects: fine-grained opinion element extraction, adaptive opinion target categorization, domain-specific sentiment lexicon construction, and co-clustering of opinion targets and opinion words. The main research work and contributions are as follows:

(1) To address the problems of insufficient features and the effective combination of multi-level features, this paper proposes an opinion element extraction approach that leverages sequence labeling learning and syntactic-semantic structure features.
Since user review texts are usually nonstandard, fine-grained opinion mining is more difficult than traditional information extraction tasks, and must address the challenges of large feature space, sparse data and lack of effective features. This paper transforms the extraction of opinion targets and opinion words into a sequence labeling problem, employing a conditional random fields (CRFs) model to construct a unified extraction framework that effectively combines multi-level features, and also proposes a naive graph-pruning strategy to classify opinion targets into semantic categories. To solve the problem of insufficient labeling features, syntactic-semantic structure features are introduced to exploit the syntactic dependency relations between long-distance words. The experimental results verify the effectiveness of the opinion element extraction approach.

(2) To address the problems of domain dependency and semantic association calculation in opinion target categorization, this paper proposes a constraint-based spectral clustering approach for opinion targets. Opinion target categorization is a core task in review text opinion mining and is the foundation of feature-level opinion summarization and recommendation. Existing research usually ignores the domain-dependent nature of semantic association calculation among opinion targets, and also suffers from a lack of effective association information. Therefore, this paper studies a constraint-based spectral clustering algorithm for opinion targets to solve these problems. The lexical and contextual constraint relations between opinion targets are mined to enhance their domain-specific associations. The constrained spectral clustering algorithm can not only incorporate prior constraint information, but also effectively reduce the high dimensionality and sparseness of the clustering space.
The experimental results show that the constraint-based spectral clustering approach effectively improves opinion target categorization.

(3) To address the problems of algorithmic domain dependency, dependence on sentiment seeds and low accuracy, this paper proposes a constrained label propagation approach for the automatic construction of domain-specific sentiment lexicons. A sentiment lexicon is a solid foundation for automatic sentiment analysis. However, due to the domain-dependent nature of review texts, the polarities of sentiment words are not fixed, but vary with the domain and context. Traditional construction methods usually suffer from domain dependency, limited automation and low accuracy. Therefore, this paper studies a constrained label propagation algorithm to solve these problems. Candidate sentiment terms are extracted from a domain corpus by exploiting chunk dependency knowledge and a prior generic sentiment lexicon. Pair-wise contextual and morphological constraint relations between sentiment terms are defined and extracted to enhance their domain-specific sentiment associations. Finally, the constrained label propagation algorithm is employed to calculate the polarities of the candidate sentiment terms and construct the domain-specific sentiment lexicon. The experimental results show that the constrained label propagation approach effectively improves the precision of domain sentiment lexicon construction, and is less affected by the sentiment seed coverage problem.

(4) To address the problems of extracting opinion targets and opinion words and of calculating their matching relationships in fine-grained opinion mining, this paper proposes a co-clustering approach for opinion targets and opinion words.
Most existing research considers only the explicit co-occurrence relations in the local context while ignoring the hidden matching relationships in the global domain context, and also suffers from a lack of annotated training data and from low accuracy caused by feature sparseness and domain dependency. Therefore, this paper studies a co-clustering approach that transforms the extraction of opinion targets and opinion words, together with the calculation of their matching relationships, into a semi-supervised learning process based on prior constraints. Besides the constraint relations among opinion targets, the constraint relations among opinion words are also introduced as prior knowledge to supervise the co-clustering process. The constrained co-clustering algorithm simultaneously clusters opinion targets into semantic aspects and clusters opinion words into sentiment groups associated with the respective target aspects, and consequently obtains the global matching relationships between semantic aspects and aspect-specific sentiments. The experimental results verify the effectiveness of the constraint-based co-clustering approach.
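
To make the constrained label propagation idea in contribution (3) more concrete, the following is a minimal sketch and not the algorithm used in this work. It assumes that each pair-wise constraint can be written as a signed weight (positive for "same polarity", negative for "opposite polarity"), that seed polarities stay fixed, and that every other term takes the weighted average of its neighbors' scores. The function name `propagate_polarity`, the example terms and the weights are all hypothetical; the update rule used in this work is not specified in the text above.

```python
from collections import defaultdict


def propagate_polarity(terms, seeds, constraints, iterations=100):
    """Illustrative signed label propagation over sentiment terms.

    terms:       list of candidate sentiment terms
    seeds:       dict mapping seed terms to +1.0 (positive) or -1.0 (negative)
    constraints: list of (term_a, term_b, weight); weight > 0 suggests the
                 same polarity, weight < 0 suggests opposite polarity.
                 Every term in a constraint is assumed to appear in `terms`.
    Returns a dict mapping each term to a polarity score.
    """
    # Build an undirected, signed neighbor list from the constraints.
    neighbors = defaultdict(list)
    for a, b, w in constraints:
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))

    # Seeds start at their known polarity, all other terms at 0.
    scores = {t: float(seeds.get(t, 0.0)) for t in terms}

    for _ in range(iterations):
        updated = {}
        for t in terms:
            if t in seeds:
                # Seed polarities are clamped and never overwritten.
                updated[t] = float(seeds[t])
                continue
            total = sum(abs(w) for _, w in neighbors[t])
            if total == 0:
                # A term without constraints receives no evidence.
                updated[t] = 0.0
            else:
                # Signed weighted average of the neighbors' current scores.
                updated[t] = sum(w * scores[n] for n, w in neighbors[t]) / total
        scores = updated
    return scores


if __name__ == "__main__":
    # Hypothetical toy data for a product-review domain.
    terms = ["good", "bad", "quiet", "noisy", "durable"]
    seeds = {"good": 1.0, "bad": -1.0}
    constraints = [
        ("good", "quiet", 1.0),      # e.g. joined by "and"
        ("quiet", "noisy", -1.0),    # e.g. joined by "but"
        ("bad", "noisy", 1.0),
        ("durable", "quiet", 1.0),
    ]
    for term, score in propagate_polarity(terms, seeds, constraints).items():
        print(term, round(score, 3))
```

In this sketch the sign of a term's final score would be read as its domain polarity; terms that are not connected to any seed through the constraints stay at zero and would remain unlabeled.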