| The rapid development and spread of the Internet has brought great convenience to our lives, work, and study, and the amount of information on the network is growing rapidly. Shopping websites, blogs, and communication platforms in Uyghur have been established and improved, so the volume of Uyghur comment text keeps increasing. These comment texts have great practical significance, but efficiently extracting useful information from large volumes of them is a difficult problem. Opinion mining technology has therefore become a research hotspot. This paper focuses on relation extraction and opinion holder extraction in opinion mining for Uyghur comment texts. The main research content covers the following three points: (1) Topic extraction: the paper proposes an algorithm for extracting domain evaluation objects based on statistics and grammar rules. It considers the relationship between relative term frequency and evaluation words, adjusts the weighting of evaluation words according to document frequency and the co-occurrence frequency between topic words and domain-authoritative words, uses an improved TF-IDF method to extract topic words, and then uses the t-support value to filter out redundant words and obtain the final topic words of product reviews. (2) Topic-opinion pair extraction: a relation extraction method for opinion mining based on the Bootstrapping algorithm is proposed, taking Uyghur comment sentences as the research corpus. In each iteration, the optimal patterns are selected according to improved scoring formulas and used to extract topic-opinion pairs. After the iterations, for comment sentences whose set of topic-opinion pairs is still empty, a nearest-matching algorithm is used to extract topic-opinion pairs. Finally, a paralleling model and a negation model are introduced to expand and amend the topic-opinion pairs. 
The ultimate goal of relation extraction is to establish one or more <topic, opinion> tuples for every comment sentence, with exactly one opinion word corresponding to each topic word. (3) Opinion holder extraction: based on an analysis of Uyghur grammatical characteristics and rules, and taking Uyghur comments as the research object, a fine-grained three-layer model is proposed to extract opinion holders. First, a CRF (Conditional Random Fields) model, combined with manual heuristic rules and Uyghur name composition rules, is used to identify all opinion holder candidates in each comment. Then the evaluation sentences are divided into four types by an opinion holder classification algorithm, and a separate extraction method is proposed for each opinion holder type. Finally, extended rules are introduced to refine the opinion holder extraction results. |
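
The nearest-matching fallback and the negation model described in point (2) can be illustrated with a small sketch. This is not the paper's implementation: the function name `nearest_pairs`, the token-distance criterion, the one-token negation window, the `NOT_` prefix, and the English toy tokens standing in for Uyghur words are all assumptions made for illustration. The sketch pairs each topic word with the closest opinion word in the sentence and marks the opinion as negated when a negation word is adjacent to it.

```python
def nearest_pairs(tokens, topics, opinions, negations=frozenset(), window=1):
    """Pair each topic word with its nearest opinion word by token distance.

    Hypothetical sketch: the lexicons, the distance measure, and the
    negation window are assumptions, not the paper's definitions.
    """
    # Positions of all opinion words in the sentence, in ascending order.
    opinion_positions = [i for i, tok in enumerate(tokens) if tok in opinions]
    pairs = []
    for i, tok in enumerate(tokens):
        if tok not in topics or not opinion_positions:
            continue
        # Nearest opinion word; on a tie, min() keeps the earlier position.
        j = min(opinion_positions, key=lambda p: abs(p - i))
        # Negation check: look `window` tokens to either side of the opinion.
        lo, hi = max(0, j - window), min(len(tokens), j + window + 1)
        negated = any(tokens[k] in negations for k in range(lo, hi) if k != j)
        # Exactly one opinion word is emitted per topic word.
        pairs.append((tok, ("NOT_" if negated else "") + tokens[j]))
    return pairs


# Toy English sentence used in place of a Uyghur comment (assumption).
tokens = ["the", "screen", "is", "good", ",", "but", "battery", "not", "durable"]
pairs = nearest_pairs(
    tokens,
    topics={"screen", "battery"},
    opinions={"good", "durable"},
    negations={"not"},
)
print(pairs)
```

Each topic word receives exactly one opinion word, matching the stated goal of one opinion per topic in every <topic, opinion> tuple. A real system would take the topic and opinion lexicons from the earlier extraction steps and would need a negation rule suited to Uyghur morphology rather than a fixed token window.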