
Research And Implementation Of Opinion Mining Algorithm Based On Semi-supervised Learning

Posted on: 2020-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Wu
Full Text: PDF
GTID: 2428330572973719
Subject: Computer technology
Abstract/Summary:
With the advent of the big data era, the Internet generates a large amount of data every day, but the value contained in that data is sparse. Extracting value from massive text datasets is therefore becoming more and more important. In particular, with the rapid development of e-commerce, review data about products and services is very important to both users and merchants. Users are increasingly inclined to make decisions based on reviews, and they care most about fine-grained review information. Text-based opinion mining is therefore an important research direction.

Traditional machine learning methods are effective for text mining, and in recent years, with the growth of data and the improvement of machine performance, more and more scholars have devoted themselves to machine learning and data mining. However, machine learning, and supervised learning in particular, requires a large amount of labeled training data, and annotating text incurs high labor costs. Although labeled data is difficult to acquire, a large amount of unlabeled data is available on the Internet, and this unlabeled data still has value. How to make full use of unlabeled data and so avoid manual text annotation is a problem that data mining research continues to work on.

Starting from practical problems, this thesis addresses users' demand for mining massive text information and carries out extensive theoretical research and practical exploration on the problem of insufficient labeled data. A semi-supervised opinion mining algorithm is used to solve this problem.

Firstly, opinion mining is used to obtain aspect information from a massive number of reviews, which involves aspect extraction from the text and entity-based sentiment analysis. A self-training algorithm based on semi-supervised learning is used to build the aspect extraction model, which extracts aspects from the training data. The gold aspects of the text are obtained by calculating the importance of each word. Furthermore, the word vector model and the gold aspects are used to generate aspect expressions, from which the entity information for each aspect of the text is obtained. This semi-supervised approach works around the shortage of labeled data.

Secondly, to obtain the sentiment information of each aspect entity and judge its sentiment orientation, association rules are used to obtain the frequent itemsets of the aspect entity. Moreover, pointwise mutual information (PMI) is used to calculate the strength of direct association between words, so that an entity can be matched directly with sentiment words. A sentiment analysis model is built to extract the sentiment words of the text and judge sentiment orientation: the semi-supervised self-training algorithm generates sentiment word sets from seed sentiment words and the text corpus, and sentiment expressions are then extracted according to the sentiment dictionary. Both the entity content generated from the text data and the sentiment information attached to that entity content are of considerable reference value to users.

Finally, an opinion mining system is built on aspect extraction and sentiment analysis. The system can automatically process a large number of text reviews from the Internet and extract opinions using the algorithm model studied in this thesis. It can also generate summaries of reviews of products and services.
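
The abstract uses pointwise mutual information to measure how strongly two words are associated, but it does not give the exact formulation. As a rough illustration only, the sketch below uses the standard definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))) with probabilities estimated from sentence-level co-occurrence. The function name `pmi_table`, the choice of the sentence as the co-occurrence window, and the toy reviews are all assumptions made for this example and are not taken from the thesis.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(sentences):
    """Return PMI (base 2) for every unordered word pair that co-occurs
    in at least one sentence. Each sentence is a list of tokens.

    Assumption: probabilities are estimated as the fraction of sentences
    containing a word (or both words of a pair).
    """
    n = len(sentences)
    word_count = Counter()
    pair_count = Counter()
    for tokens in sentences:
        unique = sorted(set(tokens))  # count each word once per sentence
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))  # pairs in sorted order

    pmi = {}
    for (a, b), c in pair_count.items():
        p_ab = c / n
        p_a = word_count[a] / n
        p_b = word_count[b] / n
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi


# Hypothetical tokenized reviews (aspect word plus opinion word).
reviews = [
    ["battery", "great"],
    ["battery", "great"],
    ["screen", "dim"],
    ["screen", "great"],
]
scores = pmi_table(reviews)
for pair, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 3))
```

A positive score means the two words appear together more often than independence would predict, which is the kind of signal that could be used to pair an aspect entity with a sentiment word. Pairs that never co-occur are simply absent from the result, so a real system would need a smoothing or thresholding policy on top of this.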
Keywords/Search Tags:semi-supervised learning, opinion mining, aspect extraction, sentiment analysis