
New Chinese Words Detection And Sentiment Orientation In Micro-blog

Posted on: 2017-09-26; Degree: Master; Type: Thesis
Country: China; Candidate: F Wang; Full Text: PDF
GTID: 2348330518495709; Subject: Control Science and Engineering
Abstract/Summary:
In an era when social networks have swept the world, many new words and even new expressions have sprung up. They often emerge alongside trending news and reflect public opinion. Extracting new words from large volumes of micro-blog posts and analyzing their sentiment effectively plays an important role in topic tracking and public opinion analysis on micro-blogs. These new words often carry strong emotion and, to a certain extent, represent users' feelings. However, existing research on text sentiment orientation focuses mainly on product reviews and news reports. Sentiment analysis of new micro-blog words still relies on traditional methods that lack micro-blog-specific features, so its performance is poor. This thesis consists of three parts. First, it designs and implements a method that uses a generalized suffix tree to extract all possible new-word candidates by computing repeated content. With a suffix tree, string search and counting are fast; compared with the N-gram algorithm, time and space complexity are greatly reduced. Second, it designs and implements a new-word detection algorithm that combines rules and statistics. After considering the performance of several classical methods, the thesis uses two statistical features, mutual information and information entropy, to filter the candidates. The results show that this algorithm is highly adaptable and achieves higher accuracy. Finally, the thesis uses neural networks to determine the sentiment orientation of new words. It introduces a new neural-network-based language model that distinguishes and uses both local and global context via a joint training objective. The model learns word representations that better capture the semantics of words while still retaining syntactic information. These improved representations can be used to represent contexts for clustering word instances, which the multi-prototype version of the model uses to account for words with multiple senses. The experimental results show that the method used in this thesis can effectively find new words and determine their sentiment orientation.
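The statistical filtering step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it counts character n-grams directly instead of building a generalized suffix tree, and the function names (`ngram_stats`, `cohesion`, `score_candidates`), the `max_len` and `min_count` parameters, and the choice of the minimum pointwise mutual information over all two-way splits as the cohesion score are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def ngram_stats(texts, max_len=4):
    """Count character n-grams and the characters adjacent to each one.

    Assumption: plain n-gram counting stands in for the suffix tree
    mentioned in the abstract; it is simpler but less efficient.
    """
    counts = Counter()
    left = defaultdict(Counter)   # characters seen just before each n-gram
    right = defaultdict(Counter)  # characters seen just after each n-gram
    total_chars = 0
    for text in texts:
        n = len(text)
        total_chars += n
        for i in range(n):
            for k in range(1, max_len + 1):
                if i + k > n:
                    break
                gram = text[i:i + k]
                counts[gram] += 1
                if i > 0:
                    left[gram][text[i - 1]] += 1
                if i + k < n:
                    right[gram][text[i + k]] += 1
    return counts, left, right, total_chars


def entropy(counter):
    """Shannon entropy (natural log) of a Counter of neighbor characters."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counter.values())


def cohesion(gram, counts, total):
    """Minimum pointwise mutual information over all two-way splits.

    Probabilities are approximated as count / total characters for every
    n-gram length, which is a simplification.
    """
    p_gram = counts[gram] / total
    return min(
        math.log(p_gram / ((counts[gram[:i]] / total) * (counts[gram[i:]] / total)))
        for i in range(1, len(gram))
    )


def score_candidates(texts, max_len=4, min_count=2):
    """Return {candidate: (cohesion, min(left entropy, right entropy))}."""
    counts, left, right, total = ngram_stats(texts, max_len)
    scored = {}
    for gram, c in counts.items():
        if len(gram) < 2 or c < min_count:
            continue
        boundary = min(entropy(left[gram]), entropy(right[gram]))
        scored[gram] = (cohesion(gram, counts, total), boundary)
    return scored


if __name__ == "__main__":
    posts = ["xaby", "zabw"]  # toy input; real input would be micro-blog text
    for cand, (pmi, ent) in score_candidates(posts).items():
        print(cand, round(pmi, 3), round(ent, 3))
```

A candidate would then be kept only if both scores exceed chosen thresholds: high cohesion suggests the characters belong together, and high neighbor entropy suggests the string appears in varied contexts. The threshold values are not given in the abstract and would have to be tuned on data.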
Keywords/Search Tags: micro-blog, new words detection, suffix tree, sentiment orientation, word embeddings