Research On Micro-blogging Privacy Detection Based On Bayesian

Posted on: 2014-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z S Jiang
Full Text: PDF
GTID: 2268330425966100
Subject: Computer application technology
Abstract/Summary:
In recent years, micro-blogging has become increasingly popular around the world thanks to its integration, openness, ease of use, rapid propagation and wide coverage, while the accompanying problem of privacy disclosure on micro-blogs has drawn growing concern. Research on micro-blogging privacy detection is still at an early stage and is attracting more and more attention.

After reviewing the current state of research worldwide and the related technologies, this thesis carried out the following work.

A micro-blogging privacy detection system is proposed to detect micro-blogs that disclose private information. The system mainly consists of modules for pre-processing, Chinese word segmentation with result optimization, stop-word removal, and a double-level Naïve Bayesian classifier. First, because the traditional RMM+TSD segmentation method performs too many invalid term lookups and cannot handle segmentation ambiguity or new word recognition, an I-RMM+I-SD segmentation method is proposed to solve these problems. The method improves segmentation speed without adding much dictionary storage overhead, and it handles the common two-word overlapping ambiguity and the new word recognition problem, thereby improving both the efficiency and the accuracy of segmentation. Second, a double-level Naïve Bayesian classifier is proposed to classify the micro-blogs after segmentation. In this way, both the micro-blog classification and the privacy classification can be obtained with only one step of labeling the micro-blog and privacy categories.

Combining the I-RMM+I-SD segmentation with the double-level Bayesian classifier, the micro-blogging privacy detection system achieves good privacy detection results and meets the efficiency and accuracy requirements of micro-blogging privacy detection. Finally, the proposed algorithm is verified through experiments and the experimental results are compared and analyzed; the results show the superiority of the algorithm, and directions for further improvement are also discussed.
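
The abstract names its components without giving their details, so the following is only a rough Python sketch of the general idea and not the thesis's I-RMM+I-SD method or its actual classifier. It assumes that RMM stands for reverse maximum matching (the usual expansion; the abstract does not spell it out) and shows the plain baseline version of it. It also assumes one possible reading of "double-level": a first Naïve Bayes level decides whether a post is private at all, and a second level assigns a privacy category. All names (`rmm_segment`, `NaiveBayes`, `train_two_level`, `classify`), the toy dictionary, and the toy training posts are hypothetical.

```python
import math
from collections import Counter, defaultdict


def rmm_segment(text, dictionary, max_len=4):
    """Plain reverse maximum matching (assumed meaning of RMM)."""
    tokens = []
    end = len(text)
    while end > 0:
        # Try the longest candidate ending at `end` first, then shrink.
        # A single character is always accepted as a fallback.
        for size in range(min(max_len, end), 0, -1):
            piece = text[end - size:end]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                end -= size
                break
    tokens.reverse()
    return tokens


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_docs = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.class_docs[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_two_level(docs, labels):
    """labels: None for non-private posts, else a privacy category (assumed scheme)."""
    level1 = NaiveBayes().fit(docs, ["private" if l else "public" for l in labels])
    private = [(d, l) for d, l in zip(docs, labels) if l]
    level2 = NaiveBayes().fit([d for d, _ in private], [l for _, l in private])
    return level1, level2


def classify(tokens, level1, level2):
    """Return a privacy category, or None if level 1 judges the post non-private."""
    if level1.predict(tokens) == "public":
        return None
    return level2.predict(tokens)


# Toy usage: a single category label per post trains both levels.
docs = [["my", "phone", "number"], ["my", "home", "address"],
        ["nice", "weather", "today"], ["great", "movie", "today"]]
labels = ["contact", "location", None, None]
level1, level2 = train_two_level(docs, labels)
```

In this sketch each training post carries one label (a category or `None`), and both levels are trained from it, which is one way to interpret the abstract's "only one step of labeling". Scanning from the end of the string is what distinguishes reverse matching from forward matching: with the toy dictionary {研究, 研究生, 生命, 起源}, `rmm_segment("研究生命起源", ...)` yields 研究 / 生命 / 起源, where a forward scan would greedily take 研究生 first.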
Keywords/Search Tags: micro-blog, privacy detection, word segmentation, Naïve Bayesian classifier