
Research On Multi-modal Data Processing Methods Of Network Public Opinion Involving Unregistered Words

Posted on: 2021-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: K Y Liu
GTID: 2438330611968489
Subject: Electronic Science and Technology
Abstract/Summary:
Public opinion information produced in online communities includes large volumes of text, images, video, and other modal data, and it spreads randomly, rapidly, and conveniently. Online communities are a gathering place for social public opinion, and the handling of unknown words is an important factor in the analysis of online public opinion. This thesis analyzes public opinion involving unknown words in online images and approaches the problem from several angles, including character recognition, text processing, and public opinion analysis. The main research work and conclusions are as follows:

1. The thesis analyzes the characteristics of online images involving unknown words and explores the relationship among online images, online public opinion, and unknown words. The relevant data are collected through web crawling and manual annotation, and the text in online images is extracted using an OCR character recognition algorithm combined with word segmentation.

2. The thesis proposes a two-way directional replacement model, which uses two input layers to perform replacement according to the characteristics of the unknown words in the text: one is a synonym replacement list based on Word2vec semantic analysis, and the other is a replacement list of keywords extracted with TextRank. The model is built on a TF-IDF weighted naive Bayes classifier and is combined with time series. Experiments indicate that, for public opinion events involving unknown words, the model classifies better than the traditional method and can accurately identify the dynamic transformation of unknown words in online public opinion. The model therefore not only classifies public opinion involving unknown words but also adapts, through the time series, to changes in the unknown words that the public opinion contains.

3. Rather than being limited to traditional text data, the thesis starts from the public opinion information in online images and online text, and carries out multi-modal research on text, images, and the combination of the two. Experiments are carried out on synthetic and public data sets, and an experimental application to mixed online images and text is further constructed. By combining the part of speech, category, cohesion, time series, and word frequency of unknown words, online public opinion involving unknown words is effectively analyzed and identified, further demonstrating the validity of the model constructed in this thesis.
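
As a rough illustration of the replacement-then-classification idea in point 2, the following Python sketch applies two replacement lists to a token sequence and then classifies it with a TF-IDF weighted naive Bayes model. It is not the thesis's implementation. The two dictionaries are hand-written stand-ins for the Word2vec synonym list and the TextRank keyword list, the decision to apply them one after the other is an assumption, the sample tokens and labels are invented, and the time-series component is left out entirely.

```python
import math
from collections import Counter, defaultdict

# Hypothetical replacement lists (assumption: hand-written here; in the
# abstract they are derived from Word2vec similarity and TextRank keywords).
SYNONYM_LIST = {"yyds": "great", "xswl": "funny"}
KEYWORD_LIST = {"great": "praise", "funny": "praise", "awful": "complaint"}


def replace_tokens(tokens):
    """Map each token through the synonym list, then the keyword list."""
    out = []
    for tok in tokens:
        tok = SYNONYM_LIST.get(tok, tok)
        tok = KEYWORD_LIST.get(tok, tok)
        out.append(tok)
    return out


class TfidfNaiveBayes:
    """Multinomial naive Bayes whose term counts are weighted by TF-IDF."""

    def fit(self, docs, labels):
        n_docs = len(docs)
        doc_freq = Counter(t for d in docs for t in set(d))
        # Smoothed inverse document frequency for every training term.
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1
                    for t, c in doc_freq.items()}
        self.class_counts = Counter(labels)
        self.weights = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            for term, tf in Counter(doc).items():
                self.weights[label][term] += tf * self.idf[term]
        self.totals = {c: sum(w.values()) for c, w in self.weights.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            denom = self.totals.get(label, 0.0) + len(self.idf)
            for term, tf in Counter(doc).items():
                if term not in self.idf:
                    continue  # terms never seen in training are skipped
                # Laplace-smoothed likelihood over TF-IDF weighted counts.
                likelihood = (self.weights[label][term] + 1) / denom
                score += tf * self.idf[term] * math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data: tokenized posts with a sentiment-like label.
train = [
    (["service", "yyds"], "pos"),
    (["xswl", "good"], "pos"),
    (["awful", "delay"], "neg"),
    (["bad", "awful"], "neg"),
]
model = TfidfNaiveBayes().fit(
    [replace_tokens(tokens) for tokens, _ in train],
    [label for _, label in train],
)
print(model.predict(replace_tokens(["yyds"])))
```

Replacing the tokens before training means that different unknown words which map to the same substitute share one feature, so the classifier does not have to see each new word often enough to estimate its weight separately.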
Keywords/Search Tags: unknown words, online public opinion, image character recognition, text processing, multi-modal data