Text Style Analysis For WeChat Articles

Posted on:2021-05-04

Degree:Master

Type:Thesis

Country:China

Candidate:Y W Guo

GTID:2518306512988639

Subject:Books intelligence

Abstract/Summary:

Information disseminated through new media provides a vast range of sources and great convenience for people’s daily lives, social and economic activities, corporate decision-making, and government public administration. However, information noise, including false and exaggerated information, is particularly prominent in the Internet and new media era. WeChat, the most influential mobile social network platform in China, and the public accounts that run on it have become major channels for traffic in this era. Many public accounts have adopted heavily exaggerated writing styles to increase reader stickiness, with noticeable social impact. Drawing on linguistic theory, this study therefore proposes a multi-dimensional measurement system for text style based on text elements, and uses WeChat public account articles as samples to design and test text style detection schemes.

The research is organized into three levels. (1) Theoretical level. This paper draws on relevant linguistic theories, integrates different perspectives within the discipline, analyses the elements of text style, constructs a multi-dimensional text style measurement system, and illustrates it with case analyses of specific corpus examples. (2) Methodological level. Building on an overall process design for text style measurement, we design a word vector construction method that combines a prior knowledge base, the text style dictionary (TSD), with Word2Vec, together with a Bi-LSTM-based text style detection model. (3) Application level. First, after automatically collecting WeChat articles, we develop a data annotation platform with JFinal and Vue.js and annotate the collected articles (the small corpus) according to the multi-dimensional text style system. At the same time, we collect encyclopedia Q&A data as a large-scale general corpus (the large corpus). Second, we combine existing dictionary resources with manual correction to construct the text style dictionary (TSD). Then, as the experimental part of this research, we implement Word2Vec word vector construction on the small and large corpora and compare the results.

This paper conducts a comprehensive comparative analysis of different multi-objective classification experiments, and verifies and evaluates the designed text style detection process and method. The Bi-LSTM model for text style detection is optimized through these experiments. The experiments are: the traditional classification method SVM; small-corpus Word2Vec + Bi-LSTM; small- and large-corpus Word2Vec + Bi-LSTM; TSD + small-corpus Word2Vec + Bi-LSTM; and TSD + Word2Vec combined with the small and large corpora.

The conclusions are as follows. (1) Theoretically, the multi-dimensional text style measurement system constructed in this research has a certain degree of interpretability and understandability. (2) Deep learning models, represented by Bi-LSTM, handle multi-objective text style classification effectively and outperform traditional machine learning methods. (3) Compared with using the WeChat public account article corpus alone, integrating a larger-scale general corpus into Word2Vec word embedding construction further improves Bi-LSTM text style classification. (4) Compared with Word2Vec word embeddings alone, constructing a prior knowledge base, the text style dictionary (TSD), and stitching together the TSD and Word2Vec word vectors significantly improves Bi-LSTM-based text style classification.
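The stitching of TSD and Word2Vec word vectors mentioned in the abstract can be pictured with a small sketch. The code below is only an illustration under several assumptions: "stitching" is read as plain concatenation, the style dimensions, dictionary entries, and embedding values are invented toy data, and the helper names (`tsd_features`, `stitched_vector`, `encode`) are hypothetical rather than taken from the thesis. In a real pipeline the embeddings would come from a trained Word2Vec model and the resulting sequence would be fed to a Bi-LSTM classifier.

```python
from typing import Dict, List

# Hypothetical style dimensions (the abstract does not list the real ones).
STYLE_DIMS: List[str] = ["exaggeration", "emotion", "urgency"]

# Toy text style dictionary (TSD): word -> style dimensions it signals.
# These entries are assumptions made up for illustration.
TSD: Dict[str, List[str]] = {
    "shocking": ["exaggeration", "emotion"],
    "now": ["urgency"],
}

# Toy stand-in for trained Word2Vec embeddings (2 dimensions for brevity).
WORD_VECS: Dict[str, List[float]] = {
    "shocking": [0.9, 0.1],
    "news": [0.2, 0.7],
    "now": [0.4, 0.4],
}
EMB_DIM = 2


def tsd_features(word: str) -> List[float]:
    """Return a binary indicator per style dimension for this word."""
    tags = TSD.get(word, [])
    return [1.0 if dim in tags else 0.0 for dim in STYLE_DIMS]


def stitched_vector(word: str) -> List[float]:
    """Concatenate the word embedding with its TSD indicator features."""
    emb = WORD_VECS.get(word, [0.0] * EMB_DIM)  # zero vector for unknown words
    return emb + tsd_features(word)


def encode(tokens: List[str]) -> List[List[float]]:
    """Turn a token sequence into the vector sequence a Bi-LSTM would consume."""
    return [stitched_vector(token) for token in tokens]


if __name__ == "__main__":
    for token, vec in zip(["shocking", "news", "now"], encode(["shocking", "news", "now"])):
        print(token, vec)
```

Each token ends up with a vector of length `EMB_DIM + len(STYLE_DIMS)`, so a word that the dictionary marks as stylistically loaded carries that prior knowledge alongside its distributional embedding.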

Keywords/Search Tags:

WeChat public account, Text style detection, Natural language processing, Machine learning, Word embedding

Related items

1	Research On Machine Learning For Natural Language Processing And Transmission
2	Improvement And Application Of Text Classification Based On RNN
3	Research On Text Classification Based On Natural Language Processing And Machine Learning
4	Research On Jointly Learning Word Embeddings And Latent Topics In Text
5	Research On E-Commerce Commodity Title Category Classification Algorithm Based On Natural Language Processing Technology
6	Sentence Vectorization Modeling And Text Level Application
7	Deep Contextual Word Embedding In Natural Language Processing
8	Research On Text Style Transfer Based On Delete-Retrieve-Generate Framework
9	Unsupervised Extractive Text Summarization Using Sentence Embedding
10	Exploration On Sense Embedding Model Based On Gaussian Distribution