| With the development of the Internet and the advance of informatization in all walks of life, extracting complex semantic information from language use has become a research priority. Official news and published fiction (hereinafter referred to as news and fiction) are widely used by scholars in a variety of fields because they are normative, highly recognized, and closely tied to social change and the development of public opinion. The growth in data volume brought about by digitalization provides a more complete data basis for analyzing language use in news and fiction, but it also poses new challenges. On the one hand, models for classifying news and fiction are hard to interpret, and analyses of the differences in language use between the two are not comprehensive. On the other hand, automated computer-aided methods for analyzing language use over long periods of time are not yet sufficiently developed to accurately extract the information behind it. Against this background, this thesis studies news and fiction along two dimensions of language use analysis. (1) The static dimension: introducing syntactic structure improves the accuracy of classifying news and fiction and addresses the lack of hierarchical features in language use analysis. (2) The dynamic dimension: a frequency-based and semantics-based analysis of lexical evolution in news is proposed, revealing the connection between lexical evolution and social change. The static-dimension analysis comprises two parts: a syntactic-structure-based classification model of news and fiction, and an analysis of differences in natural language use between news and fiction. In the first part, the
inclusion of syntactic structure features in the classification model of the two registers improves classification performance by 7.21%, 7.45% and 8.59%, demonstrating the effectiveness and importance of syntactic structure in capturing the differences in language use between the two registers. In the second part, unlike existing comparative linguistic analysis methods, this thesis applies a rule-based natural language understanding approach and a phrase structure grammar. Including syntactic structure as a linguistic feature not only enriches the features available for analyzing language use differences between news and fiction, but also improves the interpretability of register classification models based on natural language understanding. This thesis thus proposes both a more interpretable and effective register classification method and a method for analyzing differences in language use between registers, which helps address the lack of hierarchical feature analysis in linguistics and communication science. For the dynamic dimension, this thesis proposes a frequency-based and semantics-based method for analyzing lexical evolution in news texts, which comprises two parts: frequency-based analysis of key words over time and semantics-based analysis of time-sensitive words. In the first part, from the perspective of word frequency, this thesis uses the TF-IDF method to trace how the importance of key words in the news changes over time and explains these changes in relation to historical events. The second part is divided into two stages: screening domain time-sensitive words, and analyzing the semantic evolution of the selected words. Based on the idea that changes in the semantics of individual words can also reflect changes in their context, this thesis proposes a domain time-sensitive word screening method based on dynamic word embedding and TF-IDF to identify time-sensitive
words among the domain-significant words over time. Finally, focusing on the economic and political domains, one domain time-sensitive word is selected from each, and this thesis explains the relationship between lexical evolution and social development in relation to historical events. On the one hand, this confirms that lexical meaning and social development change in step; on the other hand, it demonstrates that analyzing changes in lexical meaning based on diachronic texts is an important support for studying the history of various aspects of society. The frequency-based and semantics-based method proposed in this thesis thus provides a new approach to identifying domain time-sensitive words and to analyzing trends in social change and lexical evolution. In summary, the main innovations of this thesis are as follows. (1) First, syntactic structure is added to the classification of news and fiction, which greatly improves classification performance. Second, the difference analysis of the two registers centered on syntactic structure not only improves the interpretability of the register classification model, but also makes up for the lack of hierarchical structural features in register difference analysis. (2) Based on frequency and semantics, a domain-oriented time-sensitive word screening method is proposed to efficiently obtain high-quality candidate words through human-computer collaboration. Based on the semantics of dynamic word vectors, the method is also used to locate the time when the context of the domain time-sensitive words changed significantly, and
to assist researchers in analyzing the social changes that underlie this semantic evolution. |
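
The frequency-based part of the dynamic analysis, which traces the TF-IDF importance of key words across time, can be illustrated with a minimal sketch. The abstract does not specify how the corpus is sliced or weighted, so the following assumes that all news text from one period is pooled into a single document and that the plain `tf * log(N / df)` weighting is used. The function names `tfidf_by_period` and `trend`, and the toy corpus, are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def tfidf_by_period(docs_by_period):
    """Return {period: {word: tf-idf}}.

    Assumption: each period's pooled text counts as one document,
    so idf is computed over periods rather than over articles.
    """
    # Pool whitespace tokens per period and count term frequencies.
    counts = {
        period: Counter(tok for doc in docs for tok in doc.lower().split())
        for period, docs in docs_by_period.items()
    }
    n_periods = len(counts)
    # Document frequency: number of periods in which each word occurs.
    df = Counter(word for c in counts.values() for word in c)
    scores = {}
    for period, c in counts.items():
        total = sum(c.values())
        scores[period] = {
            word: (n / total) * math.log(n_periods / df[word])
            for word, n in c.items()
        }
    return scores


def trend(scores, word):
    """TF-IDF of `word` per period in sorted order (0.0 where absent)."""
    return [(period, scores[period].get(word, 0.0)) for period in sorted(scores)]


# Toy corpus (invented for illustration only).
toy = {
    1980: ["reform plan announced", "factory output rises"],
    2000: ["internet market grows", "reform of market rules"],
    2020: ["internet platform economy", "digital market rules"],
}
scores = tfidf_by_period(toy)
factory_trend = trend(scores, "factory")
internet_trend = trend(scores, "internet")
```

With this weighting, a word that occurs in every period receives a score of zero, so the trend highlights words whose prominence is specific to certain periods. A real pipeline would also need proper tokenization and stop-word handling, which this sketch omits.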