| The world's many human languages have complex surface forms and very different linguistic characteristics. However, since language is a tool for transmitting information, is there a consistent law governing the information transmission rate? Based on large-scale corpora of multiple languages, this paper presents a computational study of the quantitative laws of information transfer rates in different languages. From a macroscopic perspective, the overall information transfer rates of different languages are calculated. Based on large-scale text corpora and acoustic materials, we examined 61 languages covering 4.98 billion native speakers. The results show that the information rates of these 61 languages or dialects are all concentrated around 14.15 (±2.26) bits/s, with very little difference between languages. The results also show that different languages consistently encode the same amount of information at the word level, which indicates that different languages not only have the same information transfer rate but also have very consistent information encoding strategies at the word level. From a microscopic perspective, the relationship between the durations of words and their information content is studied. Based on hundreds of hours of continuous speech data from 11 languages, we conducted statistical analysis and modeling of the relationship between the durations of millions of words and their information content. The results show an extremely significant positive linear correlation between word duration and information content across languages, which indicates that speakers subconsciously control the duration of words according to the amount of information they carry in the speech flow. To a certain extent, this reveals the relationship between the details of speech implementation and high-level language functions. This paper also presents a case study on the durations of Mandarin tones in the speech flow. The results show that the durations of Mandarin tones, while difficult to explain from the perspective of acoustics and physiology, can be well explained by information entropy. Building on both parts, this paper expresses the macroscopic and microscopic conclusions in the same mathematical form and, on that basis, establishes a unified theoretical framework for research on language information rate. Under this law, some linguistic problems can be better explained theoretically. The framework also reveals a close, general, cross-language relationship between low-level speech realizations and high-level information functions. In addition, this study found that different languages have different information encoding rules and strategies at the levels of phonemes, syllables and words. These rules have certain implications for phonetics research, especially research on duration. Finally, this paper briefly discusses the implications of these results for historical linguistics, second language teaching, natural language processing and evolutionary anthropology. |
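
The two quantities discussed above, an overall information rate in bits/s and the correlation between word duration and information content, can be illustrated with a small sketch. The abstract does not state how information content was estimated, so the sketch assumes the simplest option, unigram surprisal (the negative base-2 log of a word's relative frequency). The function names, the toy token list and the durations are all hypothetical and are not data from the study.

```python
import math
from collections import Counter


def unigram_surprisal(tokens):
    """Map each word to -log2 of its relative frequency in `tokens`.

    Assumption: unigram surprisal stands in for "information content";
    the study's actual estimator is not given in the abstract.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: -math.log2(count / total) for word, count in counts.items()}


def information_rate(words, durations, surprisal):
    """Total information (bits) divided by total speaking time (seconds)."""
    total_bits = sum(surprisal[w] for w in words)
    return total_bits / sum(durations)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy utterance with made-up word durations in seconds.
tokens = ["the", "cat", "the", "dog", "the", "cat", "sat", "the"]
durations = [0.1, 0.2, 0.1, 0.3, 0.1, 0.2, 0.3, 0.1]

surprisal = unigram_surprisal(tokens)
bits = [surprisal[w] for w in tokens]

rate = information_rate(tokens, durations, surprisal)  # bits per second
r = pearson_r(bits, durations)  # duration vs. information content

print(f"information rate: {rate:.2f} bits/s, Pearson r: {r:.3f}")
```

In this toy example the durations were deliberately chosen to be proportional to surprisal, so the correlation is near its maximum. Real speech data would be noisier, and a contextual language model would likely give better information estimates than unigram frequencies.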