In recent years, with the continuous advance of Internet technology, the volume of text data has grown explosively, which has greatly promoted the development of natural language processing (NLP). Omission is a common phenomenon in Chinese, and it has attracted increasing attention from scholars. Chinese zero element detection has become an indispensable branch of NLP research. Previous approaches to Chinese zero element detection treat zeros as part of the syntactic structure and detect them using lexical and syntactic information within the sentence, while ignoring semantic information. With the development of Chinese discourse parsing, the linguistic regularities within a text reflect broader contextual semantic relationships, which provides a new direction for research on Chinese zero element detection. This paper studies Chinese zero element detection from a discourse perspective. Its main research contents are threefold. (1) We study the construction of a Chinese discourse-level zero element corpus. Based on an analysis of existing corpus resources, we propose the concept of Chinese zero elements and construct a Chinese zero element corpus. We annotate omitted language components and their corresponding reference items in the context at the semantic level, providing a corpus resource for Chinese zero element research. (2) We divide Chinese zero element detection into two stages, zero candidate extraction and zero candidate classification, and build a Chinese zero element detection platform on this basis. Combining different lexical and syntactic features, we conduct a series of experiments to explore the limitations of syntactic information for the Chinese zero element detection task. (3) To improve detection performance and reduce the reliance on syntactic analysis, we propose a discourse-level Chinese zero element detection approach based on Chinese discourse structure analysis. Experiments show that the method is suitable for both candidate extraction and classification and improves the performance of the detection process. In addition, discourse information effectively reduces the reliance on syntactic analysis during detection.