With the development of society and advances in technology, people can obtain information in more ways and more conveniently, and large amounts of data are generated in the course of communication. Automated, intelligent information processing is an inevitable trend of social development, and in this context natural language processing has developed rapidly. In Chinese information processing, the problems of Chinese word segmentation and part-of-speech (POS) tagging have been largely solved, and some software tools are already used in practical applications. However, sentences must be understood before discourse can be comprehended, and the study of Chinese compound sentences is the bridge between the study of sentences and the study of discourse. A compound sentence is composed of clauses and carries much more information than a single clause. It is often used to express logical relationships between people, between people and things, and between people and events, and it has syntactic, semantic, and even pragmatic attributes. Dividing the hierarchical relationships of a compound sentence is fundamental research, and two problems must be solved before it: tagging relation tags and tagging the collocations of relation tags. It is therefore necessary to study compound sentences at the syntactic, semantic, and even pragmatic levels. This paper attempts to gain some understanding of the hierarchical relationships of compound sentences on the basis of relation-tag tagging. Because studying the features of compound sentences is the most fundamental part of this research, the paper discusses how to automatically select syntactic features of compound sentences from the dependency tree, obtaining the syntactic features that characterize relation tags and the collocations between them.
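As a rough illustration of what "syntactic features from the dependency tree" can look like, the following Python sketch derives per-token features (word, POS, dependency relation, head word and POS, and the chain of relations up to the root) in a form a CRF could consume. The `Token` class, the feature names, and the LTP-style relation labels (ADV, SBV, HED, ...) are assumptions for illustration only, and the parse of the example sentence is hypothetical; the abstract does not specify the actual feature set.

```python
from dataclasses import dataclass


@dataclass
class Token:
    word: str
    pos: str
    head: int    # 1-based index of the head token; 0 marks the root
    deprel: str  # dependency relation to the head (LTP-style labels assumed)


def path_to_root(tokens, i):
    """Collect dependency relations from token i (1-based) up to the root."""
    rels, seen = [], set()
    while i != 0 and i not in seen:  # `seen` guards against malformed, cyclic parses
        seen.add(i)
        tok = tokens[i - 1]
        rels.append(tok.deprel)
        i = tok.head
    return rels


def token_features(tokens, i):
    """Build a CRF-style feature dict for token i (1-based)."""
    tok = tokens[i - 1]
    head = tokens[tok.head - 1] if tok.head > 0 else None
    path = path_to_root(tokens, i)
    return {
        "word": tok.word,
        "pos": tok.pos,
        "deprel": tok.deprel,
        "head_word": head.word if head else "ROOT",
        "head_pos": head.pos if head else "ROOT",
        "depth": len(path),
        "path": "/".join(path),
    }


# Hypothetical parse of 虽然下雨，但是我们出发了 ("although it rained, we set off").
sentence = [
    Token("虽然", "c", 2, "ADV"),
    Token("下雨", "v", 6, "ADV"),
    Token("，", "wp", 2, "WP"),
    Token("但是", "c", 6, "ADV"),
    Token("我们", "r", 6, "SBV"),
    Token("出发", "v", 0, "HED"),
    Token("了", "u", 6, "RAD"),
]

features = [token_features(sentence, i) for i in range(1, len(sentence) + 1)]
for f in features:
    print(f["word"], f["deprel"], f["head_word"], f["path"])
```

Under this hypothetical parse, the paired relation tags 虽然 and 但是 receive different `path` values (`ADV/ADV/HED` versus `ADV/HED`), because the first attaches inside the subordinate clause and the second attaches to the main verb; structural cues of this kind are what a syntactic feature set would try to capture.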
When selecting features from a sentence, we include both lexical and syntactic features. Conditional Random Fields (CRFs) are undirected graphical models that can incorporate a wide variety of sentence features that need not be independent of one another, and they have been widely used in many NLP tasks. In this paper, we use CRFs to train on a corpus of compound sentences and embed a feature-selection algorithm into the model so that syntactic features are selected automatically. The experiments cover two tasks: relation-tag tagging and relation-tag collocation tagging. Relation-tag tagging performs better, both because it has been studied more and because it is simpler; its precision and recall are about 98%. This paper treats relation-tag collocation only briefly: its precision and recall are about 77%, and the task needs further study. The model files obtained from the experiments can be applied to related tasks.
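The abstract does not name the feature-selection algorithm embedded in CRF training. One common wrapper strategy is greedy forward selection: repeatedly add the candidate feature that most improves a held-out score, and stop when no candidate helps. The sketch below assumes a hypothetical `evaluate` callback that, in practice, would train a CRF with the given feature templates and return a score such as F1 on a development set; here a toy scorer stands in for it, and the feature names are illustrative only.

```python
import math


def forward_select(candidates, evaluate, min_gain=0.0):
    """Greedy forward feature selection around an arbitrary train-and-score callback.

    `evaluate(features)` is assumed to train a model using only `features`
    and return a held-out score where higher is better.
    """
    selected = []
    best_score = -math.inf  # so the best single feature is always accepted first
    remaining = list(candidates)
    while remaining:
        # Score every remaining candidate added to the current feature set.
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        score, feat = max(scored, key=lambda sf: sf[0])
        if score - best_score <= min_gain:
            break  # no candidate improves the score enough; stop
        selected.append(feat)
        remaining.remove(feat)
        best_score = score
    return selected, best_score


# Toy stand-in for "train a CRF and return dev-set F1": each feature adds a fixed gain.
toy_gain = {"word": 0.50, "pos": 0.20, "deprel": 0.15, "head_word": 0.0, "path": 0.05}


def toy_evaluate(features):
    return sum(toy_gain[f] for f in features)


chosen, score = forward_select(list(toy_gain), toy_evaluate)
print(chosen, round(score, 2))
```

Because each round retrains the model once per remaining candidate, the number of training runs grows roughly quadratically with the number of candidates, which is one reason such wrappers are usually run over feature templates rather than individual feature values.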