Identification And Application Of Coordinate Structure For Chinese Patent Literature

Posted on: 2015-06-30
Degree: Master
Type: Thesis
Country: China
Candidate: C Shi
Full Text: PDF
GTID: 2298330467467071
Subject: Computer software and theory
Abstract/Summary:
Patent literature is an important kind of technical data. Its text format is relatively fixed and its terminology is relatively standardized, and it contains many high-frequency words, out-of-vocabulary (unlisted) words, and a large number of coordinate structures. Identifying coordinate structures in Chinese patent documents can improve the performance of patent literature analysis, and the identification results can also be applied to machine translation, information extraction, and other tasks. Previous work on coordinate structures has consisted of theoretical discussion and of identification in non-patent text. This paper uses a Chinese patent corpus to analyze and identify Chinese coordinate structures.

First, the paper counts and analyzes the linguistic features of Coordination with Overt Conjunctions (COC) in Chinese patent literature, covering both internal and external features. The study of internal features focuses on coordination markers, the internal composition of coordinate structures, and the distribution of Part-of-Speech (POS) tags. The study of external features focuses on statistics for potential boundary markers and on the dependency-syntax features of COC in Chinese patent literature.

Second, the paper studies the identification of COC in Chinese patent literature and, building on those results, the identification of non-nested and nested coordinate structures. Rules obtained from the statistical analysis are used as identification rules for pre-processing and post-processing during identification, and this rule processing improves identification accuracy.

Finally, based on the dependency characteristics of coordinate structures, the paper selects the better-performing identification results for non-nested coordinate structures and uses them to post-process the dependency parsing results of Chinese patent literature. This rule processing improves the dependency parsing results.
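The abstract does not spell out the dependency post-processing rules. As a rough illustration only, the Python sketch below shows one way an identified non-nested coordinate structure could be used to repair a dependency parse. It is not the thesis's method, and every name in it is hypothetical: the `Token` class and the functions `structure_top`, `span_head`, and `repair_coordination` are invented for this example, conjuncts are assumed to be given as 1-based inclusive token spans, and the relation labels `COO` (coordination) and `LAD` (left adjunct, used for the conjunction) are assumed from the HIT LTP dependency scheme, which the thesis may or may not have used.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str  # word form
    head: int  # 1-based index of the head token; 0 means the root
    rel: str   # dependency relation label


def structure_top(tokens, lo, hi):
    """Walk from token `lo` up to the root and return the last token on that
    path inside [lo, hi]; its head is the structure's external attachment.
    Assumes a well-formed tree (a cycle would make this loop forever)."""
    top, cur = lo, lo
    while cur != 0:
        if lo <= cur <= hi:
            top = cur
        cur = tokens[cur - 1].head
    return top


def span_head(tokens, start, end):
    """Return the first token in [start, end] whose head lies outside the span."""
    for i in range(start, end + 1):
        if not start <= tokens[i - 1].head <= end:
            return i
    return start  # unreachable for a well-formed tree


def repair_coordination(tokens, conjuncts, conjunctions, coo="COO", lad="LAD"):
    """Hypothetical rule: re-attach one non-nested coordinate structure.

    conjuncts: list of (start, end) 1-based inclusive spans, left to right.
    conjunctions: 1-based indices of overt conjunctions inside the structure.
    """
    lo, hi = conjuncts[0][0], conjuncts[-1][1]
    # Read everything needed from the original parse before modifying it.
    heads = [span_head(tokens, s, e) for s, e in conjuncts]
    top = structure_top(tokens, lo, hi)
    external = (tokens[top - 1].head, tokens[top - 1].rel)

    first = heads[0]
    # The first conjunct takes over the structure's link to the rest of the tree.
    tokens[first - 1].head, tokens[first - 1].rel = external
    for h in heads[1:]:
        # Every later conjunct attaches to the first conjunct.
        tokens[h - 1].head, tokens[h - 1].rel = first, coo
    for c in conjunctions:
        # Each conjunction attaches to the nearest conjunct head on its right.
        right = next((h for h in heads if h > c), None)
        if right is not None:
            tokens[c - 1].head, tokens[c - 1].rel = right, lad
    return tokens


if __name__ == "__main__":
    # 装置 包括 电机 和 齿轮 ("the device comprises a motor and a gear"),
    # with 电机 wrongly parsed as a modifier (ATT) of 齿轮.
    sent = [
        Token("装置", 2, "SBV"),
        Token("包括", 0, "HED"),
        Token("电机", 5, "ATT"),
        Token("和", 3, "LAD"),
        Token("齿轮", 2, "VOB"),
    ]
    for t in repair_coordination(sent, conjuncts=[(3, 3), (5, 5)], conjunctions=[4]):
        print(t.form, t.head, t.rel)
```

In the example, 电机 becomes the object (VOB) of 包括, 齿轮 attaches to 电机 as COO, and 和 attaches to 齿轮 as LAD. Taking the external attachment from the structure's topmost token, rather than from the first conjunct, keeps the output a tree even when the parser's original arcs leave and re-enter the structure.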
Keywords/Search Tags: Chinese Patent Literature, Coordinate Structure, Coordinate Structure Analysis, Coordinate Structure Identification, Conditional Random Fields, Coordinate Structure Application