Research on Chinese Semantic Chunk Recognition

Posted on: 2016-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: R Y Chang
Full Text: PDF
GTID: 2308330467974742
Subject: Computer software and theory
Abstract/Summary:
With the accelerating pace of informatization and the rapid development of the Internet, natural language processing technology is widely used in information processing fields such as machine translation, information retrieval, and human-computer interaction. After years of development, natural language processing has gradually shifted from rule-based approaches to statistical approaches. The rule-based approach aims at deep analysis and understanding of natural language, which is complicated and difficult to achieve, while the statistical approach aims at shallow processing of natural language, which is easier to implement on a computer.

Semantic chunk analysis is a form of shallow semantic and syntactic analysis that aims to explain the relationship between syntax and semantics. In length, a chunk lies between the word and the sentence, and chunks are divided differently in different natural languages. This thesis focuses on Chinese semantic chunk recognition.

There is no unified description system for Chinese chunk analysis; researchers have proposed different chunk analysis systems for different purposes. Chunk analysis is a kind of shallow parsing technology. To analyze the syntax and semantics of Chinese sentences jointly, the thesis applies to chunk analysis the definition of a semantic chunk used in the related task of semantic role labeling, and studies in depth the key techniques of semantic chunk recognition.

Semantic chunk analysis is an important research direction in shallow semantic parsing. To improve the precision of semantic chunk recognition, the thesis proposes a new IO labeling method. IO labeling uses only two tags and is combined with the strengths of SVM to improve recall and F1 in Chinese semantic chunk recognition. The thesis also presents results on sequence labeling of semantic chunks with I and O tags using a CRF model. Experimental results on the Chinese Proposition Bank show that the proposed combination of IO labeling and SVM achieves the best performance, reaching an F1 of 80.3% in Chinese semantic chunk recognition. The experiments also show that the choice of labeling method affects the performance of semantic chunk recognition.

The specific research contents of the thesis are as follows. First, the process and evaluation method of semantic chunk recognition are introduced; after recognition, each component of a sentence carries a label indicating whether or not it belongs to a semantic chunk. The thesis proposes a new labeling method, applies it to semantic chunk recognition, and compares it with the traditional labeling method. Second, statistical machine learning methods, namely CRF (Conditional Random Field) and SVM (Support Vector Machine), are used to build statistical models in combination with the IO labeling method, so that semantic chunk recognition is studied as a sequence labeling problem and as a binary classification problem, respectively. Comparing the results of the two methods shows that SVM-based semantic chunk recognition with the IO labeling method gives the best result. Finally, new semantic resources are added to the current system, and semantic chunks are studied from a new perspective.
Keywords/Search Tags: Semantic Chunk Analysis, Machine Learning, CRF, SVM, Semantic Role Labeling
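
As a rough illustration of the two-tag IO labeling and the precision, recall, and F1 evaluation mentioned in the abstract, the sketch below converts chunk spans to IO tags, decodes the tags back into chunks, and scores a prediction. The abstract does not give the exact tagging or scoring procedure, so the span representation, the function names (`spans_to_io`, `io_to_spans`, `prf1`), the exact-match scoring rule, and the five-token example are all assumptions made for illustration, not the thesis's implementation.

```python
from typing import List, Tuple

# Assumption: a chunk is a (start, end) pair of token indices, end exclusive.
Span = Tuple[int, int]


def spans_to_io(n_tokens: int, spans: List[Span]) -> List[str]:
    """Tag tokens inside any chunk as "I" and all other tokens as "O"."""
    tags = ["O"] * n_tokens
    for start, end in spans:
        for i in range(start, end):
            tags[i] = "I"
    return tags


def io_to_spans(tags: List[str]) -> List[Span]:
    """Recover chunks as maximal runs of consecutive "I" tags."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "I" and start is None:
            start = i                      # a new run of I tags begins
        elif tag != "I" and start is not None:
            spans.append((start, i))       # the current run ends before i
            start = None
    if start is not None:                  # a run that reaches the last token
        spans.append((start, len(tags)))
    return spans


def prf1(gold: List[Span], pred: List[Span]) -> Tuple[float, float, float]:
    """Precision, recall, F1 where a chunk counts only if both ends match."""
    gold_set, pred_set = set(gold), set(pred)
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical five-token sentence with gold chunks at tokens 0 and 3-4.
gold_spans = [(0, 1), (3, 5)]
gold_tags = spans_to_io(5, gold_spans)     # ['I', 'O', 'O', 'I', 'I']

# Hypothetical classifier output that misses the first token of chunk two.
pred_tags = ["I", "O", "O", "O", "I"]
pred_spans = io_to_spans(pred_tags)        # [(0, 1), (4, 5)]

print(gold_tags)
print(pred_spans)
print(prf1(gold_spans, pred_spans))
```

Because each token receives one of only two tags, the per-token decision can be treated as binary classification (the SVM setting) or the whole tag sequence can be predicted jointly (the CRF setting). One consequence of using only I and O is that two chunks that touch each other decode as a single run, which a B/I/O scheme would keep apart; how the thesis handles that case is not stated in the abstract.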