Study Of Chinese Word POI Segmentation System Based On N-Shortest-Paths And HMM

Posted on: 2009-02-01  Degree: Master  Type: Thesis
Country: China  Candidate: X Tang  Full Text: PDF
GTID: 2178360242996683  Subject: Cartography and Geographic Information System
Abstract/Summary:
Chinese automatic word segmentation is the foundation of natural language processing (NLP) for Chinese and one of its basic tasks. This thesis analyzes and compares current approaches to Chinese automatic segmentation and describes the technical features of each. The main objective of the research is to design and implement an automatic segmentation system for Chinese POI text. First, the thesis introduces the main difficulties and algorithms of Chinese automatic segmentation, analyzes the root causes of ambiguity, and introduces methods for recognizing ambiguity. Second, it gathers, organizes, and builds the natural language resources the study requires, mainly corpus collection and dictionary construction. Third, building on the analysis of the main difficulties, it designs and implements a Chinese POI automatic segmentation system based on a multi-step processing strategy. The system comprises modules for initial segmentation, ambiguity processing, and unknown word recognition. Initial segmentation uses N-Shortest-Paths to find the candidate segmentation paths in a sentence. Ambiguity processing eliminates ambiguities using rules and an HMM, taking the particular features of POI into account. Unknown word recognition is implemented with a rule-based method. Finally, the thesis validates the system's performance experimentally, summarizes the work, and offers suggestions for future research.
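The initial segmentation step can be illustrated with a small sketch. It is not the thesis's implementation: it assumes a unit cost per word (so "shortest" means "fewest words"), a single-character fallback for text not in the dictionary, and a toy dictionary and sentence chosen only for illustration. The function name `n_shortest_segmentations` and its parameters are hypothetical.

```python
import heapq


def n_shortest_segmentations(text, dictionary, n=3, max_word_len=4):
    """Return up to n segmentations of text that use the fewest words.

    Character boundaries are the nodes of a DAG; every dictionary word
    (or single character, as a fallback) is an edge of cost 1.
    """
    length = len(text)
    # best[i] holds up to n (cost, words) candidates covering text[:i].
    best = [[] for _ in range(length + 1)]
    best[0] = [(0, [])]
    for end in range(1, length + 1):
        candidates = []
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if len(word) == 1 or word in dictionary:
                for cost, words in best[start]:
                    candidates.append((cost + 1, words + [word]))
        # Keep only the n cheapest partial paths ending at this boundary.
        best[end] = heapq.nsmallest(n, candidates)
    return [words for _, words in best[length]]


# Toy dictionary (an assumption, not taken from the thesis).
toy_dictionary = {"研究", "研究生", "生命", "起源"}
for words in n_shortest_segmentations("研究生命起源", toy_dictionary, n=2):
    print("/".join(words))
```

With this toy input the two cheapest paths have three words each and differ at the crossing ambiguity "研究生" versus "生命". A later stage, such as the rules and HMM mentioned in the abstract, would choose between such candidates.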
Keywords/Search Tags:Chinese auto-segmentation, POI, Corpus, HMM, Crossing Ambiguities