This article discusses how to segment character strings of the form "bu X shi" in Chinese automatic word segmentation, and attempts to design an algorithm and test it in a program. The first chapter reports how the corpus was prepared and processed: it sets the selection standard for "bu X shi" strings, extracts them, and classifies them into three classes: "bu/X shi/", "bu X/shi/", and "bu/X/shi/". The second and third chapters analyze "bu X/shi/" and "bu/X/shi/", describing and explaining the strings one by one and analyzing their syntactic environments and restricting factors. Each first describes the strings on the same level, then tries to point out possible exceptions on a different level, and finally summarizes the syntactic environments covered in the chapter, which serve the algorithm design. The fourth chapter analyzes "bu X shi" strings with multiple segmentations, describing and explaining those that can be segmented in more than two ways, and tries to analyze the syntactic environments and restricting factors of each segmentation. The fifth chapter covers summary, design, and testing: it summarizes the second to fourth chapters, then tries to design an algorithm and a test program based on their conclusions. Finally, it reports the test results, summarizes the whole article, and discusses prospects for future work.