
Research on Chinese Sentence Skeleton Parsing Based on a Statistical Model

Posted on: 2009-02-12  Degree: Master  Type: Thesis
Country: China  Candidate: W T Tan  Full Text: PDF
GTID: 2178360278956770  Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of the Internet, the quantity of information is also rising rapidly. Parsing research in Natural Language Processing (NLP) is concerned not only with recovering the syntactic structure of a sentence, but also with supporting effective information organization and management and accurate information retrieval, which remain difficult problems for information technology. Skeleton parsing is a key problem in NLP. Its main task is to identify the skeleton of a sentence automatically. Chinese skeleton parsing is significant for many areas of NLP, such as machine translation, information extraction, and automatic summarization. In this thesis we first present the background of Chinese skeleton parsing. We then discuss its semantic and syntactic meaning and point out the difficulties of the problem. After analysing the syntactic characteristics of the skeleton of Chinese sentences, we apply the Maximum Entropy (ME) model, a statistical model in current use, especially in NLP, to Chinese skeleton parsing. ME is simple and reusable, and it allows linguistic features to be selected flexibly. The ME framework is independent of any particular natural language task. We select the features for ME by analysing the semantic and syntactic features of the skeleton. Because skeletons in the same context are interrelated, this thesis presents a Multi-layer Maximum Entropy model for skeleton parsing. In this model the low-layer ME parses the skeleton from context features, while the high-layer ME parses the skeleton from both the output of the low-layer ME and the features between sentences. Before parsing with the multi-layer ME, we segment the Chinese sentence, label the part of speech of each word, and identify phrases that may form the skeleton of the sentence. We also classify each sentence as either a full sentence or a clause. These steps yield the candidate set for the skeleton. Because corpora labelled with skeletons are scarce, we present a smoothing method for ME based on the context similarity between words. We compute similarity by comparing the contexts of words, and then smooth each word that does not appear in the corpus using its most similar words, which improves the performance of the model on a small corpus. The results show that our method is effective for Chinese skeleton parsing: we achieve high precision with a small corpus. Chunk parsing can be built on skeleton parsing, and the context-similarity smoothing method can also be used in other statistical language models, such as the Hidden Markov Model.
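The context-similarity smoothing described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the thesis's actual similarity measure or substitution rule, so everything here is an assumption made for illustration: the function names (`context_vectors`, `cosine`, `most_similar_known`), the one-word context window, the use of cosine similarity over co-occurrence counts, and the toy English tokens standing in for segmented Chinese words.

```python
from collections import Counter
from math import sqrt


def context_vectors(sentences, window=1):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = {}
    for sent in sentences:
        for i, word in enumerate(sent):
            ctx = vectors.setdefault(word, Counter())
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    ctx[sent[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar_known(word, vectors, known):
    """Return the word in `known` whose context is closest to `word`, or None."""
    if word not in vectors:
        return None
    best, best_sim = None, 0.0
    for cand in sorted(known):  # sorted for a deterministic tie-break
        if cand == word or cand not in vectors:
            continue
        sim = cosine(vectors[word], vectors[cand])
        if sim > best_sim:
            best, best_sim = cand, sim
    return best


# Toy, pre-segmented raw text (hypothetical data, not from the thesis).
raw_text = [
    ["I", "eat", "apple"],
    ["I", "eat", "banana"],
    ["he", "buy", "apple"],
    ["he", "buy", "banana"],
    ["I", "read", "book"],
]
# Words assumed to occur in the small skeleton-labelled corpus.
labelled_vocab = {"I", "he", "eat", "buy", "read", "apple", "book"}

vectors = context_vectors(raw_text)
# "banana" is missing from the labelled corpus, so borrow a seen word for it.
print(most_similar_known("banana", vectors, labelled_vocab))  # prints "apple"
```

In this sketch an unseen word is replaced by its single nearest seen neighbour before feature extraction. The abstract speaks of "the most similar words" in the plural, so the thesis may instead combine several neighbours, which this example does not attempt.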
Keywords/Search Tags: Shallow Parsing, Skeleton Parsing, Multi-layer Maximum Entropy, Context Similarity