Feature Value Extraction for Chinese Text

Posted on: 2006-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: J Zou
GTID: 2178360155475168
Subject: Computer application technology
Abstract/Summary:
With the development of networks, text processing, as a means of organizing information, plays an increasingly important role in our lives. Feature value extraction is a key technology in text processing: well-chosen feature values faithfully reflect the attributes of a text. Drawing mainly on natural language semantics, fuzzy mathematics, rough set theory, and probability theory, this thesis presents a systematic study of feature value extraction for Chinese text and proposes a method called SMFS. First, we improve existing feature weighting methods and propose a Chinese text feature value extraction method based on multiple heuristic rules. This method considers not only word frequencies but also the semantic information in the text. We regard words merely as linguistic units that express concepts; since synonyms can all be mapped to the same concept, we define the "synonym concept" as the unit of feature value extraction. This addresses the problems of synonymy and polysemy, greatly reduces the dimensionality of the feature space, and yields higher classification accuracy. Notably, the synonym concepts are built automatically during training. Finally, we summarize existing text categorization methods, report the results of comparative experiments, and present an email categorization system that uses SMFS.
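The abstract does not spell out SMFS's heuristic weighting rules or how its synonym concepts are learned, so the following is only a minimal sketch of the general idea of using concepts rather than words as features. It assumes that documents are already segmented into words (Chinese text has no spaces, so a word segmenter would run first), that a small hand-written table `SYNONYMS` stands in for the automatically built synonym concepts, and that plain TF-IDF replaces the thesis's heuristic weighting. The names `SYNONYMS`, `DOCS`, `to_concept`, `tfidf`, and `cosine` are hypothetical and for illustration only.

```python
import math
from collections import Counter

# Hypothetical synonym table (word -> concept label). In the thesis the
# "synonym concepts" are built automatically during training; this fixed
# dict is only an illustrative stand-in.
SYNONYMS = {
    "电脑": "COMPUTER",
    "计算机": "COMPUTER",
    "邮件": "EMAIL",
    "电子邮件": "EMAIL",
}

# Toy corpus, assumed to be already segmented into words.
DOCS = [
    ["电脑", "邮件", "病毒"],        # computer, mail, virus
    ["计算机", "电子邮件", "附件"],  # computer, e-mail, attachment
    ["天气", "晴朗"],                # weather, sunny
]


def to_concept(word):
    """Map a word to its synonym concept; unknown words stand for themselves."""
    return SYNONYMS.get(word, word)


def tfidf(docs, mapper=lambda w: w):
    """Return one {feature: weight} dict per (non-empty) document using plain TF-IDF.

    `mapper` turns each word into a feature: the identity gives word-level
    features, `to_concept` gives concept-level ones. Plain TF-IDF is an
    assumed substitute for the thesis's heuristic weighting rules.
    """
    feature_docs = [[mapper(w) for w in d] for d in docs]
    n = len(feature_docs)
    # Document frequency: in how many documents each feature occurs.
    df = Counter(f for d in feature_docs for f in set(d))
    vectors = []
    for d in feature_docs:
        tf = Counter(d)
        vectors.append({f: (c / len(d)) * math.log(n / df[f]) for f, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {feature: weight} vectors."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    words = tfidf(DOCS)
    concepts = tfidf(DOCS, to_concept)
    print(len({f for v in words for f in v}))     # 8 word features
    print(len({f for v in concepts for f in v}))  # 6 concept features
    print(cosine(words[0], words[1]))             # 0.0: no shared words
    print(cosine(concepts[0], concepts[1]) > 0)   # True: shared concepts
```

With word-level features the two email-related documents share no dimension, so their cosine similarity is 0; once synonyms are merged into concepts they share two features and the vocabulary shrinks from 8 to 6 dimensions. This sketch covers synonymy only: handling polysemous words would require choosing a concept for each occurrence from its context, which is not shown here.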
Keywords/Search Tags: Text Processing, Feature Value, Feature Value Extraction, Chinese Semantic Analysis, NLP, Pattern Recognition