Research on a Chinese New Term Detection Method Based on Homogeneous Markov Chains

Posted on: 2013-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: J K Hu
GTID: 2248330374475328
Subject: Software and theory
Abstract/Summary:
Terminology is the most important unit of knowledge in a specialized field, and most terms carry a wealth of background knowledge. As technology develops across different fields, specialized vocabulary emerges rapidly from many information sources amid a flood of information. Acquiring the new terms of a specific field quickly is of great importance for mastering that field's knowledge and following its latest developments. In addition, new term recognition is an important research direction in knowledge engineering: its effectiveness directly affects the performance and quality of word and knowledge recognition, so it has important practical value.

Existing methods for detecting and identifying new words are based on statistics or on grammatical rules, and they ignore the internal development characteristics of words. This thesis starts from the rules and characteristics of vocabulary development. It divides the development of a word into several stages, analyzes the characteristics of each stage and the transition patterns between stages, and proposes a word development life cycle theory, according to which a word varies differently at each stage of its development. On the basis of this theory and of Markov chain theory, the thesis proposes a new method for recognizing new terms in a specific field. The core of the method is the construction of a new term recognition model. The thesis first analyzes the nature of the model qualitatively (it is an irreducible homogeneous Markov chain) and then describes the construction steps in detail: the trend of the rate of change of a term's reference frequency is divided into different states; the state transition sequences of new terms are collected; and the state transition probability matrix and the initial probability distribution over the states are calculated. Together, the matrix and the initial distribution form the new term recognition model.
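The construction steps described above can be sketched in Python as follows. This is only an illustrative sketch, not the thesis's implementation: the function names `to_states` and `fit_markov`, the three-state split, the threshold values, the `+1` in the rate-of-change denominator, and the additive smoothing constant `alpha` are all assumptions that the abstract does not specify.

```python
import numpy as np


def to_states(freqs, thresholds=(-0.1, 0.1)):
    """Map a frequency time series to discrete states.

    Assumed scheme: the relative rate of change between consecutive
    periods is binned by `thresholds` into 0 (falling), 1 (stable),
    2 (rising). The thresholds are hypothetical.
    """
    freqs = np.asarray(freqs, dtype=float)
    # The +1 in the denominator avoids division by zero (an assumption).
    rates = (freqs[1:] - freqs[:-1]) / (freqs[:-1] + 1.0)
    return np.digitize(rates, thresholds).tolist()


def fit_markov(sequences, n_states, alpha=1.0):
    """Estimate the initial distribution and transition matrix.

    `alpha` is additive smoothing (an assumption); it keeps every
    transition probability positive, so the estimated chain is
    irreducible.
    """
    counts = np.full((n_states, n_states), alpha, dtype=float)
    initial = np.full(n_states, alpha, dtype=float)
    for seq in sequences:
        if not seq:
            continue
        initial[seq[0]] += 1
        # Count each observed transition a -> b.
        for a, b in zip(seq, seq[1:]):
            counts[a, b] += 1
    transition = counts / counts.sum(axis=1, keepdims=True)
    return initial / initial.sum(), transition


# Usage with made-up frequency series for two hypothetical new terms.
sequences = [to_states([10, 10, 20, 15]), to_states([2, 5, 9, 9, 14])]
pi0, P = fit_markov(sequences, n_states=3)
```

Each row of `P` sums to 1, and `pi0` is the estimated probability of each state at the start of a term's observed history.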
In addition, the thesis experimentally compares the state transition matrices and stationary distributions of old and new terms, and proposes using the stationary distribution to characterize how the development states of new terms are distributed.

To apply the model to judging new terms, the thesis proposes a window-frame moving average method that calculates the model's support for the state sequence of a term under test. The size of this support is used to judge whether the term is a new term in the field. The experimental results show that the model discriminates new terms with high accuracy and practical usefulness.
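The two calculations mentioned above can be sketched as follows. The abstract does not define the window-frame moving average or the support measure, so `windowed_support` shows only one plausible reading (the mean log transition probability within each sliding window, averaged over all windows). The function names, the window length, and the example matrix are hypothetical.

```python
import numpy as np


def stationary_distribution(P):
    """Solve pi = pi P with sum(pi) = 1 for an irreducible chain."""
    n = P.shape[0]
    # Stack the balance equations (P^T - I) pi = 0 with a normalization row.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def windowed_support(states, P, window=3):
    """Assumed support measure: moving average of log transition probabilities."""
    states = list(states)
    if len(states) < 2:
        raise ValueError("need at least two states to score a sequence")
    # Log-probability of each observed transition under the model.
    step_logp = np.log([P[a, b] for a, b in zip(states, states[1:])])
    w = min(window, len(step_logp))
    # Mean over each sliding window, then averaged across all windows.
    window_means = np.convolve(step_logp, np.ones(w) / w, mode="valid")
    return float(window_means.mean())


# Made-up two-state transition matrix for illustration only.
P_example = np.array([[0.5, 0.5],
                      [0.2, 0.8]])
pi = stationary_distribution(P_example)
score_typical = windowed_support([1, 1, 1, 1], P_example, window=2)
score_atypical = windowed_support([1, 0, 1, 0], P_example, window=2)
```

Under this reading, a sequence that follows the model's likely transitions receives higher support than one that does not. A term would be judged new when its support exceeds a threshold, and how that threshold is chosen is left open here.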
Keywords/Search Tags: natural language processing, new term identification, time series, Markov chain, word life cycle