The Design And Implementation Of The Networking Resource Collecting System

Posted on: 2006-12-15
Degree: Master
Type: Thesis
Country: China
Candidate: X C Meng
Full Text: PDF
GTID: 2168360155952962
Subject: Computer application technology
Abstract/Summary:
NERMS (Network Educational Resource Management System) is one of the research projects in the Science and Technology Development Planning of Jilin Province. Its main goal is to organize and manage network educational resources so that they are easy to share and easy to obtain, and thereby to speed up the development of educational resources. Development of NERMS is now almost complete, but it holds very few resources. Collecting resources one by one by hand is tedious, so a resource collecting tool was developed to automate the work.

We have implemented a resource collecting system that is natural-language friendly and configuration based, and whose collected resources can be used by NERMS directly. The key issues are keyword extraction from natural language text and information extraction from semi-structured text. We implement Chinese word segmentation and part-of-speech (POS) tagging based on a statistical model, and then propose a new method for recognizing unregistered proper nouns: treat the nouns identified by POS tagging as candidate proper nouns; probe outward to both sides of each noun, with each probe generating a new candidate; count the occurrence frequency of every candidate; and pick the most suitable proper noun according to a rule. We have also implemented the wrapper by J. Hammer to extract information from search engines' result pages.

The system includes two main modules. The goal of the Keywords Extraction module is to extract several sequences of keywords from a natural language document. Generally speaking, a proper noun contains at least one noun, so the task reduces to finding all the nouns and then probing to both sides of each. For a word that can belong to more than one lexical category, the category is determined by its context. This thesis applies a lexical category tagging (POS tagging) algorithm based on a statistical model.
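The probing procedure for proper nouns described above can be illustrated with a minimal Python sketch. The abstract does not state the selection rule, the probe width, or the tag set, so everything concrete here is an assumption made for illustration: the tag "n" marks a noun, `max_probe` and `min_count` are hypothetical parameters, and the rule used is "take the longest multi-word candidate that occurs at least `min_count` times".

```python
from collections import Counter

def count_spans(tokens, max_len):
    """Count every contiguous token span of length 1..max_len."""
    counts = Counter()
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            if i + n <= len(tokens):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

def probe_candidates(tokens, idx, max_probe):
    """Spans containing tokens[idx], extended up to max_probe tokens per side."""
    spans = []
    for left in range(max_probe + 1):
        for right in range(max_probe + 1):
            lo, hi = idx - left, idx + right + 1
            if lo >= 0 and hi <= len(tokens):
                spans.append(tuple(tokens[lo:hi]))
    return spans

def pick_proper_nouns(tagged, max_probe=2, min_count=2):
    """tagged: list of (word, pos_tag) pairs from a segmenter and POS tagger."""
    tokens = [word for word, _ in tagged]
    counts = count_spans(tokens, 2 * max_probe + 1)
    found = set()
    for i, (_, tag) in enumerate(tagged):
        if tag != "n":  # assumed noun tag; only nouns seed candidates
            continue
        frequent = [c for c in probe_candidates(tokens, i, max_probe)
                    if len(c) > 1 and counts[c] >= min_count]
        if frequent:
            # Assumed rule: longest frequent span, ties broken by frequency.
            best = max(frequent, key=lambda c: (len(c), counts[c]))
            found.add("".join(best))  # no separator, as in Chinese text
    return found
```

Given a tagged token stream in which 吉林 is immediately followed by the noun 大学 in two different sentences, the span (吉林, 大学) is the only multi-word candidate that reaches `min_count`, so the sketch reports 吉林大学 as a proper noun.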
Also, POS tagging presupposes Chinese word segmentation, so this thesis applies a Chinese word segmentation algorithm based on a statistical model as well. The next step is to...
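The abstract does not say which statistical model is used for segmentation. As one common possibility, the following Python sketch segments text by maximizing the product of unigram word probabilities with dynamic programming. The function name `segment`, the `word_freq` dictionary, the `max_word_len` limit, and the add-one smoothing are all assumptions for illustration, not details taken from the thesis.

```python
import math

def segment(text, word_freq, max_word_len=4):
    """Split text into the word sequence with the highest unigram probability."""
    total = sum(word_freq.values())
    vocab = len(word_freq)

    def log_prob(word):
        count = word_freq.get(word, 0)
        # Unseen multi-character strings are not allowed as words;
        # unseen single characters get a small smoothed probability.
        if count == 0 and len(word) > 1:
            return None
        return math.log((count + 1) / (total + vocab + 1))

    n = len(text)
    # best[i] = (best log-probability of text[:i], start index of its last word)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            lp = log_prob(text[start:end])
            if lp is None:
                continue
            score = best[start][0] + lp
            if score > best[end][0]:
                best[end] = (score, start)

    # Walk the back-pointers from the end of the text to recover the words.
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]
```

The output of such a segmenter, a list of words, is what a POS tagger would then label; a real system would estimate `word_freq` from a segmented training corpus.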
Keywords/Search Tags: Implementation