Research On Automatic Indexing System Of Economic News

Posted on: 2001-06-10
Degree: Master
Type: Thesis
Country: China
Candidate: G T Cha
GTID: 2168360002452550
Subject: Agricultural economics and management
Abstract/Summary:
News documents are among the most important sources of information for society, and automatic indexing is a necessary step in making them available. The thesis first gives a brief introduction to the automatic indexing of documents, the characteristics of news, and the construction of news databases. To realize automatic indexing of Chinese news, the thesis proposes a technique of automatic word extraction based on multiple vocabularies, used in place of automatic word segmentation. These vocabularies are: special word lists, stop lists, a synonym dictionary, keyword lists, a thesaurus, a concordance dictionary mapping keyword strings to class numbers, a word elements dictionary, and others. Their function, construction and application are described in detail, and a set of rules and algorithms for automatic word extraction is designed. To accomplish subject indexing, the author develops a word form likeness algorithm and combines it with a measure of word element similarity based on the word elements dictionary, and complements the subject headings by adding free words. Instead of classification based on a Chinese character, the improved automatic classification relies on a keyword cluster matching algorithm built on a large body of empirical classification data. The author then uses Turbo C and VFP 6.0 to design an automatic indexing system that processes the full text of economic news from Xinhua Agency. The experimental system includes data input, automatic word extraction, subject indexing, classification, and maintenance of the word extraction dictionaries. Finally, the thesis evaluates several aspects of the system by analysing the results of an indexing experiment on 100 news items and comparing them with manual indexing.
These experiments show that it is feasible to index news automatically using related natural-language vocabularies together with controlled vocabularies, to identify synonyms by combining word element similarity with word form likeness, and to classify documents with keyword clusters.
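
The abstract does not give the extraction rules themselves, so the following is only a minimal sketch of what vocabulary-based word extraction could look like, assuming a simple longest-match scan. The function name `extract_keywords`, the `max_len` parameter, and the sample word list, stop list and synonym dictionary are all hypothetical and are not taken from the thesis, whose system was written in Turbo C and VFP 6.0.

```python
def extract_keywords(text, vocabulary, stop_words, synonyms, max_len=6):
    """Scan text left to right, taking the longest vocabulary entry at each position.

    vocabulary: set of known words (assumed stand-in for the keyword lists)
    stop_words: set of words to discard (assumed stand-in for the stop lists)
    synonyms:   dict mapping a word to its preferred term (assumed stand-in
                for the synonym dictionary)
    """
    found = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first so a compound term wins over its parts.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in vocabulary:
                match = candidate
                break
        if match is None:
            i += 1  # no vocabulary entry starts here; move on one character
            continue
        i += len(match)
        if match in stop_words:
            continue  # matched, but filtered out by the stop list
        # Normalise to the preferred term when a synonym entry exists.
        found.append(synonyms.get(match, match))
    return found


# Hypothetical sample vocabularies, for illustration only.
vocabulary = {"农业", "经济", "农业经济", "的", "发展", "增长"}
stop_words = {"的"}
synonyms = {"发展": "增长"}

keywords = extract_keywords("农业经济的发展", vocabulary, stop_words, synonyms)
print(keywords)
```

Because the scan only accepts strings already present in a vocabulary, it never has to segment the whole sentence, which is one reading of the contrast the abstract draws between word extraction and word segmentation.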
Keywords/Search Tags: news database, news document, automatic indexing, automatic classifying, words extracting system, vocabulary construction, software design