An experiment in automatic indexing with Korean texts: A comparison of syntactico-statistical and manual methods

Posted on: 1994-08-13
Degree: Ph.D.
Type: Dissertation
University: University of Illinois at Urbana-Champaign
Candidate: Seo, Eun-Gyoung
Full Text: PDF
GTID: 1478390014494955
Subject: Library science
Abstract/Summary:
This study was undertaken in order to develop practical automatic indexing techniques suitable for Korean natural language texts. The study had four purposes: to develop an automatic indexing system for Korean texts, to evaluate the efficiency of the automatic indexing system as compared with a manual indexing system, to compare the effectiveness of weighting algorithms, and to investigate the effect of abstract length.

The basic method of this automatic indexing system was to determine the syntactic category of each text word by dictionary look-up, and then to match sequences of category symbols against a dictionary of acceptable patterns. Sequences of text words that matched one of the patterns in the dictionary were extracted as content identifiers. Finally, the system selected highly ranked content identifiers from each document based on statistical (frequency of occurrence) information.

For this experimental study, the Korean text database was constructed manually based on 100 long abstracts and 200 short abstracts covering business subjects. The study assessed how well the set of index terms produced by an automatic indexing technique reflects the major topics described in an indexed document. For the evaluation, a manual index term list was constructed by consultation between two indexers as an external standard to obtain normalized values.

The experimental results showed that the performance of the automatic syntactico-statistical indexing system was comparable to that of other studies which have compared automatic indexing with manual indexing. The WDF system performed better than the IDF system in terms of the ability to present all the correct content identifiers, and the system produced more correct content identifiers in the short abstract group. As a whole, many significant concepts represented in the abstract and recognized by human indexers have been effectively extracted automatically. The extracted concept forms are for the most part comparable to those of manual indexing. Possible enhancements of the automatic syntactico-statistical indexing system are identified which could lead to improved indexing performance.
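
The three steps of the method summarized above (category look-up, pattern matching, frequency-based selection) can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. The lexicon, the category symbols (A, N, D, P, V), the accepted patterns, and the function names are all hypothetical, and toy English tokens stand in for Korean words. Ranking here uses raw frequency of occurrence within one document, which is an assumption about how the statistical step might look.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> syntactic category symbol.
# (A = adjective, N = noun, D = determiner, P = preposition, V = verb)
LEXICON = {
    "automatic": "A",
    "indexing": "N",
    "system": "N",
    "korean": "A",
    "texts": "N",
    "the": "D",
    "for": "P",
    "evaluates": "V",
}

# Hypothetical dictionary of acceptable category patterns.
ACCEPTED_PATTERNS = {("N",), ("A", "N"), ("N", "N"), ("A", "N", "N")}
MAX_PATTERN_LEN = max(len(p) for p in ACCEPTED_PATTERNS)


def tag(words):
    """Step 1: look up each word's category; unknown words get '?'."""
    return [LEXICON.get(w.lower(), "?") for w in words]


def extract_identifiers(words):
    """Step 2: return every word sequence whose category sequence
    matches one of the accepted patterns."""
    cats = tag(words)
    found = []
    for start in range(len(words)):
        for length in range(1, MAX_PATTERN_LEN + 1):
            end = start + length
            if end > len(words):
                break
            if tuple(cats[start:end]) in ACCEPTED_PATTERNS:
                found.append(" ".join(w.lower() for w in words[start:end]))
    return found


def top_identifiers(text, k=3):
    """Step 3: rank candidate identifiers by frequency of occurrence
    (ties broken alphabetically) and keep the k highest."""
    counts = Counter(extract_identifiers(text.split()))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    doc = ("the automatic indexing system evaluates "
           "automatic indexing for korean texts")
    print(top_identifiers(doc))
```

The sketch keeps overlapping matches (for example both "automatic indexing" and "indexing"), which is one possible design choice. A real system would also need a policy for nested candidates and for words missing from the dictionary.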
Keywords/Search Tags: Indexing, Automatic, Manual, Korean, Texts, Syntactico-statistical, Content identifiers