An experiment in automatic indexing with Korean texts: A comparison of syntactico-statistical and manual methods

Posted on:1994-08-13

Degree:Ph.D

Type:Dissertation

University:University of Illinois at Urbana-Champaign

Candidate:Seo, Eun-Gyoung

GTID:1478390014494955

Subject:Library science

Abstract/Summary:

This study was undertaken in order to develop practical automatic indexing techniques suitable for Korean natural language texts. The study had four purposes: to develop an automatic indexing system for Korean texts, to evaluate the efficiency of the automatic indexing system as compared with a manual indexing system, to compare the effectiveness of weighting algorithms, and to investigate the effect of abstract length.

The basic method of this automatic indexing system was to determine the syntactic category of each text word by dictionary look-up, and then to match sequences of category symbols against a dictionary of acceptable patterns. Sequences of text words that matched one of the patterns in the dictionary were extracted as content identifiers. Finally, the system selected highly ranked content identifiers from each document based on statistical (frequency of occurrence) information.

For this experimental study, the Korean text database was constructed manually based on 100 long abstracts and 200 short abstracts covering business subjects. The study assessed how well the set of index terms produced by an automatic indexing technique reflects the major topics described in an indexed document. For the evaluation, a manual index term list was constructed by consultation between two indexers as an external standard to obtain normalized values.

The experimental results showed that the performance of the automatic syntactico-statistical indexing system was comparable to that of other studies which have compared automatic indexing with manual indexing. The WDF system performed better than the IDF system in terms of the ability to present all the correct content identifiers, and the system produced more correct content identifiers in the short abstract group. As a whole, many significant concepts represented in the abstract and recognized by human indexers have been effectively extracted automatically. The extracted concept forms are for the most part comparable to those of manual indexing. Possible enhancements of the automatic syntactico-statistical indexing system are identified which could lead to improved indexing performance.
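
The three steps of the basic method (category look-up, pattern matching, frequency-based selection) can be illustrated with a minimal sketch. It is not the dissertation's implementation: the lexicon, the category symbols, the pattern set, the function names, and the English stand-in words are all hypothetical, chosen only to show the general shape of a syntactico-statistical pipeline. A raw frequency count stands in for the weighting algorithms the study compares.

```python
from collections import Counter

# Hypothetical toy lexicon mapping each word to a syntactic category symbol.
# English words stand in for Korean text purely for readability.
LEXICON = {
    "automatic": "A",
    "indexing": "N",
    "system": "N",
    "korean": "A",
    "texts": "N",
    "the": "D",
    "for": "P",
    "of": "P",
}

# Hypothetical dictionary of acceptable category patterns.
PATTERNS = {("N",), ("A", "N"), ("N", "N"), ("A", "N", "N")}
MAX_LEN = max(len(p) for p in PATTERNS)


def extract_candidates(tokens):
    """Return every word sequence whose category sequence matches a pattern."""
    # Words missing from the lexicon get None, which never matches a pattern.
    cats = [LEXICON.get(t) for t in tokens]
    found = []
    for start in range(len(tokens)):
        for length in range(1, MAX_LEN + 1):
            end = start + length
            if end > len(tokens):
                break
            if tuple(cats[start:end]) in PATTERNS:
                found.append(" ".join(tokens[start:end]))
    return found


def top_identifiers(text, k=3):
    """Rank candidate content identifiers by raw frequency and keep the top k."""
    tokens = text.lower().split()
    counts = Counter(extract_candidates(tokens))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    sample = ("the automatic indexing system for korean texts "
              "of the automatic indexing system")
    print(top_identifiers(sample, k=3))
```

In this sketch overlapping matches are all kept, so a phrase and its sub-phrases are counted separately; whether the original system did the same is not stated in the abstract.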

Keywords/Search Tags:

Indexing, Automatic, Manual, Korean, Texts, Syntactico-statistical, Content identifiers

Related items

1	Research And Implementation On Automatic Indexing Method Of Texts
2	Study On The Theory & Practice Of Automatic Indexing Of WWW Science And Technology Information Resources
3	Research Of Automatic Indexing In Economic Bibliographical Database
4	A framework for indexing higher-level content in natural images: A study on the far side of the semantic gap
5	A Research To CRF-based Automatic Subject Indexing For Chinese Books
6	Content-based video analysis, indexing and representation using multimodal information
7	The Effect Of Korean Popular Dramas For Korean And Chinese University Students
8	Manual and computer-based stereology: The tradeoffs and automatic target recognition using feature-level fusion
9	The Research On Indexing And Optimization Technology In Content-based Image Retrieval
10	Research On Automatic Indexing System Of Economic News