Automatic language identification for metadata records: Measuring the effectiveness of various approaches

Posted on:2016-01-23

Degree:Ph.D.

Type:Dissertation

University:University of North Texas

Candidate:Knudson, Ryan Charles

GTID:1478390017979068

Subject:Information Science

Abstract/Summary:

Automatic language identification has been applied to short texts such as queries in information retrieval, but it has not yet been applied to metadata records. Applying this technology to metadata records, particularly their title elements, would let creators of metadata records obtain a value for the language element, which is often left blank for lack of linguistic expertise. It would also allow a language value to be added to existing metadata records that currently lack one. Titles make language identification a challenging problem mainly because they are short, a factor that increases the difficulty of accurately identifying a language. This study applied four proven approaches to language identification, as well as one open-source approach, to a collection of multilingual titles of books and movies. Of the five approaches considered, a reduced N-gram frequency profile and distance measure approach outperformed all others, accurately identifying over 83% of all titles in the collection. Future plans are to offer this technology to curators of digital collections.
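
To make the best-performing approach more concrete, the following is a minimal sketch of a character N-gram frequency profile combined with a rank-based "out-of-place" distance, a commonly used form of this technique. It is not the dissertation's implementation: the abstract does not give the profile size, the N-gram lengths, the distance measure, or the training data, so the function names, the `top_k` cutoff (standing in for a "reduced" profile), the `n_max` value, and the tiny sample texts are all assumptions chosen for illustration.

```python
from collections import Counter


def ngram_profile(text, n_max=3, top_k=300):
    """Rank the character n-grams (lengths 1..n_max) of a text by frequency.

    Keeping only the top_k entries is one way to "reduce" a profile;
    both n_max and top_k are illustrative assumptions.
    """
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Most frequent first; ties broken alphabetically so results are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get a fixed penalty."""
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(r - rank[gram]) if gram in rank else max_penalty
        for r, gram in enumerate(doc_profile)
    )


def identify(title, lang_profiles):
    """Return the language whose profile is closest to the title's profile."""
    doc = ngram_profile(title)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))


# Toy training text (hypothetical); a real system would train on far more data.
SAMPLES = {
    "en": "the story of the house and the people who lived in the old town",
    "es": "la historia de la casa y de la gente que vivía en el pueblo viejo",
    "de": "die geschichte von dem haus und den menschen in der alten stadt",
}
PROFILES = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}
```

A title would then be classified with a call such as `identify("La casa de los espíritus", PROFILES)`. With training text this small the answer is not reliable, which mirrors the point made in the abstract: short inputs give sparse profiles, so accuracy depends heavily on the quality and size of the language profiles.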

Keywords/Search Tags:

Language identification, Metadata records

Related items

1	Research On Scientific Metadata Records Reuse
2	The Application Of Metadata In The Overall Process Of Electronic Records Management
3	Comparative Research On Encapsulation Strategies For Electronic Records
4	Historicism Of Western Records Management Theory To Develop The Study Of The Impact
5	Research On Speech Language Identification Based On Deep Learning Network
6	Research On Metadata Management Tool Based On Semantic Analysis
7	Design And Implementation Of Metadata Management Tool Based On Semantic Analysis
8	Metadata Storing And Retrieval-Design And Implementation Of MSR System
9	Study Of A Metadata Standard For Sharing Database On Biological Effect Of Electromagnetic Pulse
10	Semantic Based Scientific Literature Metadata Retrieval System