On Representation Of Text Annotation In Database And Its Application

Posted on:2009-05-19

Degree:Master

Type:Thesis

Country:China

Candidate:Y Song

GTID:2178360308479482

Subject:Computer application technology

Abstract/Summary:

Dictionaries are widely regarded as repositories that contain a great deal of linguistic and common-sense knowledge. An entry (lemma) in a conventional English dictionary usually consists of the word forms, parts of speech, senses, example sentences, idioms, syntactic and semantic specifications, and notes on derivation and domain. If this knowledge is extracted from conventional dictionaries and stored in a structured form, computers can use and process it conveniently. This benefits research in related fields such as linguistics, Natural Language Processing (NLP), machine translation and knowledge engineering, as well as language teaching.

Existing conventional dictionaries, originally published on paper, are designed for human readers rather than for computers, and they are usually stored as formatted text. Although their typesetting follows certain conventions, many irregular structures and entities still appear, because the intended readers are people. The boundaries between many parts of an entry are not explicit, which makes the text very difficult for computers to parse.

Information extraction from dictionaries depends on identifying and annotating the information entities in the dictionary text. This thesis presents a method for representing text annotation in a database. The method stores both the features of the information entities and the annotation results in the database, so that all feature information is parameterized rather than built into the annotation program. The basic annotation approach identifies and annotates entities by considering the relationships between entities, the characteristic marks of each entity, and combinations of these marks. This improves the generality of the annotation system and makes it easier to build an annotation system for the text of another dictionary with a similar structure.

First, the thesis analyzes the concepts in the field of text annotation and the relationships among them, and then proposes a general method for representing text annotation in a database. Next, this representation is applied to annotating the text of the Oxford Advanced Learner's English-Chinese Dictionary, Fourth Edition (OALD4); the analysis, design and implementation of an OALD4 annotation and information extraction system based on it are presented in detail. Finally, the thesis draws conclusions, discusses future work, and proposes directions for further research and suggestions for improvement.

Keywords/Search Tags:

Text annotation, information extraction, repository construction, English-Chinese Dictionary

Related items

1	Research On Automatic Annotation For Chinese Text And Its Application
2	The Design And Implementation Of Multilingual Mongolian-Chinese-English Dictionary Resource Management Platform
3	The Implementation Of The Chinese Information Extraction System Based On GATE
4	Research On Web-based Chinese-English Bilingual Dictionary Generation
5	Development And Design Of Uyghur-Chinese-English Language Learning System And Electronic Dictionary
6	Establishing English-Mongolian-Chinese Electronic Dictionary Based On Tree Structure And Researching The Encryption Algorithm
7	The Design And Realization Of English To Chinese And Mongolian Electronic Dictionary Computer Inquiring Software
8	Design And Implementation Of Tibetan Chinese English Trilingual Electronic Dictionary Based On Android
9	Research On Text Watermarking For Texts Mixed Chinese And English
10	Research On The Construction Of Present Situation And Countermeasures Of The Institution Repository In Chinese Universities