
On Representation Of Text Annotation In Database And Its Application

Posted on: 2009-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y Song
Full Text: PDF
GTID: 2178360308479482
Subject: Computer application technology
Abstract/Summary:
Dictionaries are widely regarded as repositories of linguistic and common-sense knowledge. A lemma in a conventional English dictionary usually comprises the forms of the word, the parts of speech, the senses, example sentences, idioms, syntactic and semantic specifications, and notes on derivation and domain. Once extracted from a conventional dictionary and stored in structured form, this knowledge can be used and processed conveniently by computers, which benefits research in related fields of linguistics, Natural Language Processing (NLP), machine translation and knowledge engineering, as well as language teaching.
Existing conventional dictionaries in printed form are built for human readers, not for computers. They are usually stored as formatted text. Although typesetting follows some rules, many irregular structures and entities still appear, because the target readers are human beings. The boundaries between many parts of a lemma are not explicit, so it is very difficult for computers to parse them.
Information extraction from dictionaries relies on identifying and annotating the information entities in the dictionary text. This thesis presents a method for representing text annotation in a database. The method stores both the features of the information entities and the annotation results in the database, so that all feature information is parameterized. The basic annotation approach is to identify and annotate entities by considering the relationships between entities, the characteristic marks of the entities, and combinations of the two. This improves the generality of the annotation system and makes it easier to build an annotation system for the text of another dictionary with a similar structure.
First, the concepts of the text annotation field and the relationships among them are analyzed. A general method for representing text annotation in a database is then proposed.
Next, the representation is applied to an annotation program for the text of the Oxford Advanced Learner's English-Chinese Dictionary, Fourth Edition (OALD4). The analysis, design and implementation of the annotation and information extraction system for OALD4, based on this representation, are presented in detail. Finally, conclusions are drawn, and further research directions and suggestions for improvement are proposed.
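As a rough illustration of the idea described in the abstract (entity features and annotation results both kept in database tables, with entities identified from characteristic marks and from their relationships to other entities), a minimal sketch follows. It is not the thesis's actual schema or algorithm: the table names `entity_feature` and `annotation`, their columns, the regular expressions, and the simplified sample entry are all assumptions made for this example and do not reflect the real OALD4 text format.

```python
import re
import sqlite3


def build_db():
    """Create an in-memory database holding hypothetical feature rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entity_feature (
            seq     INTEGER PRIMARY KEY,  -- order in which entities are tried
            entity  TEXT NOT NULL,        -- entity type name
            pattern TEXT NOT NULL,        -- characteristic mark, as a regex
            follows TEXT                  -- entity that must appear earlier
        );
        CREATE TABLE annotation (
            entry_id INTEGER, entity TEXT,
            start INTEGER, stop INTEGER, text TEXT
        );
        """
    )
    # Assumed features; a different dictionary would swap these rows,
    # not the annotation code.
    conn.executemany(
        "INSERT INTO entity_feature (seq, entity, pattern, follows) "
        "VALUES (?, ?, ?, ?)",
        [
            (1, "headword", r"^[a-z]+", None),
            (2, "pos", r"\b(?:n|v|adj|adv)\b", "headword"),
            (3, "sense_no", r"\b\d+\b", "pos"),
        ],
    )
    return conn


def annotate(conn, entry_id, text):
    """Annotate one entry using only the parameters stored in the database."""
    features = conn.execute(
        "SELECT entity, pattern, follows FROM entity_feature ORDER BY seq"
    ).fetchall()
    for entity, pattern, follows in features:
        min_start = 0
        if follows is not None:
            # Relationship check: the predecessor entity must already be
            # annotated, and this entity must start after it ends.
            (prev_stop,) = conn.execute(
                "SELECT MIN(stop) FROM annotation "
                "WHERE entry_id = ? AND entity = ?",
                (entry_id, follows),
            ).fetchone()
            if prev_stop is None:
                continue
            min_start = prev_stop
        for match in re.finditer(pattern, text):
            if match.start() >= min_start:
                conn.execute(
                    "INSERT INTO annotation VALUES (?, ?, ?, ?, ?)",
                    (entry_id, entity, match.start(), match.end(),
                     match.group()),
                )
    return conn.execute(
        "SELECT entity, start, stop, text FROM annotation "
        "WHERE entry_id = ? ORDER BY start",
        (entry_id,),
    ).fetchall()
```

Calling `annotate(build_db(), 1, "abandon v 1 give up completely 2 leave behind")` would annotate one headword, one part of speech and two sense numbers. Because the marks and relationships live in `entity_feature` rather than in the code, adapting the sketch to a similarly structured text would mean changing rows, which is the kind of generality the abstract attributes to the database representation.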
Keywords/Search Tags:Text annotation, information extraction, repository construction, English-Chinese Dictionary