Research On Arabic Named Entity Recognition Using Hybrid Models

Posted on: 2014-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: S H u s a m e l d d i n A .
GTID: 2298330452461204
Subject: Computer Science and Technology
Abstract/Summary:
Named Entity Recognition (NER) is an information extraction task that focuses on recognizing and classifying named entities in unstructured text, such as the names of persons, organizations, and locations. Most researchers use machine learning for NER tasks, while few use handcrafted rules. Our research focuses on NER for the Arabic language, an important language that poses many challenges. Named entities carry very important information in Natural Language Processing, especially in information retrieval, question answering systems, text classification, text summarization, and information extraction. Arabic is the official language of the Arab world and is morphologically, syntactically, and phonologically based on Classical Arabic. It is now the sixth most widely spoken language in the world and the mother tongue of 300 million people.

Arabic Named Entity Recognition is still at an early stage, and little research has been done compared with English, so we chose this topic to improve the quality of Arabic NER. We focus on the names of persons, organizations, and locations. In this research we propose a simple combination of a rule-based method with a machine learning method as a hybrid method for Arabic named entity recognition. We employ keywords and special verbs as triggers to tag the named entities and use these tags as features for the machine learning model.

Arabic poses many challenges, among them the lack of capitalization, its highly inflectional nature, morphological ambiguity, characters that take three forms depending on their position, and the lack of resources. Our proposed rule-based system employs keywords and special verbs to tag the named entities and uses gazetteer lists to match them. Our rule-based system achieved an F-value of 0.397.

We implemented the second part of our hybrid system (machine learning) using a Maximum Entropy model. The tags obtained from the rule-based system are used as features to feed the machine learning model. Besides the rule-based features, general features such as POS are used: we used the Stanford Part-of-Speech tagger to tag the words. Our machine learning model achieved an F-value of 0.495.

The proposed hybrid system rests on combining the two approaches (rule-based and machine learning) by feeding the output of the rule-based system as features to the machine learning model, then complementing these features with the other general features, such as POS features, and feeding them to the classifier. The hybrid system performs better than either the rule-based or the machine learning approach alone, achieving an F-value of 0.528.

We compared our hybrid approach with "ARNE", an Arabic named entity recognition system published by Carolin Shihadeh and Günter Neumann in 2012. The comparison shows that our system outperforms theirs by an F-value of 0.203.
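As a rough illustration of the hybrid idea described in the abstract, the following Python sketch shows how rule-based tags could be turned into features for a classifier. It is not the thesis implementation. The trigger words, the gazetteer entries (transliterated here for readability), the tag names, and the function names `rule_based_tags` and `hybrid_features` are all hypothetical, and the POS tags are assumed to come from an external tagger.

```python
from typing import Dict, List, Set

# Hypothetical toy resources (transliterated); the thesis's real lists are not given.
PERSON_TRIGGERS: Set[str] = {"al-sayyid", "al-duktur"}  # e.g. "Mr.", "Dr."
LOCATION_TRIGGERS: Set[str] = {"madinat"}               # e.g. "city of"
ORG_TRIGGERS: Set[str] = {"sharikat"}                   # e.g. "company of"
GAZETTEER: Dict[str, str] = {"al-qahira": "LOC", "ahmad": "PER"}


def rule_based_tags(tokens: List[str]) -> List[str]:
    """Tag each token using a gazetteer lookup, then the preceding trigger word."""
    tags: List[str] = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else ""
        if tok in GAZETTEER:
            tags.append(GAZETTEER[tok])
        elif prev in PERSON_TRIGGERS:
            tags.append("PER")
        elif prev in LOCATION_TRIGGERS:
            tags.append("LOC")
        elif prev in ORG_TRIGGERS:
            tags.append("ORG")
        else:
            tags.append("O")
    return tags


def hybrid_features(tokens: List[str], pos_tags: List[str]) -> List[Dict[str, str]]:
    """Build one feature dict per token, combining rule-based tags with POS tags."""
    rule_tags = rule_based_tags(tokens)
    features: List[Dict[str, str]] = []
    for i, tok in enumerate(tokens):
        features.append({
            "word": tok,
            "pos": pos_tags[i],            # assumed output of an external POS tagger
            "rule_tag": rule_tags[i],      # output of the rule-based stage
            "prev_rule_tag": rule_tags[i - 1] if i > 0 else "BOS",
        })
    return features
```

Feature dictionaries of this shape could then be passed to a Maximum Entropy (multinomial logistic regression) classifier of the reader's choice. The exact feature set used in the thesis is not specified in the abstract.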
Keywords/Search Tags:Arabic Named Entity Recognition, Natural Language Processing, Rule-based Approach, Machine Learning Approach, Hybrid Approach