Named Entity Recognition For Micro-blog

Posted on: 2014-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: R H Jiang
Full Text: PDF
GTID: 2308330479479462
Subject: Computer Science and Technology
Abstract/Summary:
Named entity recognition is an important branch of natural language processing and plays a critical role in many NLP applications, such as information extraction, information filtering, information retrieval, question answering and machine translation. In recent years, with the development of micro-blogging, micro-blog text has become a new carrier for named entity recognition and has attracted growing attention from researchers. Because micro-blog texts differ from traditional texts in their forms of expression, applying traditional named entity recognition methods to micro-blog texts faces new challenges. Research on named entity recognition for micro-blogs is therefore of both theoretical and practical significance for extending named entity recognition technology to new domains.

This thesis studies named entity recognition in micro-blog texts, covering person names, location names and organization names. Micro-blog texts have distinctive characteristics and carry much information that can aid named entity recognition. The thesis first identifies the structural and content characteristics of micro-blog texts by comparing them with traditional texts. It then exploits the characteristics that help named entity recognition, including tags, comments and forwards, and works around the characteristics that hinder it, including informal language, abbreviations and antonomasia (the use of epithets or nicknames in place of names). On this basis, the thesis proposes a named entity recognition method that combines statistics and rules.
The method uses ICTCLAS for word segmentation, filters common words out of the segmented text using a purpose-built common-words table, applies statistical analysis to the contents of comments and forwards, and finally completes recognition with named entity boundary rules. Experimental results show that the method achieves high precision on micro-blog texts. In an experiment on a corpus of 30,000 micro-blog posts from May 2013, the F-score of named entity recognition reaches 97.93%.

Compared with named entity recognition on traditional texts, work on micro-blog texts still shows wide gaps in areas such as the accuracy of the extraction process and the accumulation of knowledge resources. Therefore, at each step of the work, we carefully analyze how existing resources and named entity recognition methods support or constrain their application to micro-blogs, and we explore a robust named entity recognition method, hoping to lay a foundation for further research, identify promising directions, and provide experience for future work.
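The pipeline described above (common-word filtering, statistics over comments and forwards, then boundary rules) might be sketched roughly as follows. This is only an illustrative sketch, not the thesis's implementation: it assumes the text has already been segmented (for example by ICTCLAS, which is not called here), and the common-words table, the suffix-based boundary rule, the support threshold and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical common-words table; the thesis builds its own, which is not reproduced here.
COMMON_WORDS = {"的", "了", "在", "是", "我", "今天", "去"}

# Hypothetical boundary rule: a token ending in one of these suffixes is
# labeled as a location (city, province) or organization (company, university).
SUFFIX_RULES = {"市": "LOC", "省": "LOC", "公司": "ORG", "大学": "ORG"}


def filter_common(tokens):
    """Drop tokens that appear in the common-words table."""
    return [t for t in tokens if t not in COMMON_WORDS]


def count_support(post_tokens, related_token_lists):
    """For each candidate in the post, count the comments or forwards that repeat it."""
    candidates = set(filter_common(post_tokens))
    support = Counter()
    for tokens in related_token_lists:
        for t in set(tokens) & candidates:
            support[t] += 1
    return support


def recognize(post_tokens, related_token_lists, min_support=1):
    """Return (token, label) pairs that meet the support threshold and match a suffix rule."""
    support = count_support(post_tokens, related_token_lists)
    entities = []
    for token in filter_common(post_tokens):
        if support[token] < min_support:
            continue
        for suffix, label in SUFFIX_RULES.items():
            # Require at least one character before the suffix.
            if token.endswith(suffix) and len(token) > len(suffix):
                entities.append((token, label))
                break
    return entities


# Toy, already-segmented post with three comments (invented data, for illustration only).
post = ["我", "今天", "去", "了", "北京市", "的", "清华大学"]
comments = [["清华大学", "很", "美"], ["北京市", "天气", "好"], ["清华大学", "加油"]]
print(recognize(post, comments))
```

Raising `min_support` keeps only candidates echoed by more comments or forwards, which is one simple way the surrounding discussion could be used as statistical evidence; the method in the thesis may weigh this evidence differently.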
Keywords/Search Tags: Chinese named entity recognition, micro-blog, short texts