Research On Rule-based Extraction Of Mongolian Character Attributes

Posted on:2019-02-02

Degree:Master

Type:Thesis

Country:China

Candidate:M J Hu

Full Text:PDF

GTID:2428330563457215

Subject:Software engineering

Abstract/Summary:

In the Internet era of information explosion, the Internet holds massive amounts of information and data, most of it in the form of semi-structured or free text. Users increasingly demand efficient information retrieval and accurate results. Information Extraction (IE) technology has therefore developed rapidly to help users obtain target content quickly. Although Information Extraction has produced many practical results for Chinese and English, research on Information Extraction for Mongolian, a minority language, is still at an early stage. In this thesis, information extraction refers specifically to entity relation extraction. It aims to extract target information, such as character attribute values, from large-scale text data and to save the results in a structured form for subsequent user queries. This research is also foundational work for building network applications such as knowledge bases or Mongolian character search engines. The thesis uses unstructured Mongolian texts crawled from Mongolian news websites to study rule-based extraction of attributes for characters (persons) who attract a high level of public attention. The key research work of this thesis is as follows:

(1) We designed crawler tools and crawled texts from several Mongolian news websites based on the structure and URL characteristics of their web pages. We then performed multiple pre-processing tasks, including Named Entity Recognition using a combination of BLSTM and CRF models. After this series of pre-processing steps, we obtained the web text corpora for subsequent analysis.

(2) We used a manually created trigger word table and rule base to extract the character attribute values contained in the corpus and saved them as "character-attribute-attribute value" triplets. The accuracy of the extraction results is adequate for practical use, which indicates that the rule-based extraction method proposed in this thesis is feasible and effective.

(3) We designed and implemented a Mongolian character attribute extraction system, which provides two functional modules: character attribute extraction and character-related attribute information query.
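The idea of trigger-word rules producing "character-attribute-attribute value" triplets can be illustrated with a minimal sketch. It is not the thesis's implementation: the trigger table, the `<PER>` tag format standing in for the Named Entity Recognition output, the English example sentences standing in for Mongolian text, and all names (`TRIGGERS`, `build_rules`, `extract`, `query`) are assumptions made only for illustration.

```python
import re
from typing import NamedTuple


class Triplet(NamedTuple):
    character: str
    attribute: str
    value: str


# Hypothetical trigger-word table: attribute name -> trigger phrases.
# The real table in the thesis is manually built for Mongolian.
TRIGGERS = {
    "birthplace": ["was born in", "is a native of"],
    "occupation": ["works as", "serves as"],
}


def build_rules(triggers):
    """Turn each trigger phrase into one regex rule.

    Assumes person names were already tagged as <PER>name</PER>
    by an earlier NER step (an assumed tag format).
    """
    rules = []
    for attribute, phrases in triggers.items():
        for phrase in phrases:
            pattern = re.compile(
                r"<PER>(?P<character>[^<]+)</PER>\s+"
                + re.escape(phrase)
                + r"\s+(?P<value>[^.,;]+)"
            )
            rules.append((attribute, pattern))
    return rules


def extract(text, rules):
    """Apply every rule to the text and collect the triplets it yields."""
    found = []
    for attribute, pattern in rules:
        for match in pattern.finditer(text):
            found.append(
                Triplet(match.group("character"), attribute,
                        match.group("value").strip())
            )
    return found


def query(triplets, character):
    """Return the stored attributes of one character as a dict."""
    return {t.attribute: t.value for t in triplets if t.character == character}


text = "<PER>Bat</PER> was born in Hohhot. <PER>Bat</PER> works as a teacher."
triplets = extract(text, build_rules(TRIGGERS))
profile = query(triplets, "Bat")
```

Keeping the trigger table as plain data, separate from the matching code, mirrors the split between a trigger word table and a rule base described above: new attributes can be covered by adding entries rather than changing the extractor.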

Keywords/Search Tags:

Mongolian information extraction, character attributes, Web Crawler, Named Entity Recognition, trigger words, rule-based

Related items

1	Mongolian Named Entity Recognition
2	Research On Microblog’s Event Extraction
3	Research On Named Entity Recognition Based On Word Information Relevance And Multiple Semantic Features
4	Engineering Construction Of Text Named Entity Recognition And Topic Extraction Based On Information Extraction Technology
5	The Research Of Announcement Information Extraction
6	Joint Extraction Of Named Entity Recognition And Entity Relationship Based On Neural Network
7	Research Of Entity Knowledge Base System Based On Information Extraction
8	The Research Of Chinese Named Entity Recognition And Information Extraction
9	Research On Chinese Time Expression Recognition Technology Based On Rule Extraction
10	Biomedical Named Entity Recognition And Entity Relation Extraction Based On Deep Learning Method