
Research And Implementation Of Personal Attribute Extraction In Chinese

Posted on: 2017-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: K L Zhang
GTID: 2348330518494783
Subject: Computer technology
Abstract/Summary:
Personal Attribute Extraction (PAE) is an important technology for extracting person-related attributes from unstructured text, and it is a key foundation for intelligent people search. This thesis applies both rule-based and statistics-based methods to PAE in Chinese, integrating several Natural Language Processing (NLP) techniques such as Named Entity Recognition (NER), co-reference resolution, part-of-speech (POS) tagging, and syntactic parsing. First, the thesis proposes and implements a rule-based extraction method that combines trigger keywords, dictionaries, and rules, imitating the way a human looks up information about a person. Second, it recasts PAE as an equivalent sequence tagging problem and proposes a method based on Conditional Random Fields (CRF) to solve it. Third, it proposes a modified Support Vector Machine (SVM) method that solves PAE by classifying candidate words; in addition to traditional features such as POS and entity category, this method uses syntax-tree features and employs a tree kernel. Finally, the thesis conducts extensive experiments to compare the three methods and analyzes their strengths and weaknesses when extracting different personal attributes. Based on the experimental results, we implement an intelligent personal attribute extraction system that can extract 16 types of personal attributes and selects the most suitable extraction method for each specific attribute. The thesis also designs a relational database to store the extracted attributes; this knowledge base can support building relationship graphs and people search engines. In addition, the thesis implements an application built on the proposed extraction methods: a chatbot system that customizes its conversations based on personal attributes.
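The reformulation of PAE as sequence tagging can be illustrated with a minimal sketch. It assumes a BIO tag scheme, a sentence that has already been word-segmented, and hypothetical attribute labels (`BIRTH_DATE`, `BIRTHPLACE`, `SCHOOL`). The abstract does not say which tag scheme or label set the thesis uses, and the helper names `spans_to_bio` and `bio_to_attributes` are illustrative only. In a CRF-based extractor, a trained model would predict the tag sequence from per-token features such as POS and entity category; the CRF itself is not shown here.

```python
from typing import Dict, List, Optional, Tuple

# (start token index, end token index exclusive, attribute label)
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode annotated attribute spans as one BIO tag per token.

    Assumption: BIO scheme, non-overlapping spans; labels are hypothetical.
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if not (0 <= start < end <= n_tokens):
            raise ValueError(f"invalid span {(start, end, label)}")
        if any(t != "O" for t in tags[start:end]):
            raise ValueError("overlapping spans are not supported")
        tags[start] = f"B-{label}"          # first token of the value
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # continuation tokens
    return tags


def bio_to_attributes(tokens: List[str], tags: List[str]) -> Dict[str, List[str]]:
    """Decode a (e.g. CRF-predicted) BIO sequence into attribute -> values."""
    attrs: Dict[str, List[str]] = {}
    label: Optional[str] = None
    buf: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == label:
            buf.append(tok)                 # extend the current value
            continue
        if label is not None:               # close the current value
            attrs.setdefault(label, []).append("".join(buf))
        if tag == "O":
            label, buf = None, []
        else:                               # B-x, or an I-x that starts a new value
            label, buf = tag[2:], [tok]
    if label is not None:
        attrs.setdefault(label, []).append("".join(buf))
    return attrs


# Hypothetical segmented sentence: "张三1980年出生于北京，毕业于清华大学"
tokens = ["张三", "1980年", "出生", "于", "北京", "，", "毕业", "于", "清华", "大学"]
spans = [(1, 2, "BIRTH_DATE"), (4, 5, "BIRTHPLACE"), (8, 10, "SCHOOL")]

tags = spans_to_bio(len(tokens), spans)
print(list(zip(tokens, tags)))
print(bio_to_attributes(tokens, tags))
```

In this framing, `spans_to_bio` would produce training targets for a sequence labeler, and `bio_to_attributes` would turn its predicted tags back into attribute values; tokens tagged `O` belong to no attribute, and a multi-token value such as "清华" + "大学" is rejoined into one string.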
Keywords/Search Tags: Personal attribute extraction, Named entity recognition, Conditional random fields, Support vector machine