Research On Relation Extraction Of Person Entity In News Webpage

Posted on: 2012-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Zou
Full Text: PDF
GTID: 2218330362960368
Subject: Information and Communication Engineering
Abstract/Summary:
After years of rapid development, the Internet has accumulated vast amounts of information resources, including valuable relationships between persons. These person relationships play important roles in intelligence analysis, network public opinion monitoring and social network analysis. Many researchers have recognized this and studied the subject. Because the text of news web pages is more canonical, timely and reliable than that of other web pages, news web pages have become the major material in Internet-based person entity relationship extraction research. On this basis, and according to actual requirements, this paper carries out the following studies:

1. We first analyze the advantages and shortcomings of general web crawlers. Then, based on the specific application background and actual needs, and aiming to improve the accuracy and efficiency of news web page collection, this paper designs and implements a theme web crawler based on URL models.
2. Current webpage filtering algorithms are not efficient enough. After identifying the cause and summarizing the features of news web pages, this paper proposes a new webpage filtering method based on counting the characteristic features of news web pages. Experiments confirm the validity of the algorithm.
3. The support vector machine (SVM) does not give good results on multi-class classification problems, so we introduce the kNN algorithm to eliminate the vectors that cannot be classified correctly. However, when the number of vectors is large, the classification performance of the kNN algorithm is poor, which seriously affects the efficiency of person relationship extraction. This paper puts forward an improved kNN algorithm that greatly improves its performance.
4. Finally, this paper designs and implements a prototype for person entity relationship extraction from news web pages. The prototype provides functions such as theme web page collection, Chinese word segmentation, part-of-speech tagging, person information extraction, relationship extraction and relation storage. It is an overall implementation of person entity relationship extraction in news web pages, and it is the best way to test the results of our study.
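The abstract does not describe the improved kNN algorithm itself. As a point of reference only, the plain kNN baseline it starts from might be sketched as follows. The function name `knn_classify`, the two-dimensional feature vectors and the relation labels are all hypothetical and are not taken from the thesis. The sketch scans every stored vector for each query, which is one plausible reason plain kNN degrades as the number of vectors grows, assuming "performance" refers to running time.

```python
import math
from collections import Counter

def knn_classify(query, samples, k=3):
    """Label `query` by majority vote among its k nearest samples.

    `samples` is a list of (vector, label) pairs. Every call measures the
    distance to all samples, so the cost grows with the size of the set.
    """
    # Sort all samples by Euclidean distance to the query vector.
    by_distance = sorted(samples, key=lambda s: math.dist(query, s[0]))
    # Count the labels of the k closest samples and return the most common.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical feature vectors with hypothetical relation labels.
training = [
    ((0.0, 0.1), "colleague"),
    ((0.2, 0.0), "colleague"),
    ((0.1, 0.3), "colleague"),
    ((1.0, 0.9), "family"),
    ((0.9, 1.1), "family"),
]

print(knn_classify((0.1, 0.1), training))
print(knn_classify((1.0, 1.0), training))
```

Common ways to speed up this baseline include pruning the stored vectors or indexing them spatially. Which approach the thesis takes is not stated in the abstract.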
Keywords/Search Tags:News Web Page, Theme Web Page Crawler, News Web Page Filtering, Support Vector Machine, kNN, Person Entity Relationship Extraction, Relationship Extraction Prototype