
Web Chinese Information Extraction, Named Entity Recognition And Application

Posted on: 2010-06-24
Degree: Master
Type: Thesis
Country: China
Candidate: J Liu
GTID: 2208360272494596
Subject: Computer software and theory
Abstract/Summary:
With the development of the information industry, the Internet has become an important and indispensable tool in our work and life, and the Web has become people's primary source of information. However, the data on the Internet is growing extremely fast, by about one million pages every day, and the number of pages now exceeds one billion. Faced with this vast amount of information, obtaining the information we need quickly and effectively has become the central problem in information processing. Research on Information Retrieval (IR) and Information Extraction (IE) aims to solve this problem. The task of Information Extraction is to process the information in text and convert unstructured and semi-structured information into a structured form, so that people can obtain the information they need by querying the Web just as they would query a database.
Named Entity Recognition (NER) is a pivotal technique in IE. Its goal is to recognize specific types of entities in text. NER strongly affects many Natural Language Processing (NLP) applications, such as IE, text classification, IR, and question answering systems, and it is a foundation of these technologies.
In this thesis, we take the recognition of entities in web pages containing biographies of celebrities as a case study, mainly to investigate methods and applications for recognizing person, location, and organization names. The approach mainly combines rules with statistics. We point out a limitation of the traditional Hidden Markov Model (HMM): it severs the relationships among words and neglects the influence of the context on the current word. With the improved HMM, the precision and recall of location entity recognition have been raised.
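The abstract names the traditional HMM and its limitation but gives no formulas. As a minimal sketch, assuming a first-order HMM over words, a hypothetical BIO-style tag set (O, B-LOC, I-LOC), and hand-set toy probabilities rather than trained values, Viterbi decoding for the traditional model can look like this. Each emission probability depends only on the current tag, which is the independence assumption the thesis criticizes; the thesis's improved model is not reproduced here.

```python
import math

# Hypothetical tag set: O = outside any entity, B-LOC / I-LOC = begin / inside a location.
TAGS = ["O", "B-LOC", "I-LOC"]

# Toy, hand-set probabilities (assumptions for illustration, not trained values).
START = {"O": 0.7, "B-LOC": 0.3, "I-LOC": 0.0}
TRANS = {
    "O":     {"O": 0.8, "B-LOC": 0.2, "I-LOC": 0.0},
    "B-LOC": {"O": 0.4, "B-LOC": 0.0, "I-LOC": 0.6},
    "I-LOC": {"O": 0.7, "B-LOC": 0.0, "I-LOC": 0.3},
}
# Emissions condition only on the current tag: neighbouring words never
# influence this score, which is the limitation discussed above.
EMIT = {
    "O":     {"born": 0.3, "in": 0.4, "the": 0.2},
    "B-LOC": {"Hunan": 0.5, "Beijing": 0.4},
    "I-LOC": {"Province": 0.8},
}
UNSEEN = 1e-6  # crude floor probability for words a tag never emitted


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words):
    """Return the most probable tag sequence for a non-empty word list."""
    # best[i][t] = (log prob of best path ending in tag t at word i, previous tag)
    best = [{t: (log(START[t]) + log(EMIT[t].get(words[0], UNSEEN)), None) for t in TAGS}]
    for i in range(1, len(words)):
        column = {}
        for t in TAGS:
            emit = log(EMIT[t].get(words[i], UNSEEN))
            # Pick the predecessor tag that maximizes path score + transition.
            prev, score = max(
                ((p, best[i - 1][p][0] + log(TRANS[p][t])) for p in TAGS),
                key=lambda pair: pair[1],
            )
            column[t] = (score + emit, prev)
        best.append(column)
    # Trace back from the highest-scoring final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi(["born", "in", "Hunan", "Province"]))  # ['O', 'O', 'B-LOC', 'I-LOC']
```

In a real system, START, TRANS, and EMIT would be estimated from an annotated corpus, and the flat UNSEEN floor would be replaced by proper smoothing.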
At the same time, building on HMM-based organization entity recognition, we construct a One-element Model to recognize abbreviated and unmarked organization names. The experiments demonstrate that this approach achieves better performance...
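The abstract does not define the One-element Model. Assuming it refers to a unigram ("one-element") model, one plausible sketch scores a candidate string character by character, comparing its likelihood under a model trained on organization names with one trained on general text. The training data, the add-one smoothing, the `org_score` and `is_org_abbreviation` names, and the zero threshold are all hypothetical choices for illustration, not the thesis's method.

```python
import math
from collections import Counter

# Toy training data (hypothetical, not from the thesis).
# Organization names: Peking University, Tsinghua University,
# Chinese Academy of Sciences, plus the abbreviations 北大 and 清华.
ORG_NAMES = ["北京大学", "清华大学", "中国科学院", "北大", "清华"]
# General text: "He was born in Changsha, Hunan Province, and later
# worked and studied Chinese in Beijing."
GENERAL_TEXT = ["他出生于湖南省长沙市后来在北京工作并学习中文"]


def unigram_model(texts):
    """Build a character unigram model with add-one (Laplace) smoothing."""
    counts = Counter(ch for text in texts for ch in text)
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen characters
    return lambda ch: (counts[ch] + 1) / (total + vocab)


org_prob = unigram_model(ORG_NAMES)
gen_prob = unigram_model(GENERAL_TEXT)


def org_score(candidate):
    """Log-likelihood ratio of the organization model vs the general model.

    Each character is scored independently (the unigram assumption);
    a positive score means the candidate looks more like an organization.
    """
    return sum(math.log(org_prob(ch)) - math.log(gen_prob(ch)) for ch in candidate)


def is_org_abbreviation(candidate, threshold=0.0):
    """Hypothetical decision rule: accept candidates scoring above threshold."""
    return org_score(candidate) > threshold


print(is_org_abbreviation("北大"))  # True
print(is_org_abbreviation("湖南"))  # False
```

Because every character is scored in isolation, such a model can also favour location strings that share characters with organization names (e.g. 北京), which is one reason it would be combined with the HMM and rule-based recognition rather than used alone.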
Keywords/Search Tags: Information Extraction, Named Entity Recognition, Hidden Markov Model, Limitation, One-element Model