
Entity Recognition Research And Application On Hotspot Information Of Internet Web

Posted on: 2013-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: S M Dai
GTID: 2248330395975088
Subject: Software engineering
Abstract/Summary:
Named entity recognition (NER) is the task of identifying entities in text that carry a specific meaning, including person names, place names, organization names, and other proper nouns. With the proliferation of computers and the rapid development of the Internet, a large amount of information is now presented to people in the form of electronic documents. To cope with the challenges posed by this information explosion, people urgently need automated tools that help them quickly find the truly important information in massive information sources, and information extraction technology came into being in response. Named entity recognition is an important part of information extraction. It can also be applied to other areas of natural language processing, such as question answering, machine translation, and information retrieval, where it contributes to improved performance.

However, owing to the characteristics of the Chinese language itself, Chinese named entity recognition has remained quite difficult. Because it supports the development of other technologies and applications, the study of Chinese named entity recognition is of great significance.

This thesis studies Chinese named entity recognition for person names, place names, organization names, and electronic products. Experiments are carried out to verify the algorithms, and their application is presented. The main contributions are as follows:

(1) A two-pass recognition method for Chinese person names is proposed, based on rules combined with probability and statistics. First, the method performs an initial recognition of person name entities using a person name knowledge base, lexical rules for person names, and the boundary conditions of person name entities. Second, it performs the final recognition using the boundary characteristics of person name entities and the credibility given by a statistical recognition model of person names.

(2) A recognition method for place name and organization name entities is proposed, based on rules and web retrieval. The method locates the trigger positions of place and organization entities using knowledge bases of place names and organization names together with lexical rules for place names and organization names, and then completes recognition using web retrieval. Within this method, place names are recognized with a Baike retrieval strategy and organization names with a Baidu retrieval strategy. Finally, a rule-based method for recognizing organization name abbreviations is proposed.

(3) Recognition of electronic product entities is completed, covering product names, product attributes, product attribute values, and comments on product attributes. For product names, an automatic recognition model based on self-learning from domain seed words is proposed. For product attributes, an automatic recognition method based on association probability and statistics is proposed. For attribute values, an automatic recognition method based on rules relating product attributes to the units of their values is proposed. For comments on attributes, a Chinese grammar pattern matching method based on seed product attributes is proposed.
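The two-pass idea in item (1) can be illustrated with a minimal sketch. It is not the thesis implementation: the surname list, boundary cue words, character probabilities, cue bonus, and threshold are all hypothetical values chosen for illustration, and the scoring formula is an assumed stand-in for the statistical credibility model, which the abstract does not specify. The first pass proposes candidates from a small knowledge base and a length rule. The second pass keeps the best candidate per start position whose score, combining boundary cues and a credibility estimate, clears the threshold.

```python
import math

# All data below is hypothetical and only for illustration; none comes from the thesis.
SURNAMES = {"王", "李", "张", "刘"}        # stand-in for the person name knowledge base
LEFT_CUES = {"记者", "教授", "主席"}      # words assumed to precede a name (titles)
RIGHT_CUES = {"说", "表示", "认为"}       # words assumed to follow a name (speech verbs)
GIVEN_CHAR_PROB = {"伟": 0.02, "芳": 0.015, "明": 0.02, "国": 0.01}  # assumed P(char | given name)
DEFAULT_PROB = 1e-4                       # assumed probability for unseen characters
CUE_BONUS = 1.0                           # assumed reward per matched boundary cue
THRESHOLD = -4.0                          # assumed acceptance threshold on the score


def initial_candidates(text):
    """Pass 1: a known surname followed by one or two characters is a candidate."""
    candidates = []
    for i, ch in enumerate(text):
        if ch in SURNAMES:
            for given_len in (1, 2):
                end = i + 1 + given_len
                if end <= len(text):
                    candidates.append((i, end))
    return candidates


def credibility(name):
    """Mean log-probability of the given-name characters (surname excluded)."""
    logs = [math.log(GIVEN_CHAR_PROB.get(c, DEFAULT_PROB)) for c in name[1:]]
    return sum(logs) / len(logs)


def score(text, start, end):
    """Combine the credibility estimate with left and right boundary cues."""
    left = any(text[:start].endswith(w) for w in LEFT_CUES)
    right = any(text[end:].startswith(w) for w in RIGHT_CUES)
    return credibility(text[start:end]) + CUE_BONUS * (left + right)


def recognize_person_names(text):
    """Pass 2: keep the best-scoring candidate per start position above the threshold."""
    best = {}  # start index -> (score, name)
    for start, end in initial_candidates(text):
        s = score(text, start, end)
        if s >= THRESHOLD and (start not in best or s > best[start][0]):
            best[start] = (s, text[start:end])
    return [name for _, (_, name) in sorted(best.items())]
```

With these toy values, `recognize_person_names("记者王伟明说今天天气很好")` keeps the three-character candidate because both boundary cues match, while the common noun in `"王国的历史很长"` is rejected because it has no boundary support and a low credibility score.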
Keywords/Search Tags: Named Entity Recognition, Rules and Probability and Statistics, Web Retrieval, Named Entity Recognition of Product