Research On Compound Word Extraction Based On Location Tag

Posted on: 2016-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: W G Zhou
GTID: 2428330473465671
Subject: Software engineering
Abstract/Summary:
The domain ontology is a hot topic in the field of artificial intelligence, and domain concepts are the basic building blocks of a domain ontology, so identifying and extracting domain compound concepts is fundamental research work. With social progress and the development of science and technology, new concepts emerge in an endless stream, especially compound words in each field. These domain compound words are generally noun phrases formed from domain atomic words, and they express domain information more precisely than atomic words do. Identifying and extracting domain compound words is the basis of domain text information processing and is important for domain ontology construction and application, text information retrieval, and machine translation. Existing word segmentation systems cannot recognize new domain compound words and therefore cannot meet the needs of practical applications, so automatic extraction of domain compound words is urgently needed.

To address the deficiencies of methods based on statistics alone or on linguistic rules alone, this paper proposes a compound word extraction method based on location tags and POS (part of speech). First, after text cleaning and automatic word segmentation, the method establishes a location tag set for each item, then removes stop words and merges synonyms. Next, on the basis of the location tag sets, it computes the adjacent degree and the co-occurrence degree to judge whether items form compound words. Finally, it formulates reverse rules, uses them to filter out garbage strings, and further detects compound words inside the garbage strings by removing items from the head and the tail. Based on this approach, the paper builds a verification system for domain compound word extraction and runs experiments on it. Two other methods were also tested on different corpora, and the accuracy rate, recall rate, and F value were calculated for each. The experimental results show that the accuracy rate, recall rate, and F value of this method are all higher than those of the other two methods, so this method is more effective for domain compound word extraction.
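The location-tag step can be illustrated with a minimal Python sketch. The abstract does not give formulas for the adjacent degree or the co-occurrence degree, so the ones below are assumptions rather than the thesis's definitions: a term's location tag set is taken as its set of (sentence index, token position) pairs, the adjacent degree of a pair (A, B) as the number of positions where B directly follows A divided by the smaller of the two term frequencies, and the co-occurrence degree as the Jaccard overlap of the sentences containing A and B. All function names and the thresholds `adj_min`, `co_min` and `min_count` are hypothetical, and English tokens stand in for segmented Chinese text.

```python
from collections import defaultdict


def build_location_tags(sentences):
    """Map each segmented term to its location tag set {(sentence_idx, position)}."""
    tags = defaultdict(set)
    for s_idx, tokens in enumerate(sentences):
        for pos, term in enumerate(tokens):
            tags[term].add((s_idx, pos))
    return tags


def clean_tags(tags, stop_words, synonyms):
    """Drop stop words and merge synonyms by uniting their tag sets.

    Original positions are kept, so a removed stop word breaks adjacency.
    """
    cleaned = defaultdict(set)
    for term, locs in tags.items():
        if term in stop_words:
            continue
        cleaned[synonyms.get(term, term)] |= locs
    return dict(cleaned)


def adjacent_degree(tags, a, b):
    """Hypothetical: (count, share of occurrences in which b directly follows a)."""
    count = sum(1 for s, p in tags[a] if (s, p + 1) in tags[b])
    return count, count / min(len(tags[a]), len(tags[b]))


def cooccurrence_degree(tags, a, b):
    """Hypothetical: Jaccard overlap of the sentences containing a and b."""
    sa = {s for s, _ in tags[a]}
    sb = {s for s, _ in tags[b]}
    return len(sa & sb) / len(sa | sb)


def judge_compounds(tags, adj_min=0.5, co_min=0.3, min_count=2):
    """Return (a, b, adjacent_degree, cooccurrence_degree) for accepted pairs."""
    # Reverse index: which term occupies each (sentence, position) slot.
    term_at = {loc: term for term, locs in tags.items() for loc in locs}
    pairs = {(a, term_at[(s, p + 1)])
             for (s, p), a in term_at.items() if (s, p + 1) in term_at}
    accepted = []
    for a, b in sorted(pairs):
        count, adj = adjacent_degree(tags, a, b)
        co = cooccurrence_degree(tags, a, b)
        if count >= min_count and adj >= adj_min and co >= co_min:
            accepted.append((a, b, adj, co))
    return accepted


if __name__ == "__main__":
    sentences = [
        ["domain", "ontology", "is", "a", "hot", "topic"],
        ["the", "domain", "ontology", "needs", "concepts"],
        ["build", "domain", "ontology", "with", "concepts"],
        ["domain", "experts", "review", "concepts"],
    ]
    tags = clean_tags(build_location_tags(sentences), {"is", "a", "the", "with"}, {})
    print(judge_compounds(tags))  # [('domain', 'ontology', 1.0, 0.75)]
```

Because stop words are dropped only after the tag sets are built, a stop word between two terms breaks their adjacency, which follows the order of steps in the abstract. Synonym merging is modeled here simply as uniting the tag sets of the merged terms.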
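The reverse-rule filtering and head/tail trimming can be sketched in the same way. The abstract does not list the actual reverse rules or the POS tag set, so the three rules below (a compound has at least two items, does not start with a verb, preposition, conjunction or auxiliary, and ends with a noun) and the one-letter POS labels are hypothetical placeholders. Trimming here tries the longest sub-strings first, which is one plausible reading of removing items from the head and the tail; the function names are likewise illustrative.

```python
# Hypothetical simplified POS labels: n noun, v verb, a adjective,
# d adverb, p preposition, c conjunction, u auxiliary.
BAD_HEAD = {"v", "p", "c", "u"}

# Each hypothetical reverse rule returns True when a candidate is NOT a compound.
# any() evaluates them lazily, so the length rule guards the indexing rules.
REVERSE_RULES = [
    lambda cand: len(cand) < 2,           # a single item is not a compound
    lambda cand: cand[0][1] in BAD_HEAD,  # cannot start with a verb or function word
    lambda cand: cand[-1][1] != "n",      # must end with a noun
]


def is_garbage(cand):
    """A candidate is a garbage string if any reverse rule fires."""
    return any(rule(cand) for rule in REVERSE_RULES)


def rescue(cand):
    """Remove items from the head and/or tail, longest sub-strings first,
    and return the first sub-string that passes every reverse rule (or None)."""
    n = len(cand)
    for length in range(n - 1, 1, -1):
        for start in range(n - length + 1):
            sub = cand[start:start + length]
            if not is_garbage(sub):
                return sub
    return None


def filter_candidates(candidates):
    """Keep valid candidates, rescue garbage ones, drop the rest; dedupe in order."""
    kept = {}
    for cand in map(tuple, candidates):
        result = cand if not is_garbage(cand) else rescue(cand)
        if result is not None:
            kept[tuple(word for word, _ in result)] = None
    return list(kept)


if __name__ == "__main__":
    candidates = [
        [("build", "v"), ("domain", "n"), ("ontology", "n")],
        [("very", "d"), ("fast", "a")],
    ]
    print(filter_candidates(candidates))  # [('domain', 'ontology')]
```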
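The three evaluation figures named in the abstract can be computed as below. This sketch assumes that the accuracy rate is precision (correct extractions over all extractions), that the recall rate is correct extractions over the gold-standard compound words, and that the F value is the balanced harmonic mean of the two; the function name `evaluate` is hypothetical.

```python
def evaluate(extracted, gold):
    """Return (accuracy rate read as precision, recall rate, F value)."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    # Balanced F value (harmonic mean); defined as 0 when both are 0.
    f_value = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_value


if __name__ == "__main__":
    extracted = ["domain ontology", "hot topic", "new word", "word segmentation"]
    gold = ["domain ontology", "new word", "location tag"]
    # 2 correct out of 4 extracted and 3 gold: P = 0.5, R = 2/3, F = 4/7
    print(evaluate(extracted, gold))
```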
Keywords: compound word extraction, new word identification, location tag set, adjacent degree, reverse rule filtering