An Improved Text-oriented Algorithm For Sieving The Domain-specific Concepts

Posted on: 2014-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: L Q Huang
Full Text: PDF
GTID: 2268330392472495
Subject: Applied Mathematics
Abstract/Summary:
Ontology learning is an active research topic in semantic technology and its applications, and it has attracted many scholars at home and abroad. Concept acquisition is an important part of ontology learning, and the quality of concept filtering determines the quality of the resulting ontology. Text has become the mainstream data source for current ontology learning, so this paper focuses on concept filtering in text.

Sieving domain-specific concepts takes two steps: first obtain the candidate concepts, then filter the non-domain concepts out of the candidate set to form the set of domain concepts. Existing domain-specific concept sieving algorithms omit some important low-frequency candidate concepts that stand in synonymy or part-whole relationships, and they also select a large number of redundant high-frequency concepts that are unrelated to the field. Both problems lower the precision and recall of concept sieving.

To address this inaccuracy in the existing concept sieving algorithms, this paper presents an improved domain-specific concept sieving algorithm.
This algorithm uses the contextual information of the candidate concepts to calculate the degree of similarity between them. Based on the resulting similarity values, it identifies the sets of low-frequency words that stand in synonymy or part-whole relationships, and it filters out some of the redundant concepts. Finally, this paper presents the improved formulas and the domain concept sieving algorithm, so that these low-frequency but very important domain words are sieved more effectively.

To validate the proposed method, this paper runs comparative experiments between the improved sieving method and currently popular algorithms on the same data sets, using precision, recall, and F-measure as the comparison metrics. The experimental results show two things. First, the improved algorithm is notably effective for low-frequency domain concepts, including words in a synonymy relationship, words in a part-whole relationship, and words in both. Second, the improved algorithm avoids some of the omissions caused by low frequency, greatly improving the precision and recall of domain concept extraction.
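The two measurable ideas in the abstract, context-based similarity between candidate concepts and evaluation by precision, recall, and F-measure, can be illustrated with a minimal Python sketch. The abstract does not state which similarity measure or formulas the thesis uses, so cosine similarity over bag-of-words context windows is an assumption made here for illustration only. The function names, the window size, and the toy sentence are likewise hypothetical.

```python
import math
from collections import Counter


def context_vector(tokens, term, window=2):
    """Count the words within `window` tokens of each occurrence of `term`.

    Assumption: a symmetric bag-of-words window stands in for the
    "contextual information" mentioned in the abstract.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == term:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors (assumed measure)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def precision_recall_f(extracted, gold):
    """Standard precision, recall, and balanced F-measure over concept sets."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy example: two terms that occur in similar contexts.
tokens = ("the ontology describes domain concepts while "
          "the taxonomy describes domain concepts").split()
sim = cosine_similarity(context_vector(tokens, "ontology"),
                        context_vector(tokens, "taxonomy"))
print(f"context similarity: {sim:.3f}")

p, r, f = precision_recall_f(
    extracted={"ontology", "taxonomy", "the"},
    gold={"ontology", "taxonomy", "concept", "context"},
)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

In a sketch like this, a low-frequency candidate whose context vector is close to that of an accepted domain concept could be kept rather than discarded on frequency alone. The threshold for "close" and the way similarity feeds into the sieving formulas are not given in the abstract.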
Keywords/Search Tags: Ontology learning, Candidate concept, Context, Field concept, Sieving algorithm