Research On Algorithm Of Semantic Net Mining Of Short Texts Based On WordNet

Posted on: 2013-08-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y D Di
GTID: 1228330395959631
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the information era and the spread of information technologies, the volume of information in daily life and production is growing explosively, and so is people's dependence on information processing. Semantic studies, especially those on semantic similarity, have become a frontier and a hot topic in research. In artificial intelligence, cognitive science, semantics, psychology and bioinformatics, semantic similarity is receiving more and more attention. Research on information processing has progressed from word-level and semantic processing to intelligent semantic processing. Research achievements in disambiguation, automatic summarization, text categorization, concept abstraction and natural language processing have a promising future and are being applied ever more widely.

With short text semantic similarity calculation as its basis, applied semantics is playing an increasingly important role in text relevance analysis, Web page search and categorization, text data mining, question answering, and information retrieval and extraction. In recent years, semantic similarity algorithms have developed quickly and been widely applied. At present, however, they mainly target long texts or large documents; few address short texts or similar subjects, and there is much room for improvement. For applied computer science it is therefore important to study semantic similarity, improve the algorithms, raise their efficiency and accuracy, and further improve applied systems for semantic similarity analysis.

Based on in-depth research on concept and short text semantic similarity, this dissertation proposes an algorithm for semantic net mining of short texts based on WordNet and details the experiments used to verify it.

I. Algorithm of Semantic Net Mining of Concept Similarity

1. IC Model Based on WordNet and the Brown Corpus (IC-CW)
An information content (IC) model, IC-CW, is introduced on the basis of WordNet and the Brown Corpus. IC-CW takes into account both the probability data and the semantic data in WordNet and the Brown Corpus. Compared with traditional IC models, the semantic data in the semantic bank better reflects the semantic information of the concepts.

2. Semantic Similarity Algorithm (SS-CW)
SS-CW is developed on the basis of IC-CW. Unlike traditional models, it does not require its users to have relevant background knowledge, because the shared information and the probability information of the concepts in the data bank are taken fully into account. Experiments show that the results of this algorithm are fairly consistent with human judgments.

3. Extending Relations Model Mining (IC-ER)
IC-ER takes the Nuno calculation as its basis and fully takes into account Hypernym/Hyponym and Meronym/Holonym relations, along with other factors that may influence the calculation. By considering these relations comprehensively, the author proposes the Extending Relations Model Mining, which performs better than Nuno.

4. Word Semantic Similarity Calculation Based on Path and Information Content
Building on the traditional semantic tree path similarity algorithm, a word semantic similarity calculation based on path and information content is proposed, which also considers the influence of probability information on relevance. Experiments have shown fairly good results.

II. Algorithm of Semantic Net Mining of Short Text Similarity

1. Algorithm of Semantic Net Mining of Short Text Similarity Based on Concept Probability Information (ST-CW)
On the basis of IC-CW and SS-CW, we propose ST-CW. ST-CW considers the string similarity between concepts, the sentence similarity based on the lexical matrix, and the relevance between string information and semantic information. Taking all these factors into consideration, we obtain a semantic similarity algorithm for short texts.

2. Short Text Semantic Similarity Algorithm Based on Maximum (ST-MAX)
Mainly considering the maximum semantic similarity of the concepts, we propose ST-MAX. It is based on path, information content, and the different relations between concepts. It is efficient, feasible and reliable.

3. Resource Interlinking Mining (SS-RDF)
The author conducts an integrated study of RDF data sets. Given the lack of systematic, practical data set integration tools for current RDF algorithms, the author proposes the SS-RDF solution, guided by the domain ontology and grounded in in-depth study. SS-RDF is a matching system for resource properties and semantic relationships. The system uses graph algorithms to extract the RDF data set automatically. The RDF integration system is built on a new, flexibly configurable resource matching algorithm package that combines fuzzy string comparison, semantic similarity and word relationship computation.

WordNet and the Brown Corpus are important semantic information resources whose effectiveness has been verified through years of practice. The semantic similarity algorithm of this dissertation is mainly based on WordNet, a typical ontology knowledge base, and the Brown Corpus, a lexical library. The use of these two resources ensures the typicality, representativeness and extensibility of the basic data used in the research and of the results achieved. The feasibility and effectiveness of the proposed algorithm are verified through experiments with the data sets RG, PS1, PS2 and Li et al.
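The abstract gives no formulas for IC-CW, SS-CW or ST-MAX, so the following is only a minimal sketch of the general family of techniques it describes, not the author's method. It assumes the standard corpus-based information content IC(c) = log(N / count(c)), a Lin-style concept similarity, and a best-match average for short texts. The taxonomy, the counts and all function names are hypothetical stand-ins for WordNet and Brown Corpus data.

```python
import math

# Hypothetical toy taxonomy (child -> parent); a stand-in for WordNet hypernym links.
PARENT = {
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
    "bus": "vehicle",
}

# Hypothetical occurrence counts; a stand-in for Brown Corpus frequencies.
COUNTS = {"entity": 0, "animal": 2, "vehicle": 2, "dog": 5, "cat": 5, "car": 4, "bus": 2}


def ancestors(concept):
    """Return the concept followed by its hypernyms up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def subsumed_count(concept):
    """Count occurrences of the concept and of every concept it subsumes."""
    return sum(n for c, n in COUNTS.items() if concept in ancestors(c))


TOTAL = subsumed_count("entity")


def ic(concept):
    """Corpus-based information content, log(N / count); 0 at the root.

    Assumes every concept has a non-zero subsumed count, as in the toy data.
    """
    return math.log(TOTAL / subsumed_count(concept))


def lcs(a, b):
    """Lowest common subsumer: the first ancestor of a that also subsumes b."""
    b_chain = set(ancestors(b))
    return next(c for c in ancestors(a) if c in b_chain)


def word_sim(a, b):
    """Lin-style similarity: 2 * IC(lcs) / (IC(a) + IC(b))."""
    if a == b:
        return 1.0
    denom = ic(a) + ic(b)
    return 2 * ic(lcs(a, b)) / denom if denom > 0 else 0.0


def text_sim(words_a, words_b):
    """Average, in both directions, of each word's best match in the other text."""
    def directed(src, dst):
        return sum(max(word_sim(s, d) for d in dst) for s in src) / len(src)
    return (directed(words_a, words_b) + directed(words_b, words_a)) / 2


if __name__ == "__main__":
    print(word_sim("dog", "cat"), word_sim("dog", "car"))
    print(text_sim(["dog", "car"], ["cat", "bus"]))
```

In this toy setting, concepts that share a specific ancestor ("dog" and "cat" under "animal") score higher than concepts that meet only at the root, and taking the maximum per word keeps one unrelated word from dominating the score of a short text.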
Keywords/Search Tags: Semantic Web, Concept, Short Text, Semantic Similarity