Research Of Information Retrieval Technology Based On Semantic Analysis

Posted on: 2013-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: F Y Zhu
Full Text: PDF
GTID: 2248330377458957
Subject: Computer application technology
Abstract/Summary:
Current information retrieval technologies are mainly based on keyword matching, and most optimization research emphasizes algorithms rather than semantics. As a result, many problems cannot be fundamentally resolved: words with multiple meanings, diverse ways of expressing a query, omission of relevant websites, the difficulty of removing irrelevant web pages, unreasonable ordering of result pages, and so on. In view of these problems, a model of information retrieval based on semantic analysis was proposed. The model comprises four key components: a disambiguation method, a semantic expansion method, a keyword matching method, and a page-ranking algorithm. With this model, problems such as polysemy and relevant pages not being retrieved could be effectively resolved. In addition, pages that match the retrieval purpose but do not contain the keywords could also be retrieved, and page ranking was improved so that relevant pages appear in the top positions.

A method based on semantic analysis was used to eliminate the irrelevant senses of keywords in the keyword array. Using ontology theory, the method computes the similarity between the concepts of a polysemous word and the concepts of the other keywords in the array, and rules out the irrelevant senses of the polysemous word based on this concept similarity. For semantic expansion, a method based on the ontology tree was used. Many new keywords could be added without changing the retrieval purpose, which resolved the problem of relevant pages not being retrieved and provided a basis for page ranking. A method for keyword matching based on the expanded keyword array was proposed.
The keyword matching method distinguished the original keywords from the expanded keywords and ensured that both contributed effectively to page retrieval and page ranking. Finally, the ranking algorithm based on word frequency and word location was improved using semantic analysis. The algorithm initialized the keyword weights so that the final weight of each page would more objectively reflect the user's retrieval purpose.

The experimental data were obtained with development tools including Protégé 3.4.7 and Nutch 1.2. A relative accuracy ratio, built on the traditional recall ratio and accuracy ratio, was adopted with full consideration of the practical development and test environment. Analysis of the differences between the retrieval results of this model and those of other models demonstrated its effectiveness in reducing the number of relevant pages that are not retrieved and in ordering pages by importance. Finally, the experiments showed that the approach of this dissertation is feasible.
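
The pipeline summarized in the abstract (disambiguation by concept similarity, expansion along the ontology tree, and ranking by weighted word frequency and location) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the method of the thesis: the toy ontology `PARENT`, the sense table `SENSES`, the path-based similarity formula, the expansion weight of 0.5, and the title boost of 2.0 are all hypothetical.

```python
from collections import Counter

# Hypothetical toy ontology: child concept -> parent concept.
PARENT = {
    "fruit": "food",
    "apple_fruit": "fruit",
    "pear": "fruit",
    "company": "organization",
    "apple_company": "company",
    "food": "thing",
    "organization": "thing",
}

# Hypothetical sense inventory: surface word -> candidate concepts.
SENSES = {"apple": ["apple_fruit", "apple_company"], "pear": ["pear"]}


def ancestors(concept):
    """Return the path from a concept up to the root, inclusive."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def similarity(a, b):
    """Assumed measure: 1 / (1 + number of edges between a and b in the tree)."""
    path_a, path_b = ancestors(a), ancestors(b)
    for i, node in enumerate(path_a):
        if node in path_b:  # first shared node is the lowest common ancestor
            return 1.0 / (1 + i + path_b.index(node))
    return 0.0


def disambiguate(word, context_words):
    """Pick the sense of `word` most similar to any sense of the context words."""
    context = [c for w in context_words for c in SENSES.get(w, [])]
    return max(
        SENSES[word],
        key=lambda s: max((similarity(s, c) for c in context), default=0.0),
    )


def expand(concept, weight=0.5):
    """Return the parent and siblings of a concept as lower-weighted keywords."""
    parent = PARENT.get(concept)
    related = {parent} if parent else set()
    related |= {c for c, p in PARENT.items() if p == parent and c != concept}
    return {term: weight for term in related}


def score(page, weights, title_boost=2.0):
    """Sum weight * (boosted title frequency + body frequency) over keywords."""
    title = Counter(page["title"].split())
    body = Counter(page["body"].split())
    return sum(w * (title_boost * title[k] + body[k]) for k, w in weights.items())


# Usage: the query "apple pear" resolves "apple" to the fruit sense.
sense = disambiguate("apple", ["pear"])
original = {"apple": 1.0, "pear": 1.0}
weights = dict(original)
for term, w in expand(sense).items():
    weights.setdefault(term, w)  # original keywords keep their full weight

page = {"title": "fruit orchard", "body": "plum trees in the orchard"}
print(sense)                  # apple_fruit
print(score(page, original))  # 0.0
print(score(page, weights))   # 1.0
```

In this sketch the page contains neither original keyword, so it scores zero before expansion and becomes retrievable only through the expanded term. Keeping expanded terms at a lower weight than the original keywords is one simple way to distinguish the two groups during matching and ranking.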
Keywords/Search Tags: Semantic analysis, Ontology, Information retrieval, Semantic similarity