Research And Realization Of News Search Engine Based On Domain Ontology

Posted on: 2013-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Ceng
GTID: 2268330374463951
Subject: Computer application technology
Abstract/Summary:
With the widespread coverage of the Internet, browsing online news has become an important way for people to follow social developments, which makes news search engines indispensable. However, although the Web is like an all-embracing encyclopedia, current search engine technologies are still evolving, and their retrieval results are often far from satisfactory. Therefore, this paper combines ontology theory, information retrieval, and related technologies to move search engines toward intelligent retrieval and thereby improve their efficiency. The main research work of this paper is as follows:
First, a news web page classification algorithm based on domain ontology is proposed. Current classification algorithms generally consider only content similarity; to remedy this common deficiency, this paper proposes a semantic classification approach that combines content similarity with structural relevancy. An ontology category vector is first obtained by parsing the ontology. The keyword set of a news web page is then extracted and semantically reduced, and the words shared by the text vector and the ontology vector form the text expectation vector; the content similarity between the text expectation vector and the ontology vector is computed with the cosine formula. Next, the words of the page text that exist in the ontology concept set are mapped onto the ontology hierarchy graph, and the structural relevancy is obtained by computing the weighted paths of these common words in the acyclic graph. Finally, the two measures are integrated into a comprehensive correlation degree, and the category of the news web page is decided by comparing the result with a threshold.
Second, an improved algorithm based on weighted, corrected information gain, called ωID3, is proposed. To address the tendency of the ID3 algorithm to favor attributes with many values, ωID3 first finds the attributes whose information gain and number of values both reach given thresholds. Taking into account the correlation between the condition attributes and the decision attribute, it then corrects their information gain with a weight and chooses the splitting node according to the corrected results. An example shows that the decision tree constructed by ωID3 is improved to a certain degree.
Third, a prototype news search engine based on domain ontology, MONSE, is designed and verified with an example, supported by open-source tools such as Heritrix, Lucene, Eclipse, and Tomcat.
Keywords/Search Tags: domain ontology, search engine, similarity, relevancy, classification algorithm
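The content-similarity step of the classification algorithm can be illustrated with a minimal Python sketch. It assumes that term frequency is used as the text weight, that an ontology category vector is a dictionary of concept weights, and that "semantic reduction" amounts to merging synonyms through a lookup table. The names `text_expectation_vector` and `cosine`, the `synonyms` map, and the sample "sports" data are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter

def text_expectation_vector(keywords, ontology_vector, synonyms=None):
    """Project a page's keywords onto one ontology category vector.

    keywords: keyword list extracted from the news page.
    ontology_vector: dict concept -> weight for one category (hypothetical weights).
    synonyms: optional dict word -> canonical concept; an assumed stand-in for
              the semantic reduction step, not the thesis's actual method.
    """
    synonyms = synonyms or {}
    # Assumed semantic reduction: merge synonyms into one canonical term.
    counts = Counter(synonyms.get(w, w) for w in keywords)
    # Keep only words shared with the ontology vector; absent concepts get 0.
    return {concept: float(counts[concept]) for concept in ontology_vector}

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical "sports" category vector and page keywords.
sports = {"match": 1.0, "goal": 1.0, "league": 1.0}
page = ["goal", "score", "match", "weather"]
expect = text_expectation_vector(page, sports, synonyms={"score": "goal"})
content_similarity = cosine(expect, sports)
print(round(content_similarity, 4))  # 0.7746
```

Here "weather" is dropped because it is not an ontology concept, and "score" is folded into "goal", giving the expectation vector {match: 1, goal: 2, league: 0}.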
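The structural-relevancy and decision steps could be sketched as follows. The abstract does not give the exact formulas, so this version assumes that each ontology edge carries a weight in (0, 1], that a word's relevancy to a category is the largest product of edge weights along any path from its concept up to the category root, that structural relevancy is the average over the page's ontology words, and that the comprehensive correlation degree is a linear combination controlled by a hypothetical `alpha`. All function names, edge weights, and thresholds are illustrative.

```python
def path_weight(dag, node, root, memo=None):
    """Largest product of edge weights over any upward path node -> root.

    dag: dict child -> list of (parent, weight); weights in (0, 1] are
    hypothetical. memo caches results for a single fixed root.
    """
    if memo is None:
        memo = {}
    if node == root:
        return 1.0
    if node not in memo:
        memo[node] = max(
            (w * path_weight(dag, parent, root, memo) for parent, w in dag.get(node, [])),
            default=0.0,  # no path from this concept to the category root
        )
    return memo[node]

def structure_relevancy(dag, common_words, root):
    """Average path weight of the page's ontology words to the category root."""
    if not common_words:
        return 0.0
    memo = {}
    return sum(path_weight(dag, w, root, memo) for w in common_words) / len(common_words)

def classify(content_sim, struct_rel, alpha=0.5, threshold=0.6):
    """Combine both measures linearly, then compare with a threshold.

    alpha and threshold are hypothetical tuning parameters.
    """
    score = alpha * content_sim + (1 - alpha) * struct_rel
    return score, score >= threshold

# Hypothetical fragment of a sports ontology hierarchy (acyclic).
dag = {
    "goal": [("football", 0.9)],
    "league": [("football", 0.9), ("basketball", 0.7)],
    "football": [("sports", 0.8)],
    "basketball": [("sports", 0.8)],
}
rel = structure_relevancy(dag, ["goal", "league"], "sports")
score, belongs = classify(0.7746, rel)
print(round(rel, 2), round(score, 4), belongs)  # 0.72 0.7473 True
```

Taking the best path rather than summing all paths keeps a concept with several parents (here "league") from being over-counted; other aggregations are equally plausible.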
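The attribute-selection step of ωID3 can be approximated as below. The ID3 information gain is standard, but the abstract does not specify the correction weight, so this sketch assumes symmetric uncertainty, 2·IG(A) / (H(A) + H(D)), as the measure of correlation between a condition attribute A and the decision attribute D. `choose_split`, the thresholds, and the toy data set are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def info_gain(rows, attr, target):
    """Standard ID3 information gain of splitting rows on attr."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder

def choose_split(rows, attrs, target, gain_threshold, value_threshold):
    """Pick the splitting attribute with an assumed ωID3-style correction.

    Attributes whose gain AND number of distinct values both reach the
    thresholds get their gain multiplied by the symmetric uncertainty
    2*IG / (H(A) + H(D)); other attributes keep their plain ID3 gain.
    """
    h_target = entropy([r[target] for r in rows])
    best_attr, best_score = None, -math.inf
    for attr in attrs:
        gain = info_gain(rows, attr, target)
        n_values = len({r[attr] for r in rows})
        score = gain
        if gain >= gain_threshold and n_values >= value_threshold:
            denom = entropy([r[attr] for r in rows]) + h_target
            score = gain * (2 * gain / denom) if denom else 0.0
        if score > best_score:
            best_attr, best_score = attr, score
    return best_attr

# Toy data: "id" is unique per row, the classic many-valued attribute.
rows = [
    {"id": i, "outlook": o, "play": p}
    for i, (o, p) in enumerate([
        ("sunny", "yes"), ("sunny", "yes"), ("sunny", "yes"), ("sunny", "no"),
        ("rainy", "no"), ("rainy", "no"), ("rainy", "no"), ("rainy", "no"),
    ])
]
print(choose_split(rows, ["id", "outlook"], "play", 0.5, 4))    # outlook
print(choose_split(rows, ["id", "outlook"], "play", 0.5, 100))  # id (no correction, plain ID3)
```

Plain ID3 picks `id` because its gain equals the full class entropy (about 0.954 bits versus 0.549 for `outlook`); the weighted correction in this sketch lowers the `id` score to about 0.461, so `outlook` becomes the splitting node.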