
Design And Realization Of Minority Web Information Retrieval System In Cloud Environment

Posted on: 2016-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: X P Jin
GTID: 2208330503451515
Subject: Software engineering
Abstract/Summary:
Yunnan is a province with a large number of ethnic minorities, and data resources about these minorities are located mostly on local government websites and forums in minority areas and on related websites. The accuracy and integrity of this information are not guaranteed. These Web data resources also suffer from redundancy, imprecision, a small proportion of useful data, and diverse data structures, and together with the low popularity of the websites, these problems create information islands. To exploit these Web resources more conveniently, this thesis studies and analyzes Web data mining and cloud computing techniques and improves the PageRank algorithm used in Web structure mining. Finally, it designs and implements a prototype minority Web information retrieval system in a cloud computing environment and validates the work through experiments. The main research work is as follows:
(1) Minority Web structure mining differs from the mining of other content. Minority Web data resources are multi-source, heterogeneous, massive, and redundant, and because no specialized department audits them, the authority and integrity of different minority Web pages vary. In addition, ethnic minorities are distributed regionally, and the local governments of these regions have higher authority and credibility in screening information. At the same time, with the progress of national informatization, research institutes and universities have done much research on the culture of minority nationalities, so they also have high authority in screening minority information resources. Therefore, the more government or education Web pages link to a minority Web page, the more important that page is. Based on these characteristics, this thesis proposes an improved PageRank algorithm suited to minority Web structure mining and discusses it.
(2) Because PageRank is computed without considering page content, it suffers from topic drift. To address this problem, this thesis incorporates an ontology of ethnic minority information resources to achieve accurate search. It discusses how to store the ontology in a relational database, extracts page titles through DOM parsing and segments them into words, and maps the title keywords to the corresponding concepts in the ontology concept set, storing the mapping in the database to improve search accuracy.
(3) To improve the efficiency of minority Web information retrieval, this thesis studies information retrieval in a cloud computing environment, implements the prototype system, analyzes and discusses its functional modules, and finally verifies the feasibility of the system through experiments.
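The link-weighting idea in point (1) can be sketched as a PageRank variant in which each link's contribution is scaled by an authority weight for the type of the page that contains the link. This is a minimal sketch under stated assumptions, not the thesis's algorithm: the abstract gives no formula, so the site types ("gov", "edu", "other"), the weight values in SOURCE_WEIGHT, and the function name authority_pagerank are all hypothetical. Ranks are renormalized after each iteration so they still sum to 1.

```python
from typing import Dict, List

# Hypothetical authority weights by type of the *linking* page; the abstract gives no values.
SOURCE_WEIGHT = {"gov": 2.0, "edu": 2.0, "other": 1.0}


def authority_pagerank(
    links: Dict[str, List[str]],
    page_type: Dict[str, str],
    d: float = 0.85,
    iters: int = 100,
    tol: float = 1e-10,
) -> Dict[str, float]:
    """PageRank power iteration where each link u -> v contributes in proportion to
    SOURCE_WEIGHT[type(u)]; ranks are renormalized to sum to 1 after every step."""
    pages = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # Random-jump term plus the mass of dangling pages (no out-links), spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        for u, outs in links.items():
            if not outs:
                continue
            # Unknown or missing page types fall back to a neutral weight of 1.0.
            w = SOURCE_WEIGHT.get(page_type.get(u, "other"), 1.0)
            share = d * w * rank[u] / len(outs)
            for v in outs:
                new[v] += share
        # Weighted sources push extra mass, so rescale to keep a probability distribution.
        total = sum(new.values())
        new = {p: r / total for p, r in new.items()}
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank
```

With every weight set to 1.0 the renormalization step is effectively a no-op and the iteration reduces to standard PageRank with dangling-page mass spread uniformly; raising the weights of government and education sources shifts rank toward the pages they link to.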
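The title-to-ontology mapping in point (2) can be sketched with Python's standard library. This is a hypothetical illustration, not the thesis's implementation: the two-table schema (concept, page_concept), the helper names, and any sample concepts are assumptions. The stdlib HTMLParser stands in for a full DOM parser, and forward maximum matching against the ontology vocabulary stands in for whatever word-segmentation tool the thesis actually used.

```python
import sqlite3
from html.parser import HTMLParser
from typing import List, Optional, Set


class TitleExtractor(HTMLParser):
    """Collects the text inside <title>; a lightweight stand-in for DOM parsing."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html: str) -> str:
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


def fmm_segment(text: str, vocab: Set[str]) -> List[str]:
    """Forward maximum matching: take the longest vocabulary word at each position,
    falling back to a single character when nothing matches."""
    max_len = max((len(w) for w in vocab), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def build_db() -> sqlite3.Connection:
    """Hypothetical schema: a self-referencing concept table plus a page-to-concept index."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE concept (
            id INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL,
            parent_id INTEGER REFERENCES concept(id)
        );
        CREATE TABLE page_concept (
            url TEXT NOT NULL,
            concept_id INTEGER NOT NULL REFERENCES concept(id),
            PRIMARY KEY (url, concept_id)
        );
    """)
    return conn


def add_concept(conn: sqlite3.Connection, name: str, parent: Optional[str] = None) -> int:
    """Insert a concept; the parent concept, if given, must already exist."""
    parent_id = None
    if parent is not None:
        parent_id = conn.execute(
            "SELECT id FROM concept WHERE name = ?", (parent,)).fetchone()[0]
    cur = conn.execute(
        "INSERT INTO concept (name, parent_id) VALUES (?, ?)", (name, parent_id))
    return cur.lastrowid


def index_page(conn: sqlite3.Connection, url: str, html: str) -> List[str]:
    """Segment the page title and record every token that names an ontology concept."""
    vocab = {row[0] for row in conn.execute("SELECT name FROM concept")}
    matched = [t for t in fmm_segment(extract_title(html), vocab) if t in vocab]
    for name in matched:
        conn.execute(
            "INSERT OR IGNORE INTO page_concept (url, concept_id) "
            "SELECT ?, id FROM concept WHERE name = ?", (url, name))
    return matched


def search(conn: sqlite3.Connection, concept_name: str) -> List[str]:
    """Return the URLs of pages whose titles were mapped to the given concept."""
    rows = conn.execute(
        "SELECT pc.url FROM page_concept AS pc JOIN concept AS c ON c.id = pc.concept_id "
        "WHERE c.name = ? ORDER BY pc.url", (concept_name,))
    return [r[0] for r in rows]
```

The parent_id column keeps the concept hierarchy inside the relational table, so a query could be widened from a concept to its ancestors or descendants. Restricting matches to ontology terms is one way to tie ranked results back to a topic.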
Keywords/Search Tags: Cloud computing, Minority information resources, Web information retrieval, PageRank, Ontology