Design And Implementation Of An Ontology-based Multimedia Material Web Crawler

Posted on: 2016-07-17 Degree: Master Type: Thesis
Country: China Candidate: Y F Teng Full Text: PDF
GTID: 2308330482454653 Subject: Software engineering
Abstract/Summary:
In the Internet era, network resources accumulate by the tens of thousands every second, and large search engines must serve hundreds of millions of queries every day. The accuracy of resource acquisition has therefore become a hot topic in web crawling research. Methods for acquiring network resources have developed considerably, and the structure and basic algorithms of web crawlers are now relatively stable. Crawlers are commonly classified into focused crawlers, semantic crawlers, learning crawlers, and other branches. A focused crawler is mainly concerned with the relevance between crawled pages and the crawl target, which includes discovering related resources, predicting the relevance of URLs to be crawled, and ordering the list of URLs waiting to be crawled. A semantic crawler builds on the traditional focused crawler and computes the semantic relevance of a page to the topic in order to set the page's priority; ontology-based crawling has been attempted along these lines. A learning crawler takes a statistical point of view, using for example Markov models or content-based methods, and is trained to guide the processing priority of web pages. This research branch runs parallel to semantic crawling and has achieved some significant results. This thesis focuses on semantic crawling and attempts to design and implement a semantic crawler using ontology technology. The specific work includes the following parts. First, based on the needs of the author's own work, and by surveying the relevant staff and summarizing the problems they encounter in their work, the multimedia resources used in CAI courseware production are identified and organized.
Using an ontology modeling tool, a multimedia resource ontology knowledge base is built. Second, following the classification of CAI courseware in the multimedia resource ontology, a semantic similarity measure based on the WordNet corpus is designed. Semantic labels and URL similarity are defined, and the algorithm for calculating them is given. Third, the proposed algorithm and a simple web crawler program are designed and implemented in C# on the .NET Framework. The implementation details of URL storage, HTTP requests, and responses are given. Finally, the algorithm is tested and improved. Given the author's professional background and academic level, the author was not able to complete the design and development of a full-featured multimedia web crawler. However, based on actual work needs, and with the guidance and help of teachers during the software engineering master's program, the author completed a small piece of software for use in that work. Through this thesis, the theory and methods required by the software engineering master's program were understood and mastered.
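The abstract does not state which similarity measure or queue structure the thesis uses, so the following is only an illustrative sketch of the general idea: score each URL by the semantic similarity between its labels and a topic concept, then crawl the highest-scoring URLs first. It is written in Python rather than the thesis's C#. Everything in it is an assumption made for illustration: the tiny `PARENT` hypernym tree stands in for WordNet and the ontology, Wu-Palmer similarity is swapped in as one common path-based measure, and the names `wu_palmer`, `url_score`, and `Frontier` are hypothetical.

```python
import heapq

# Hypothetical toy hypernym tree (child -> parent). It stands in for
# WordNet / the multimedia resource ontology, which are not reproduced here.
PARENT = {
    "resource": None,
    "image": "resource",
    "audio": "resource",
    "video": "resource",
    "photo": "image",
    "icon": "image",
    "music": "audio",
    "animation": "video",
}


def path_to_root(term):
    """Return the chain [term, parent, ..., root]."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def depth(term):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(term))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b)).

    Returns 0.0 when either term is missing from the toy tree.
    """
    if a not in PARENT or b not in PARENT:
        return 0.0
    ancestors_a = set(path_to_root(a))
    # Walking upward from b, the first node shared with a is the
    # lowest common subsumer; the root guarantees one exists.
    lcs = next(t for t in path_to_root(b) if t in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def url_score(labels, topic):
    """Score a URL by its best label-to-topic similarity (assumed rule)."""
    return max((wu_palmer(label, topic) for label in labels), default=0.0)


class Frontier:
    """URL queue that returns the highest-scoring unseen URL first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._count = 0  # tie-breaker: earlier pushes come out first

    def push(self, url, score):
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated.
        heapq.heappush(self._heap, (-score, self._count, url))
        self._count += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

In use, a crawler along these lines would call `url_score` on the labels extracted for each discovered link (for example from anchor text), `push` the link with that score, and `pop` the next URL to fetch. The placeholder tree and the max-over-labels rule would be replaced by the ontology and the similarity definition given in the thesis itself.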
Keywords/Search Tags: Web crawler, ontology, semantic similarity