
Research And Implementation Of Web Marine Data Crawling And Storing System

Posted on: 2011-10-16
Degree: Master
Type: Thesis
Country: China
Candidate: L Yao
Full Text: PDF
GTID: 2248330395457928
Subject: Computer application technology
Abstract/Summary:
Through years of surveys and data collection, research institutions around the world have accumulated a large volume of marine scientific data and related information. With the development of the Internet, these institutions have published the data on the Web, where people can query or download it. However, it is difficult to obtain large-scale ocean data from the Web automatically and efficiently, because no tools exist for retrieving and collecting data in this field. Furthermore, marine data on the Web usually take the form of scientific text data files, which are semi-structured and hard to interpret without additional metadata files. To query and analyze these data effectively, we need to load them into a relational database.

In this thesis, we construct a system framework for searching, crawling and loading Web marine data. The framework contains three modules, whose tasks are, respectively, searching for websites that provide ocean data, downloading the data found there, and loading the data into a database.

To search for target websites, we build a keyword library for the field of marine science data and give an algorithm that evaluates the topic relevance of web pages. By filtering the results returned by a search engine with this algorithm, we obtain the website addresses we need. To obtain the marine scientific data, we design a specialized web crawler and give an algorithm for extracting metadata files. In this way, we can not only download ocean data files from the Web but also understand their meaning. To store these text data, we design and implement a model that defines the mapping rules between scientific text data and relational database data. Based on this model, we load the marine data in text files into a database, where they can be used effectively.

In practical applications, the system designed in this thesis achieves good results and meets our need to obtain large amounts of marine data. The system also offers good interactivity, flexibility and extensibility.
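The abstract does not give the details of the topic-relevance algorithm, so the following is only a minimal sketch of how a keyword library could be used to filter search-engine results. The keyword list, the weights, the scoring formula (weighted keyword hits per word) and the names `MARINE_KEYWORDS`, `topic_relevance` and `filter_results` are all assumptions made for illustration, not the method of the thesis.

```python
import re

# Hypothetical keyword library: term -> weight (illustrative values only).
MARINE_KEYWORDS = {
    "ocean": 1.0,
    "marine": 1.0,
    "salinity": 2.0,
    "ctd": 2.0,
    "buoy": 1.5,
    "sea surface temperature": 2.5,
}


def topic_relevance(text, keywords=MARINE_KEYWORDS):
    """Return weighted keyword hits divided by word count (0.0 if no words)."""
    # Lowercase and keep only alphanumeric tokens, joined by single spaces.
    normalized = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    words = normalized.split()
    if not words:
        return 0.0
    score = 0.0
    for term, weight in keywords.items():
        # Whole-word (or whole-phrase) matches only.
        hits = re.findall(r"\b" + re.escape(term) + r"\b", normalized)
        score += weight * len(hits)
    return score / len(words)


def filter_results(results, threshold):
    """Keep URLs whose page text scores at or above threshold, best first.

    `results` is a list of (url, page_text) pairs, e.g. from a search engine.
    """
    scored = [(topic_relevance(text), url) for url, text in results]
    kept = [(s, url) for s, url in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in kept]
```

A call such as `filter_results(pages, threshold=0.1)` would then return only the addresses whose text is dense in marine terms. The threshold would have to be tuned on real pages.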
Keywords/Search Tags: links rating, web crawler, scientific text, data extracting