The Research On Integration Technology Of Network Text Of Heterogeneous Data

Posted on: 2017-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: T X Qiu
Full Text: PDF
GTID: 2308330482990752
Subject: Computer Science and Technology
Abstract/Summary:
Although web crawler technology and research are now quite mature, the accuracy of retrieved data remains a pressing problem for search engines. At the same time, web data is idiosyncratic and unstructured, so retrieved data cannot be stored in a fixed data structure, which makes heterogeneous data hard to integrate. This thesis modifies the topic web crawler so that it crawls data according to the semantics of the topic, which improves the accuracy of search results and allows the retrieved data to be integrated. The focused topic crawler builds on the topic crawler: it uses regular expressions to describe a topic model and combines them with an improved TF-IDF algorithm, so that TF-IDF can judge pages on the semantics of the topic. This improves the search accuracy of the topic web crawler. The vertical web crawler based on semantic analysis searches network resources according to the semantics of the user's input and intelligently filters out irrelevant information, making the retrieved information more accurate and comprehensive. Experimental data showed that the improved TF-IDF algorithm increased the accuracy of search results by 10%. The thesis uses Extensible Markup Language (XML), a data exchange technology, to integrate the search results. XML and data exchange middleware have become a standard for exchanging data between applications. The flexibility, adaptability, and structural diversity of XML data sources give XML-based exchange obvious advantages in the study of heterogeneous data...
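
The two ideas in the abstract, a regular-expression topic model combined with TF-IDF and XML as the integration format, can be illustrated with a minimal sketch. It is not the thesis's improved algorithm, which the abstract does not specify: it uses a standard smoothed TF-IDF and counts only the terms that match a topic pattern. The function names (`tokenize`, `topic_relevance`, `to_xml`), the example pattern, and the XML element names (`results`, `item`) are all hypothetical and chosen for illustration.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN.findall(text.lower())


def topic_relevance(docs, topic_pattern):
    """Score each document by the summed TF-IDF of its topic terms.

    A term counts as a topic term when the regular expression matches
    it in full. This stands in for the thesis's topic model; the real
    model and weighting are assumptions here.
    """
    topic = re.compile(topic_pattern)
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        score = 0.0
        for term, count in Counter(tokens).items():
            if topic.fullmatch(term):
                tf = count / len(tokens)
                # Smoothed IDF (a common variant), always positive.
                idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
                score += tf * idf
        scores.append(score)
    return scores


def to_xml(records):
    """Serialize records with differing fields into one XML document.

    Each dict key becomes a child element, so keys are assumed to be
    valid XML element names.
    """
    root = ET.Element("results")
    for record in records:
        item = ET.SubElement(root, "item")
        for key, value in record.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    pages = [
        "XML data exchange between applications",
        "football match results today",
        "xml middleware integrates heterogeneous data",
    ]
    scores = topic_relevance(pages, r"xml|data|heterogeneous")
    # Keep only pages with at least one topic term, whatever their fields.
    kept = [{"text": p, "score": round(s, 3)}
            for p, s in zip(pages, scores) if s > 0]
    print(to_xml(kept))
```

A page with no topic terms scores zero and is filtered out before serialization, which mirrors the abstract's description of discarding irrelevant information before integrating what remains.
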
Keywords/Search Tags: web crawler, heterogeneous data, semantic analysis, XML technology, data integration