Research On Text Information Searching Technology Based On Internet

Posted on: 2016-06-25
Degree: Master
Type: Thesis
Country: China
Candidate: Wei
Full Text: PDF
GTID: 2308330470983670
Subject: Computer technology
Abstract/Summary:
As the Internet has developed, it has become a huge knowledge system containing vast amounts of information, which makes it an important source of information for people's daily lives. Web pages contain not only the text information users want but also a large amount of interfering content. Extracting the needed text information from the Internet, and converting the address information in it into standard location information for location services, has therefore become increasingly important.

To mine text information from the Internet, an information extraction method based on the Baidu API is used first. That method is limited to the data the Baidu API provides. To address this shortcoming, this paper proposes a web information extraction method based on Jsoup, aimed mainly at O2O (Online To Offline, e-commerce that combines online and offline business) websites. The method uses a statistical approach to locate the text block and analyzes the tags of that block; it then makes page-level judgments on the specific tags recursively, generates an information extraction module, and extracts the required information. Once the text information has been extracted, standardizing the address information it contains is an important part of text processing. For address standardization, this paper presents a trust parsing algorithm: it first segments the Internet address information into words, then matches those words against a dictionary of administrative divisions, and finally applies trust parsing to the matched division information to infer standardized location information.

The Jsoup-based information extraction method was tested on mainstream O2O websites, and the results were compared with those of a DOM tree algorithm and a web page segmentation algorithm. The experiments show that the proposed information extraction algorithm achieves better extraction results.
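The address standardization steps named in the abstract (segment the address into words, match the words against an administrative-division dictionary, infer the standardized location) can be illustrated with a minimal sketch. The abstract does not define the trust parsing algorithm itself, so this is not the thesis's method. It assumes forward maximum matching for segmentation, a tiny hypothetical dictionary (`DIVISIONS`), and a simple walk up parent divisions to fill in missing levels. The function names `segment` and `standardize` are also illustrative.

```python
# Hypothetical miniature administrative-division dictionary (assumption, not from the thesis).
# Maps division name -> (level, parent); level 1 = province, 2 = city, 3 = district.
DIVISIONS = {
    "广东省": (1, None),
    "深圳市": (2, "广东省"),
    "南山区": (3, "深圳市"),
    "广州市": (2, "广东省"),
    "天河区": (3, "广州市"),
}


def segment(address, dictionary):
    """Forward maximum matching: return the dictionary words found in the
    address and the remaining unmatched characters joined as one string."""
    max_len = max(len(word) for word in dictionary)
    matched, rest, i = [], [], 0
    while i < len(address):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for size in range(min(max_len, len(address) - i), 0, -1):
            word = address[i:i + size]
            if word in dictionary:
                matched.append(word)
                i += size
                break
        else:
            # No dictionary word starts here; keep the character as detail text.
            rest.append(address[i])
            i += 1
    return matched, "".join(rest)


def standardize(address, dictionary=DIVISIONS):
    """Infer the full division chain from the deepest matched division.
    Returns None when no division word is found in the address."""
    matched, detail = segment(address, dictionary)
    if not matched:
        return None
    deepest = max(matched, key=lambda word: dictionary[word][0])
    chain, node = [], deepest
    while node is not None:
        chain.append(node)
        node = dictionary[node][1]  # move up to the parent division
    chain.reverse()  # order from province down to district
    return {"divisions": chain, "detail": detail}


if __name__ == "__main__":
    print(standardize("南山区科技园路1号"))
```

Given an address that names only a district, the sketch fills in the city and province from the dictionary's parent links. A fuller treatment would also need to resolve conflicting or ambiguous matches, which is presumably where the thesis's trust parsing comes in.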
Keywords/Search Tags:Baidu API, Jsoup, Chinese address, trust parsing