Research And Implement Of Network Information Extraction Oriented To The Mobile Platform

Posted on: 2007-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: J Li
Full Text: PDF
GTID: 2178360185985904
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, growing demand and an expanding market for wireless applications have drawn increasing attention to wireless services and WAP devices, and browsing the Internet on a cellular phone has become increasingly popular. However, because the protocol and markup language used by WAP devices differ from those used on the Internet, the resources that WAP devices can access are very limited, which seriously restricts the development of the Mobile Internet.

This thesis investigates methods for extracting content from web pages and technologies for processing that content on the mobile platform. Its goal is to make good use of Internet information in mobile applications, and the work is divided into three parts: 1. automatically fetching web pages and automatically transforming Web pages into WAP pages; 2. extracting useful information from web pages; 3. processing the information with Natural Language Processing (NLP) technologies.

Automatic transformation from Web pages to WAP pages is indispensable for processing all kinds of information in mobile applications. The thesis analyses the similarities and differences between HTML and WML and implements a tool that automatically transforms Web pages into WAP pages. The tool includes a web auto-fetching function: it can fetch a site automatically and perform tag transformation and content reformatting.

Web page content extraction is the technology of extracting useful information from web pages. Wrappers are the common method. A wrapper usually achieves high precision, but a different wrapper is needed for each web site, so the method has little generality and is difficult to maintain. We implemented an approach that combines formulas with statistics to extract content text from Chinese news web pages. This method not only overcomes the shortcomings of the wrapper method but also achieves high precision. In addition, it is easy to implement and maintain.

Finally, the thesis designs a system that processes free text with NLP technologies, including Word Segmentation and Text Classification. In this...
Keywords/Search Tags: Mobile Internet, WAP, WML, Pages Transformation, Wrapper
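
The tag-transformation step mentioned in the abstract could look roughly like the sketch below. It is not the thesis's tool: the tag mapping, the class name HtmlToWml, and the function html_to_wml are all assumptions made for illustration. It keeps a few tags that HTML and WML share, turns headings into bold paragraphs, drops script and style content, and strips every other tag while keeping its text. It does not fetch pages, and it does not guarantee a valid WML deck (for example, text outside a paragraph is passed through unchanged).

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical mapping, chosen for this sketch only.
INLINE_KEPT = {"b", "i", "u", "em", "strong", "big", "small"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SKIPPED = {"script", "style"}  # the content of these tags is dropped


class HtmlToWml(HTMLParser):
    """Collects WML fragments while parsing an HTML string."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in INLINE_KEPT:
            self.out.append(f"<{tag}>")
        elif tag in HEADINGS:
            self.out.append("<p><b>")
        elif tag == "p":
            self.out.append("<p>")
        elif tag == "br":
            self.out.append("<br/>")
        elif tag == "a":
            href = dict(attrs).get("href") or "#"
            self.out.append(f'<a href="{escape(href, quote=True)}">')
        # Any other tag is stripped; its text content is still kept.

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag in INLINE_KEPT or tag in {"p", "a"}:
            self.out.append(f"</{tag}>")
        elif tag in HEADINGS:
            self.out.append("</b></p>")

    def handle_data(self, data):
        # Skip whitespace-only runs and anything inside script/style.
        if not self.skip_depth and data.strip():
            self.out.append(escape(data, quote=False))


def html_to_wml(html, card_id="main", title="Page"):
    """Return a single-card WML deck built from an HTML string."""
    parser = HtmlToWml()
    parser.feed(html)
    parser.close()
    body = "".join(parser.out)
    return (
        '<?xml version="1.0"?>\n'
        '<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" '
        '"http://www.wapforum.org/DTD/wml_1.1.xml">\n'
        f'<wml><card id="{escape(card_id, quote=True)}" '
        f'title="{escape(title, quote=True)}">{body}</card></wml>'
    )
```

Calling html_to_wml on a fetched page's HTML string returns the deck as text. A real converter would also need to split long pages into several cards and handle images, forms, and character encodings, which this sketch leaves out.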