| With the rapid development of computer and multimedia technology, the Internet has become the largest source of information for most people. At present, most Internet information retrieval applications treat a web page as an indivisible whole: information is stored and retrieved with the web page as the minimal unit. However, web pages and their content have become more complex and diverse. A page often contains many different kinds of information, along with heavy decoration added for visual appeal, advertisements, and other unimportant material unrelated to what the user needs. This noise reduces the accuracy of the returned content, and the effect carries through to the final results of the retrieval system. Even when the system returns the correct web page, the user faces a large and complex new page and can easily lose sight of the relevant content. In many cases local documents offer no search function, so the user must read through the document to the end to find the required content, and may still come away with nothing. Correctly extracting the thematic information of web pages and documents is therefore important, because it helps users quickly understand their content. This paper covers the following research: 1. By analyzing the development of informatization at home and abroad, it points out the problems and difficulties of current information retrieval systems, states the purpose, content, and significance of this research accordingly, and discusses the research status of information extraction technology at home and abroad. 2. It introduces the theoretical basis and the key technologies used in the system design, including their characteristics and the platform used. 3. It analyzes and presents the requirements of the system.
The requirements analysis of the HTML5 document outline parser first covers feasibility, both operational and technical, and then introduces and analyzes the functional requirements of the system's main modules. It then addresses the non-functional requirements, introducing the design principles of the system and the points that need attention. 4. It describes the design and implementation of the system. Following the software development process strictly, this chapter starts from the design goals of the system, then presents the outline design, the functional structure, and the design of each part, and finally the implementation, including the design ideas and the code required. 5. It summarizes the final results and achievements of the system's design and implementation, analyzes the remaining problems in the system, and outlines future work. In short, this research, together with the successful implementation of the PDM system for an enterprise purchasing department, provides a demonstration and practical experience for informatization in the same industry, and lays a foundation for innovation that promotes the adjustment of industrial structure and management mechanisms in that industry. |