
Study On Acquisition And Representation Method Of Knowledge Oriented To Web Documents

Posted on: 2004-09-27
Degree: Master
Type: Thesis
Country: China
Candidate: D G Guan
Full Text: PDF
GTID: 2168360122475518
Subject: Computer software and theory
Abstract/Summary:
This dissertation is part of a courseware authoring and conversion tools project supported by the Science-Technology Project of China's National 'Tenth Five-Year Plan'. Its primary tasks are to study algorithms for acquiring and storing information from Web documents; to analyse the characteristics of HTML in order to determine how it expresses content in Web documents; and to define a compact XML model of a Web document with reference to the Learning Object Model (LOM). The information extracted from the original Web document is then used to build the required XML document.

The dissertation first describes the weaknesses of HTML in Web documents, then explains why transforming Web documents into XML documents is important, and finally summarizes the significance of the topic. Its main body addresses how to convert a Web document into XML format without distortion. The study analyses the weaknesses of the method previously used to acquire and store data from Web documents, and describes a new method that obtains descriptive and structural information by parsing the HTML tags in Web documents. All information is classified by format into text, image, animation, and streaming-media information, and each kind is handled with its own method, which makes the processing more effective. Every tag is parsed fully.

How to express the acquired document information in XML is a question of establishing a document format (DTD) for XML, and it is the core of this study. We first define a logical model and explain the entities, properties, relations, windows, events, and echo of a Web document, which is treated as a set of entities. We then describe the Web document description model and define the relationships among metadata, structure, media resources, page resources, window resources, and forms. With this, the Web document is defined both logically and physically. The positions of the different elements of the Web document are also acquired and stored, which ensures that the new document matches the original without distortion. Finally, we developed software based on the model. The model proposed in this dissertation was used in the project supported by the Science-Technology Project of China's National 'Tenth Five-Year Plan'.
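
The tag-parsing approach summarized in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's software or its DTD: the element names (`webdocument`, `metadata`, `structure`, `media`), the `order` attribute used to record position, and the tag-to-category mapping in `MEDIA_TAGS` are all assumptions made for illustration. The sketch walks the HTML tags, sorts what it finds into text, image, animation, and stream categories, and writes the result as XML.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumed mapping from HTML tags to the media categories named in the abstract.
MEDIA_TAGS = {
    "img": "image",
    "embed": "animation",
    "object": "animation",
    "video": "stream",
    "audio": "stream",
}


class WebDocumentExtractor(HTMLParser):
    """Collects metadata, text, and media references into an XML tree."""

    def __init__(self):
        super().__init__()
        # Hypothetical target layout, not the thesis's actual document format.
        self.root = ET.Element("webdocument")
        self.metadata = ET.SubElement(self.root, "metadata")
        self.structure = ET.SubElement(self.root, "structure")
        self.media = ET.SubElement(self.root, "media")
        self._in_title = False
        self._order = 0  # running position of each extracted item

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in MEDIA_TAGS:
            self._order += 1
            source = attrs.get("src") or attrs.get("data") or ""
            ET.SubElement(self.media, MEDIA_TAGS[tag],
                          src=source, order=str(self._order))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            ET.SubElement(self.metadata, "title").text = text
        else:
            self._order += 1
            ET.SubElement(self.structure, "text",
                          order=str(self._order)).text = text


def html_to_xml(html):
    """Parse an HTML string and return the extracted content as an XML string."""
    extractor = WebDocumentExtractor()
    extractor.feed(html)
    extractor.close()
    return ET.tostring(extractor.root, encoding="unicode")


sample = ('<html><head><title>Lesson 1</title></head>'
          '<body><p>Hello</p><img src="a.png"><embed src="b.swf"></body></html>')
print(html_to_xml(sample))
```

Recording a running `order` value is one simple way to keep the relative position of each element so the original sequence can be rebuilt. A fuller model would also need layout, window, and form information, which this sketch omits.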
Keywords/Search Tags: Web, XML, Information Acquisition, Information Representation