
Ontology-Based Information Extraction Technology for Processing Non-normative Web Knowledge

Posted on: 2006-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Liu
GTID: 2208360155465249
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the amount of online information is growing quickly, roughly doubling every four to six months. Traditional searching, which relies on browsers and keyword-based search engines, can no longer satisfy people's demands on Internet information services, and helping people find exactly the information they need, accurately and efficiently, has become an increasingly urgent problem. Because Internet text is semi-structured, we introduce Ontology technology into information extraction (IE) and propose a method that can handle semantics. This technology collects information on the same topic that is scattered across different websites and stores it in structured form, providing users with concise and accurate information. Unlike complex natural language understanding, IE is oriented toward a concrete task: it usually relies on shallow text analysis to extract only the information the designer is interested in. It is therefore suited to specific topics and to documents with a relatively fixed structure, such as advertisements, news, travel information, stock information, and meeting schedules. Automatic information extraction has developed over the past ten years, driven mainly by two factors: first, the geometric growth of online text; second, the Message Understanding Conferences (MUC), which over the past decade drew attention to the field and pushed it forward. According to the principles they adopt, existing tools can be divided into the following five kinds: IE based on natural language processing, IE based on wrapper induction, IE based on HTML structure, IE based on Web query, and IE based on Ontology.
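The task-oriented, shallow extraction described above (pulling the fields a designer cares about from same-topic pages on several sites and storing them in structured form) can be illustrated with a minimal Python sketch. It is not the thesis's method: the meeting-schedule template, its slot names, the regular expressions, and the sample page texts are all hypothetical.

```python
import re
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


# Hypothetical template for the "meeting schedule" topic: the slots a designer cares about.
@dataclass
class MeetingRecord:
    source: str
    title: Optional[str] = None
    date: Optional[str] = None
    location: Optional[str] = None


# Hypothetical shallow surface patterns: no parsing or semantic analysis is involved.
SLOT_PATTERNS = {
    "title": re.compile(r"Meeting:[ \t]*([^\n]+)"),
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    "location": re.compile(r"Venue:[ \t]*([^\n]+)"),
}


def extract(source: str, text: str) -> MeetingRecord:
    """Fill the template slots of one page with regex matches; unmatched slots stay None."""
    record = MeetingRecord(source=source)
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            setattr(record, slot, match.group(1).strip())
    return record


def collect(pages: Dict[str, str]) -> List[dict]:
    """Gather same-topic records scattered across several sites into one structured table."""
    return [asdict(extract(src, text)) for src, text in pages.items()]


if __name__ == "__main__":
    pages = {  # hypothetical page texts, not data from the thesis
        "site-a": "Meeting: Web Mining Workshop\nVenue: Room 101\nDate: 2006-05-10",
        "site-b": "Call for papers\nMeeting: Ontology Seminar\nHeld 2006-04-15\nVenue: Hall B",
    }
    for row in collect(pages):
        print(row)
```

The fixed patterns only work because the pages share a predictable layout, which is exactly the "specific theme, relatively fixed structure" restriction the abstract describes.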
The NLP-based approach borrows, to some extent, techniques from natural language processing: it builds grammar- and semantics-based extraction rules from clause structure, phrases, and the relations between clauses, and uses them to collect information. Such methods are very complex to implement and have low extraction efficiency.
The wrapper-induction approach uses machine learning algorithms to learn from sample instances in advance and generates extraction rules from them; this kind of information collection requires a large number of sample texts.
The HTML-structure approach locates information according to the structure of the Web page: before extraction, it parses the Web document into a syntax tree and produces extraction rules semi-automatically, so that information extraction becomes an operation on that tree. This method places strict demands on text structure, and one extraction system can only be applied to texts whose structure is the same or similar.
The Web-query approach uses a standard Web query language to query Web documents and is a general method. However, it must first convert the Web information into a form that conforms to XML syntax and then write query statements according to that structure, so it places even stricter demands on the form of the text.
The Ontology-based approach performs extraction through descriptive information about the text. It first builds a domain ontology, then derives extraction rules from the keywords and concept attributes of the domain ontology, then builds a database from the concepts' attributes and relations, and finally performs extraction with these rules and saves the extracted information in the database.
This Ontology-based method does not need to consider the grammatical rules of the text and places no restrictions on its structure; in addition, it can guarantee the consistency of the extracted information.
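The Ontology-based steps just described (build a domain ontology, derive rules from its keywords and concept attributes, build a database from the concept attributes, then extract and store) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the dictionary-based ontology, the "Meeting" concept and its keywords, the keyword-plus-colon regex rules, and the in-memory SQLite schema are all hypothetical, and relations between concepts are omitted for brevity.

```python
import re
import sqlite3

# Hypothetical toy domain ontology: concept -> attribute -> keywords that signal it.
# The thesis does not fix a representation; a real system might load OWL/RDF instead.
ONTOLOGY = {
    "Meeting": {
        "name": ["meeting", "conference", "seminar"],
        "date": ["date", "held on"],
        "place": ["venue", "location"],
    }
}


def build_rules(ontology):
    """Derive one extraction rule (a regex) per concept attribute from its keywords."""
    rules = {}
    for concept, attributes in ontology.items():
        for attr, keywords in attributes.items():
            alternation = "|".join(re.escape(k) for k in keywords)
            # keyword, optional colon, then the rest of that line is captured as the value
            rules[(concept, attr)] = re.compile(
                rf"\b(?:{alternation})\b[ \t]*:?[ \t]*([^\n]+)", re.IGNORECASE
            )
    return rules


def build_database(ontology):
    """Create one table per concept with one TEXT column per attribute (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    for concept, attributes in ontology.items():
        columns = ", ".join(f'"{attr}" TEXT' for attr in attributes)
        conn.execute(f'CREATE TABLE "{concept}" ({columns})')
    return conn


def extract_and_store(text, ontology, rules, conn):
    """Apply the rules to one text and save each concept instance that matched anything."""
    for concept, attributes in ontology.items():
        values = {}
        for attr in attributes:
            match = rules[(concept, attr)].search(text)
            values[attr] = match.group(1).strip() if match else None
        if any(v is not None for v in values.values()):
            cols = ", ".join(f'"{a}"' for a in values)
            marks = ", ".join("?" for _ in values)
            conn.execute(
                f'INSERT INTO "{concept}" ({cols}) VALUES ({marks})', list(values.values())
            )
    conn.commit()


if __name__ == "__main__":
    rules = build_rules(ONTOLOGY)
    conn = build_database(ONTOLOGY)
    page = "Seminar: Ontology and IE\nDate: 2006-04-15\nVenue: Hall B"  # hypothetical text
    extract_and_store(page, ONTOLOGY, rules, conn)
    print(conn.execute('SELECT * FROM "Meeting"').fetchall())
    # prints [('Ontology and IE', '2006-04-15', 'Hall B')]
```

Because every stored record is shaped by the same ontology-derived schema, values extracted from differently laid-out pages land in the same columns; this is one plausible reading of the consistency claim above, not a statement of how the thesis enforces it.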
Keywords/Search Tags:Information extraction, Ontology, rule