
Research Into Query Interface Schema Extraction Of Deep Web

Posted on: 2010-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Tan
GTID: 2178360272997090
Subject: Software engineering
Abstract/Summary:
The Deep Web data integration framework has two main modules: the creation of the integrated search interface, and the processing of requests coming from the integrated search interface. Each module can be divided further into sub-modules. The creation of the integrated search interface contains four: Web database detection, search interface schema extraction, Web database classification, and search interface integration.

The main purpose of interface schema extraction is to support the following steps, Web database classification and search interface integration. Its job is to list all attributes of an interface systematically according to given requirements and to output the result for the next step. In the process, we also learn the query capability of the search interface. Extracting the search interface schema is therefore very important.

Schema extraction can proceed smoothly only if the right Web page has been obtained, that is, a file that actually contains a search interface. It is therefore important to determine whether a file contains a search interface.

Classification is an important task in data mining; it aims to build a classification function or classification model. It has three phases: first, training the model; second, evaluating the model; third, classification. A decision tree is built with the following steps:
1. Choose the important attributes that can represent the samples, and determine the values of each attribute.
2. Choose the attribute with the strongest classification ability in the current set as the decision node.
3. Divide the current set into subsets according to the values of the decision node.
4. Repeat steps 2 and 3 until the set satisfies one of three conditions: first, all samples belong to the same class; second, all attributes have been used and none remain to choose from; third, all attribute values are identical.

Schema extraction targets files that contain a search interface. Most search interface code is enclosed in a specific pair of tags, which means that a file containing neither tag probably has no search interface. A detailed search interface, which contains more attribute values and lets the user specify more detailed search criteria, may have an interface schema similar to the result schema. However, there are still simpler search interfaces, some offering only a single text input box. To provide users with satisfactory results, most websites offer Advanced Search.

According to how its information is distributed, the Web can be divided into the "Surface Web" and the "Deep Web". Here, "Surface Web" means the traditional Web that can be accessed by traditional search engines. There is no commonly agreed definition of the Deep Web; at present it refers to online Web databases, whose content is stored in real databases. Compared with the Surface Web, the Deep Web contains richer and more specialized information. The Web databases of the Deep Web are not only numerous but also cover every field of the real world. Submitting queries is the primary way to access information in the Deep Web.

Chapters one and two briefly introduce the background of this project. Chapter three presents the basic technologies related to the project. Chapters four and five discuss the principles, their implementation, improvements, results, and evaluation. The results show that the system accomplishes its task and achieves the expected values. Some weaknesses remain, however. For example, it cannot accept a file that contains illegal character encoding, and the tags must correspond one-to-one. All of these should be improved in the future.
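
As a rough illustration of the decision-tree construction steps listed in the abstract, the following Python sketch builds a small tree recursively and stops on the three conditions the abstract names. The abstract does not say which measure is used to pick the attribute with the "strongest classification ability"; information gain (as in ID3) is assumed here. The page features (has_form, has_text_input, has_password), the labels, and all function names are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the samples on attr (assumed measure)."""
    total = len(rows)
    remainder = 0.0
    for value in {row[attr] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    """Return a class label (leaf) or a dict describing a decision node."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop condition 1: all samples belong to the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stop condition 2: no attributes are left to choose from.
    if not attrs:
        return majority
    # Stop condition 3: all samples have identical attribute values.
    if all(row[a] == rows[0][a] for row in rows for a in attrs):
        return majority
    # Step 2: pick the attribute with the highest information gain.
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    # Step 3: split the current set by the values of the decision node,
    # then (step 4) repeat on each subset.
    branches = {}
    for value in {row[best] for row in rows}:
        pairs = [(r, lab) for r, lab in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree([r for r, _ in pairs],
                                     [lab for _, lab in pairs],
                                     remaining)
    return {"attribute": best, "branches": branches}


def classify(tree, row, default=None):
    """Walk the tree for one sample; unseen attribute values yield default."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attribute"]], default)
    return tree


# Step 1: hypothetical attributes describing a Web page, with toy samples.
attrs = ["has_form", "has_text_input", "has_password"]
rows = [
    {"has_form": True, "has_text_input": True, "has_password": False},
    {"has_form": True, "has_text_input": True, "has_password": True},
    {"has_form": False, "has_text_input": False, "has_password": False},
    {"has_form": True, "has_text_input": False, "has_password": False},
]
labels = ["search", "other", "other", "other"]

tree = build_tree(rows, labels, attrs)
print(tree)
print(classify(tree, rows[0]))
```

The tree is kept as nested dictionaries so that the decision nodes and branches stay visible when printed; a real classifier for search interface pages would need far more training samples and a held-out set for the evaluation phase.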
Keywords/Search Tags: Deep Web, Schema extraction, Decision tree