Key Technologies Research On Web Products Automatic Extraction Based On Web List

Posted on: 2014-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y Peng
GTID: 2268330395989220
Subject: Computer application technology
Abstract/Summary:
With the development of the Internet, computers have become part of our daily lives; people use them for gaming, shopping, and work. Online shopping, a recently emerged technology, has attracted a great deal of attention. It makes it easy for people to find cheap and useful goods and to obtain them without leaving home. As a result, the amount of product information online has grown explosively. Most of this data is semi-structured, and current Internet technology does not offer people an effective way to find the data that is useful to them. We therefore need to extract the semi-structured data from the Web, convert it into structured data, and store it in XML files.

Most product information on the Web is displayed as lists, so the location of a web list in an HTML page is usually where the product information is found. Because a list, as an information carrier, is structurally consistent, this paper proposes a method that combines automatic extraction rules with a small amount of user interaction to extract the data. First, we exploit the fact that a page can always be parsed into a tree model: irrelevant data shares a consistent structure, so it can be filtered out easily, and the remaining region of the page is converted into a PAT tree. Using the PAT tree data structure, we can easily find the repetitive patterns in the page and extract the data region. We then provide a GUI tool that lets users mark the data they are interested in and store it in an XML file.

After all the data has been extracted, we label some of the records and use them to train a classifier, mine the association rules among the records, and build an automatic classifier based on those rules.
A decision tree classifier is used to classify the product information automatically, giving users a clearer way to find products. As China becomes an aging society, more and more commodities are closely tied to the lives of older people. Selecting the goods that the elderly need from the mass of commodities directly and effectively, and presenting them in an elderly-friendly manner, is therefore of great practical significance.
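
The repeated-pattern step described in the abstract can be illustrated with a small sketch. It is not the thesis's PAT tree implementation: it swaps in a brute-force search over tag n-grams, which is only practical for short pages, and the names `TagSequence`, `count_non_overlapping`, `most_repeated_pattern`, and the sample `page` are all hypothetical. The idea is the same, though: flatten the page into a tag sequence and look for the tag pattern whose repetitions cover the most of it, on the assumption that this pattern marks the product list.

```python
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Flatten an HTML page into the sequence of its opening tags."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def count_non_overlapping(tags, pattern):
    """Count occurrences of `pattern` in `tags`, scanning left to right
    and skipping past each match so that matches never overlap."""
    n, i, count = len(pattern), 0, 0
    while i + n <= len(tags):
        if tuple(tags[i:i + n]) == pattern:
            count += 1
            i += n
        else:
            i += 1
    return count


def most_repeated_pattern(tags, min_len=2):
    """Return (pattern, count) for the tag n-gram that repeats at least
    twice and covers the most tags (len(pattern) * count), or None.
    Brute force stand-in for a PAT tree lookup (assumption of this sketch)."""
    best, best_cover, seen = None, 0, set()
    for n in range(min_len, len(tags) // 2 + 1):
        for i in range(len(tags) - n + 1):
            pattern = tuple(tags[i:i + n])
            if pattern in seen:
                continue
            seen.add(pattern)
            count = count_non_overlapping(tags, pattern)
            if count >= 2 and n * count > best_cover:
                best, best_cover = (pattern, count), n * count
    return best


# Hypothetical sample page: a navigation block followed by a product list.
page = (
    "<html><body><div id='nav'><a>Home</a></div><ul>"
    "<li><a>Phone</a><span>$199</span></li>"
    "<li><a>Laptop</a><span>$899</span></li>"
    "<li><a>Tablet</a><span>$299</span></li>"
    "</ul></body></html>"
)

parser = TagSequence()
parser.feed(page)
pattern, count = most_repeated_pattern(parser.tags)
print(pattern, count)
```

On the sample page the navigation link is ignored because its surrounding tags do not repeat, while the three `li` records share one tag pattern. A real page would still need the noise filtering and user marking steps described above before records are written to XML.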
Keywords/Search Tags: Web Information Extraction, PAT tree, product information, XML structure, Decision tree, Association tree