
Research On The Realization Of The Employment Information Extraction System Based On Web

Posted on: 2011-11-05; Degree: Master; Type: Thesis
Country: China; Candidate: S Q Fang; Full Text: PDF
GTID: 2178330332966783; Subject: Computer technology
Abstract/Summary:
With the rapid growth of the Internet, the Web has become an important knowledge base in which people search for information and data. Faced with this "data ocean" spanning the worldwide network, Web mining has drawn more and more attention as an effective means of obtaining potentially meaningful knowledge. Vocational colleges need a large amount of information about the demand for talent, which provides guidance for specialty construction and course settings, and the Internet has become an important source of such data. Finding this information on the Web rapidly, accurately and efficiently is therefore valuable for specialty construction and core course settings in vocational colleges. However, Web pages hold large amounts of data, are semi-structured and change dynamically, and these characteristics make Web information extraction highly complex and hard to extend and adapt. The emergence of XML technology provides a good opportunity to solve the problem of data extraction on the Web. This dissertation studies XML-based Web information extraction, which belongs to the category of Web content mining. The main work is as follows:

1. Since the main difficulty of Web information extraction is determining extraction rules effectively, this dissertation presents an information extraction method and discusses path learning and the related technical issues.
2. Based on a study of the characteristics of Web pages, the characteristics of XML are introduced into Web information extraction. JTidy is used to clean and tidy the Web page code, which is then converted into XML documents. Parsing the XML yields the DOM tree of the page, so that information can be extracted more easily.
3. DOM-based strategies and algorithms for inductive rule learning and for data extraction are proposed. The extraction rules (rule sets) generated by machine learning are applied to extract information from template-generated pages with similar structure.
4. A general framework is given, consisting of a data acquisition module, a data blocking module, and a data extraction module (including rule management and employment information extraction). Using these algorithms, an employment information extraction system for job-hunting Web sites is developed and tested. The extracted data are saved in a database so that they can be used fully and effectively with database technology.
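The abstract does not give the dissertation's actual algorithms, so the following is only a minimal sketch of the general idea of learning a DOM path rule from labeled examples and reusing it on a similarly structured page. It assumes the page has already been cleaned into well-formed XML (the step the dissertation performs with JTidy) and uses Python's standard `xml.etree.ElementTree` instead. The sample page `PAGE` and the helper names `indexed_path`, `induce_rule` and `rule_to_xpath` are hypothetical and introduced here for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical job-listing page, assumed to be already tidied into well-formed XML.
PAGE = """<html><body>
<div class="nav"><span>Home</span></div>
<table>
<tr><td>Java Developer</td><td>Shanghai</td></tr>
<tr><td>Network Engineer</td><td>Beijing</td></tr>
<tr><td>Web Designer</td><td>Hangzhou</td></tr>
</table>
</body></html>"""


def indexed_path(root, target_text):
    """Find the first element whose text equals target_text.

    Returns the path from root as a list of (tag, position) steps, where
    position is the 1-based index among siblings with the same tag,
    or None if no element matches.
    """
    def walk(node, path):
        if (node.text or "").strip() == target_text:
            return path
        counts = {}
        for child in node:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            found = walk(child, path + [(child.tag, counts[child.tag])])
            if found is not None:
                return found
        return None

    return walk(root, [])


def induce_rule(paths):
    """Generalize example paths: keep a position only where all examples agree."""
    first_tags = [tag for tag, _ in paths[0]]
    for path in paths[1:]:
        if [tag for tag, _ in path] != first_tags:
            raise ValueError("examples do not share the same tag path")
    rule = []
    for steps in zip(*paths):
        positions = {pos for _, pos in steps}
        # A position that varies across examples becomes a wildcard (None).
        rule.append((steps[0][0], positions.pop() if len(positions) == 1 else None))
    return rule


def rule_to_xpath(rule):
    """Render the rule in the XPath subset supported by ElementTree."""
    steps = [tag if pos is None else f"{tag}[{pos}]" for tag, pos in rule]
    return "./" + "/".join(steps)


root = ET.fromstring(PAGE)
# Two hand-labeled examples of the field to extract (job titles).
examples = ["Java Developer", "Network Engineer"]
rule = induce_rule([indexed_path(root, text) for text in examples])
xpath = rule_to_xpath(rule)
titles = [element.text for element in root.findall(xpath)]
print(xpath)
print(titles)
```

In this sketch the two labeled examples differ only in the row position, so the induced rule keeps the fixed steps and generalizes the row index; applied to the same page, it also selects the third, unlabeled job title. A real system would need to handle examples with differing tag paths, attributes, and noisy pages, which this sketch does not attempt.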
Keywords/Search Tags: Web mining, Inductive Learning, Rule Generation, Information Extraction