Algorithm Research Of Information Extraction And Its Application In Scientific Research And Service System

Posted on: 2013-02-28  Degree: Master  Type: Thesis
Country: China  Candidate: S Huang  Full Text: PDF
GTID: 2298330467478434  Subject: Systems Engineering
Abstract/Summary:
With the rapid development of the Internet, the World Wide Web has become the world's largest information source and knowledge base, and how to extract useful information from websites has become a focus of research. The scientific research and service system is a website that helps university teachers manage their teaching, scientific research, and results, and communicate with the outside world. The core of the system is literature management and teaching information management, and its main technology is the extraction of literature information from electronic periodical database websites. Building on a study of domestic and foreign information extraction methods, this thesis examines the structure of the two kinds of website (electronic periodical databases and school timetable sites) and proposes an extraction method for each.

For literature information extraction, the pages of an electronic periodical database website are mostly generated from the same template. This thesis exploits that feature and proposes a template-based literature information extraction method. First, the overall structure of information extraction is designed; second, the template generation method and the extraction method for topic information are studied and designed. For template generation, a set of heuristic rules based on the characteristics of literature pages is combined with the DSE algorithm to produce a template generation algorithm that yields accurate extraction paths. Simulation results show that the method is feasible and adaptable.

For school timetable information extraction, timetable websites mostly present their content in tables. This thesis exploits that feature and proposes a table information extraction method based on heuristic rules. First, the overall structure of table information extraction is designed; second, the table locating method and the table information extraction method are studied and designed. For table locating, a Table-DOM is constructed from the characteristics of table pages, and a set of heuristic rules is proposed to locate the topic information. Simulation results show that the method is feasible and adaptable.

For classifying the topic information, this thesis proposes a text classification method based on the support vector machine, chosen in view of the characteristics of the topic information. First, the overall structure of the classification method is designed; second, the text preprocessing, feature selection and extraction, model training, and text classification methods are designed. Multi-class classification uses the One-against-One algorithm. Finally, the collected literature and timetable information serves as sample data for the simulation experiment, and the results show that the method is feasible and adaptable.

Finally, the overall functional structure is designed according to the goals and needs of the scientific research and service system. The literature information extraction method and the school timetable information extraction method are applied in the system, and the functional structure of each is designed...
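
The One-against-One scheme mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the features, kernel, or class labels, so the labels below (`paper`, `course`, `other`) and all function names are hypothetical, and a nearest-centroid rule stands in for the binary SVM to keep the example dependency-free. Only the pairwise training and majority voting reflect the One-against-One idea: for k classes, k(k-1)/2 binary classifiers are trained, and each casts one vote.

```python
from collections import Counter
from itertools import combinations


def train_centroid(xs_a, xs_b):
    """Stand-in binary learner (NOT an SVM): picks the nearer class centroid."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    ca, cb = mean(xs_a), mean(xs_b)

    def predict_is_a(x):
        # Squared Euclidean distance to each centroid.
        da = sum((xi - ci) ** 2 for xi, ci in zip(x, ca))
        db = sum((xi - ci) ** 2 for xi, ci in zip(x, cb))
        return da <= db

    return predict_is_a


def train_one_against_one(samples, labels):
    """Train one binary classifier per unordered pair of classes."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    models = {}
    for a, b in combinations(sorted(by_class), 2):
        models[(a, b)] = train_centroid(by_class[a], by_class[b])
    return models


def predict_one_against_one(models, x):
    """Each pairwise classifier casts one vote; the class with the most votes wins."""
    votes = Counter()
    for (a, b), is_a in models.items():
        votes[a if is_a(x) else b] += 1
    # Ties are broken by the order in which classes first received a vote.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors and labels, for illustration only.
    samples = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
    labels = ["paper", "paper", "course", "course", "other", "other"]
    models = train_one_against_one(samples, labels)
    print(predict_one_against_one(models, [0.2, 0.4]))
```

In practice the stand-in learner would be replaced by a trained binary SVM for each class pair; the voting step stays the same.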
Keywords/Search Tags: information extraction, web page templates, support vector machine, heuristic rules, research service