Algorithm Research Of Information Extraction And Its Application In Scientific Research And Service System

Posted on:2013-02-28

Degree:Master

Type:Thesis

Country:China

Candidate:S Huang

Full Text:PDF

GTID:2298330467478434

Subject:Systems Engineering

Abstract/Summary:

With the rapid development of the Internet, the World Wide Web has become the world's largest information source and knowledge base, and how to extract useful information from websites has become a focus of research. The scientific research and service system is a website that supports university teachers in teaching, scientific research, and the management of research results, and that also serves as a channel for communication with the outside world. The core of the system is literature management and teaching information management, and its main technology is the extraction of literature information from electronic periodical database websites. Building on a study of domestic and international information extraction methods, this thesis examines the structure of the two websites and proposes an extraction method for each.

For literature information extraction, the pages of electronic periodical database websites are mostly generated from the same template. This thesis exploits that property and proposes a template-based literature information extraction method. First, the overall structure of the information extraction process is designed. Second, the template generation method and the extraction method for theme information are studied and designed. For template generation, heuristic rules derived from the characteristics of literature pages are combined with the DSE algorithm to produce a template generation algorithm that yields accurate extraction paths. Simulation results show that the method is feasible and adaptable.

For school timetable information extraction, timetable websites mostly present their content in tables. This thesis exploits that property and proposes a table information extraction method based on heuristic rules. First, the overall structure of table information extraction is designed. Second, the table location method and the table information extraction method are studied and designed. For table location, a Table-DOM is constructed according to the characteristics of table-based pages, and heuristic rules are proposed to locate the theme information. Simulation results show that the method is feasible and adaptable.

For classifying the theme information, a text classification method based on support vector machines is adopted in view of the characteristics of the theme information. First, the overall structure of the text classification method is designed. Second, the text preprocessing method, the feature selection and extraction method, the model training method, and the text classification method are designed. The One-against-One multi-class algorithm is adopted to handle multiple categories. Finally, the collected literature information and timetable information are used as sample data for simulation experiments, and the results show that the method is feasible and adaptable.

Lastly, based on the goals and requirements of the scientific research and service system, the overall functional structure is designed. The literature information extraction method and the school timetable information extraction method are applied in the system, and the functional structure of each is designed...
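
The abstract names the One-against-One multi-class scheme but gives no implementation details. The following is only a minimal sketch of how that voting scheme works in general, not the thesis's code: one binary classifier is built for each pair of classes, and a sample is assigned to the class that wins the most pairwise contests. The class labels, keyword sets, and stub classifiers are hypothetical stand-ins for trained support vector machines.

```python
from collections import Counter
from itertools import combinations

# Hypothetical class labels and keyword sets (not taken from the thesis).
CLASSES = ["literature", "timetable", "other"]
KEYWORDS = {
    "literature": {"journal", "author", "abstract"},
    "timetable": {"course", "classroom", "week"},
    "other": set(),
}


def make_stub(a, b):
    """Return a toy binary classifier that stands in for a trained SVM."""
    def decide(tokens):
        score_a = len(KEYWORDS[a] & tokens)
        score_b = len(KEYWORDS[b] & tokens)
        return a if score_a >= score_b else b
    return decide


# One binary classifier per unordered pair of classes: k * (k - 1) / 2 in total.
PAIRWISE = {(a, b): make_stub(a, b) for a, b in combinations(CLASSES, 2)}


def one_against_one_predict(tokens, classes=CLASSES, pairwise=PAIRWISE):
    """Assign the class that collects the most pairwise votes."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](tokens)] += 1
    # Ties fall to the class listed first in `classes` (an arbitrary choice).
    return max(classes, key=lambda c: votes[c])
```

Calling `one_against_one_predict({"course", "week", "teacher"})` runs the three pairwise contests and returns the label with the most votes. In a real system each stub would be replaced by a binary SVM trained on the samples of its two classes.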

Keywords/Search Tags:

information extraction, web page templates, support vector machine, heuristic rules, research service

Related items

1	Research And Application Of Chinese Web Pages Automatic Classification
2	Research On Relation Extraction Of Person Entity In News Webpage
3	Researches On Some Problems In Nonparallel Hyperplanes Support Vector Machine And Feature Extraction
4	Research On Gesture Recognition Based On Zernike Moments And Support Vector Machine
5	The Research Of Automatic Chinese Web Page Categorization Based On Support Vector Machine
6	Malicious Web Site Recognition Based On Page Information
7	Research On Some Problems Of Support Vector Machine Learning Algorithm
8	Design And Implementation Of Web Information Extraction Subsystem In The Public Opinion System
9	Research On Structure Support Vector Machine Classification Models
10	Research On The Intelligent Acquisition Of Web-Based News Contents