Research on Information Extraction Method for Retrieval Result Pages of OA Journals

Posted on: 2011-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
Full Text: PDF
GTID: 2198330338491094
Subject: Computer application technology
Abstract/Summary:
OA (Open Access) journal websites currently operate as "information isolated islands", which severely limits their usefulness. One way to address this problem is to integrate the retrieval services of OA journals into a virtual data resource space so that resources can be shared quickly. Extracting the retrieval results is one of the most crucial steps in this integration. This thesis studies methods for extracting information from the retrieval result pages of OA journals. The main contents are as follows.

Firstly, existing data region location methods are inaccurate and not well suited to the retrieval result pages of OA journals. To address these problems, a statistics-based data region location algorithm is proposed by analyzing the differences between data regions and non-data regions in these pages. The algorithm is described and implemented. It first partitions the web page and then applies statistical methods to locate the data region.

Secondly, extracting the paper information in the data region requires partitioning the region into data records. A clustering-based data record partition algorithm is proposed, which clusters subtrees by calculating their similarity in four respects: display style, data type, tag path structure, and adjacency characteristics. To identify the semantics of the data units within the partitioned data records, a data unit semantic identification algorithm based on characteristic similarity is proposed. It identifies the semantics of a data unit by calculating the similarity between the data unit and predefined semantic character strings.

Finally, the precision and recall of all the algorithms presented in this thesis are analyzed and verified, and the algorithms are then applied in an actual project.
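
The data unit semantic identification step can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract does not specify the similarity measure or the semantic character strings, so the `SEMANTIC_STRINGS` table, the 0.6 threshold, the label-before-colon heuristic, and the use of Python's `difflib.SequenceMatcher` are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher

# Hypothetical semantic labels and their character strings (assumed, not from the thesis).
SEMANTIC_STRINGS = {
    "title": ["title"],
    "author": ["author", "authors"],
    "abstract": ["abstract", "summary"],
    "keywords": ["keywords", "key words"],
    "date": ["published", "date", "year"],
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def identify_semantic(data_unit: str, threshold: float = 0.6):
    """Return the best-matching semantic label for a data unit, or None.

    Assumption: the text before the first ':' acts as the unit's label;
    if there is no ':', the whole unit is compared.
    """
    prefix = data_unit.split(":", 1)[0].strip()
    best_label, best_score = None, 0.0
    for label, strings in SEMANTIC_STRINGS.items():
        for s in strings:
            score = similarity(prefix, s)
            if score > best_score:
                best_label, best_score = label, score
    return best_label if best_score >= threshold else None


if __name__ == "__main__":
    for unit in ["Authors: J Zhang", "Keywords: OA journal, Data region", "2011-05-26"]:
        print(unit, "->", identify_semantic(unit))
```

A unit whose best score falls below the threshold is left unlabeled rather than forced into a category, which keeps doubtful matches out of the extracted records.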
Keywords/Search Tags: OA journal, Data region, Web partition, Data records, Semantic identification