Research Of The Automatic Metadata Extraction Based On The Conditional Random Fields

Posted on:2010-10-08

Degree:Master

Type:Thesis

Country:China

Candidate:N Hou

Full Text:PDF

GTID:2178360302959005

Subject:Computer application technology

Abstract/Summary:

With the development of digital libraries, electronic documents have become the main source of information for readers. Metadata extraction has therefore attracted considerable research attention as a way to help people find research papers efficiently and effectively. Automatic metadata extraction addresses the drawback of the traditional approach, in which people must read each document, locate the metadata, and enter it into a database by hand. It helps organize information in an orderly way, manage it appropriately, and retrieve it easily. As machine learning theory has matured, automatic metadata extraction has become an active research topic. This thesis focuses on automatic metadata extraction based on conditional random fields (CRFs).

First, it proposes a text segmentation technique. Traditional extraction methods operate on the individual words of a research paper header, which makes the extraction task large and the accuracy low. The segmentation process is described in detail. After segmentation, the fields to be extracted correspond to text blocks. Because some states are marked by special words, the state of some blocks can be decided with extraction rules; the states of the remaining blocks are then computed with a heuristic search algorithm.

Second, because citation formats vary and the fields to be extracted are adjacent to one another, a reranking-based approach is proposed to extract citation metadata accurately. This method takes the results produced by the conditional random fields and reranks the candidate labels to obtain the citation metadata.

Finally, the thesis analyzes and verifies all of the proposed techniques, compares them with existing typical algorithms, and outlines prospects for future research.
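
The rule-based step described in the abstract (deciding the state of blocks that contain special words, and leaving the rest for a later search step) might look roughly like the sketch below. It is an illustration only, not the thesis's implementation: the one-block-per-line segmentation, the `RULES` table, the label names, and the functions `segment_header` and `label_blocks` are all assumptions made for this example, and the heuristic search over undecided blocks is not shown.

```python
import re

# Hypothetical keyword rules; the thesis does not list its actual rules.
# Each rule maps a label (state) to a pattern for its "special words".
RULES = [
    ("email", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),
    ("abstract", re.compile(r"^\s*abstract\b", re.IGNORECASE)),
    ("keyword", re.compile(r"^\s*(keywords?|index terms)\b", re.IGNORECASE)),
    ("affiliation", re.compile(
        r"\b(university|institute|department|laboratory)\b", re.IGNORECASE)),
]


def segment_header(header_text):
    """Split a paper header into blocks.

    Simplifying assumption: one block per non-empty line.
    """
    return [line.strip() for line in header_text.splitlines() if line.strip()]


def label_blocks(blocks):
    """Return (block, label) pairs; label is None when no rule fires."""
    labeled = []
    for block in blocks:
        label = None
        for name, pattern in RULES:
            if pattern.search(block):
                label = name
                break  # first matching rule decides the block
        labeled.append((block, label))
    return labeled


# Made-up header text for illustration.
header = """Automatic Metadata Extraction with CRFs
A. Author
Department of Computer Science, Example University
a.author@example.edu
Abstract: We study header metadata extraction.
Keywords: metadata, CRF"""

for block, label in label_blocks(segment_header(header)):
    print(f"{label or 'UNDECIDED':<12} {block}")
```

In this sketch the title and author blocks stay undecided, which corresponds to the blocks the abstract says are resolved afterwards by heuristic search.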

Keywords/Search Tags:

Metadata extraction, Conditional random fields, Text block, Heuristic search, Reranking

Related items

1	Metadata Extraction Based On Third-order Conditional Random Fields
2	Hierarchical Information Extraction From Research Papers Based On Conditional Random Fields
3	Research On Web Text Segmentation Based On Conditional Random Fields
4	The Research On Short Text Mining With Conditional Random Fields And Improved LSTM
5	Text Categorization Based On The Conditional Random Fields
6	Research Of Web Text Named Entity Recognition Based On Conditional Random Fields
7	Research On Personnel Resume Intelligent Extraction System Based On Conditional Random Fields
8	Information Recognition And Extraction From Chinese Periodical Papers Based On Conditional Random Fields
9	SAR Image Change Detection Based On Conditional Random Fields
10	Research On Online Detection Method Of Reputation Fraud Campaign Based On Conditional Random Fields