
Research On Chinese Expert Metadata Extraction

Posted on: 2015-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: X Pan
Full Text: PDF
GTID: 2208330431476801
Subject: Computer technology
Abstract/Summary:
Expert metadata is the basic information about an expert, such as the expert's name, native place, address, job title, and research interests. The task of expert metadata extraction is to accurately recognize the entities that represent this basic information in a set of expert pages, and it is currently an active research topic. This thesis studies expert metadata extraction from two angles: using single-page features to extract metadata from a single page, and using the relationships between pages to extract metadata from multiple pages. Its main contributions are as follows:

1. Single-page expert metadata extraction based on a 2D cascaded model
This work addresses two problems: the entity structure of expert metadata in Chinese expert pages is complex, and Chinese expert pages have the descriptive characteristics of the Chinese language. First, metadata extraction features are selected, taking the characteristics of Chinese expert pages into account, to define a Chinese expert metadata extraction template. Conditional Random Fields are combined with this template to construct the first-dimension model, which extracts the basic component units of the expert metadata. Then, the basic component units extracted by the first-dimension model are used as features, and Cascaded Conditional Random Fields are used to construct the second-dimension model, which extracts complex composite entities as metadata.

2. Multi-page expert metadata extraction based on the related page group of an expert
To address the problem that the pages retrieved by searching for an expert's name differ in their relevance to that expert, this chapter proposes a multi-page expert metadata extraction method based on fuzzy clustering. First, features describing the relationships between pages are chosen, and a Maximum Entropy model is used to build a page classification model that selects the related page group of the expert from the set of pages returned by searching for the expert's name. Finally, fuzzy clustering, with the related page group as guiding information, is used to extract expert metadata from multiple pages.

3. Multi-page expert metadata extraction based on the relationships between clusters and within clusters
To address the problem that the relationships between clusters and within clusters are not fully utilized, this chapter proposes an undirected graph model for extracting the address, job title, and research interest metadata of Chinese experts from multiple pages. The metadata extracted from each single page is treated as a node, and the relationship features between clusters and within clusters are treated as edges; the nodes and edges together form an undirected graph model that extracts expert metadata from multiple pages.

4. Design and implementation of a prototype system for Chinese expert metadata extraction
Based on the research results above, a multi-page expert metadata extraction prototype system is developed. The prototype implements metadata extraction for experts in the natural language processing and machine learning domains.
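The abstract does not say which fuzzy clustering algorithm is used in the multi-page method. Purely as an illustration, the sketch below assumes the standard fuzzy c-means membership rule, in which each page receives a degree of membership in every cluster rather than a single hard label. The function name `fuzzy_memberships`, the two-dimensional page feature vectors, the cluster centers, and the fuzzifier `m` are all hypothetical and are not taken from the thesis.

```python
import math


def fuzzy_memberships(points, centers, m=2.0):
    """Fuzzy c-means membership degrees (assumed algorithm, not from the thesis).

    Returns a list u where u[i][j] is the membership of point i in cluster j.
    Each row sums to 1. `m` > 1 is the fuzzifier; larger values give softer
    assignments.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with at least one center: share the full
            # membership equally among the coinciding centers.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        row = [1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
               for d_j in dists]
        memberships.append(row)
    return memberships


# Hypothetical page feature vectors and two hypothetical cluster centers.
pages = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.5, 0.5)]
centers = [(0.0, 0.0), (1.0, 1.0)]
u = fuzzy_memberships(pages, centers)
for page, row in zip(pages, u):
    print(page, [round(x, 3) for x in row])
```

A page that lies midway between two centers receives equal membership in both, which is the property that lets a soft clustering keep ambiguous pages available to more than one cluster instead of forcing an early decision.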
Keywords/Search Tags: Chinese expert metadata extraction, Metadata composition structure, the related page group of expert, the relationship between clusters and within clusters