
Scholar Resume Automatic Generation Based On Text Mining

Posted on: 2012-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Qian
Full Text: PDF
GTID: 2218330368488107
Subject: Computer application technology
Abstract/Summary:
A user's resume information is the basis for building a social network, so generating user resumes automatically greatly eases the construction and promotion of a social website. Taking the academic social website LinkScholar as its application background, and papers in PDF format and academic journal websites as its information sources, this thesis obtains information on domestic scholars through information extraction, applies name disambiguation, and generates scholars' resumes.

For information extraction from Chinese papers in PDF format, the thesis analyzes a large number of Chinese scientific papers and summarizes the properties of their metadata: non-inclusion, exclusivity, repetition, sequential order, and determinacy of constituent elements. It defines simple metadata and complex metadata, and proposes using dictionary matching and a support vector machine model to extract the metadata of Chinese scientific papers. Experimental results show that the comprehensive performance index of this model exceeds 96%, which is superior to the Conditional Random Fields model and the Hidden Markov Model.

For information extraction from academic journal websites, the thesis proposes the concepts of homologous HTML documents and HTML skeleton sequences and, on the basis of these two concepts, introduces the pairwise sequence alignment algorithm to Web information extraction. The algorithm computes the maximal common segment of a candidate sequence and a template sequence, and the information items are then extracted according to the tags of the template sequence. Because the method takes full advantage of the relation between structure and data in HTML documents, it does not require constructing large sample databases, is simple to implement, and is highly versatile.

For author name disambiguation, the thesis reviews the related literature in detail, summarizes the advantages and disadvantages of existing methods, and proposes a name disambiguation method based on a genetic clustering algorithm. The algorithm transforms citation clustering into a multi-peak combinatorial optimization problem and solves it iteratively with a genetic algorithm. Each optimal solution represents the set of citations belonging to one of the authors who share a name, which achieves name disambiguation.

Finally, the thesis designs the system architecture for resume generation, formulates rules for information fusion, and, building on information extraction and name disambiguation, generates effective, comprehensive, and accurate scholar resumes. The approach has been applied in the LinkScholar system.
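
The template-based Web extraction step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's exact definition of an HTML skeleton sequence or its alignment scoring, so everything below is an assumption made for illustration: the skeleton is taken to be the flat sequence of tags with every text node collapsed to `#text`, the alignment is a plain longest-common-subsequence alignment, and the `{field}` placeholder syntax and the names `tokenize`, `align`, and `extract` are hypothetical.

```python
from html.parser import HTMLParser


class SkeletonParser(HTMLParser):
    """Collects a flat token list of ("tag", "<td>") and ("text", "...") pairs."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("tag", f"<{tag}>"))

    def handle_endtag(self, tag):
        self.tokens.append(("tag", f"</{tag}>"))

    def handle_data(self, data):
        text = data.strip()
        if text:  # ignore whitespace-only text nodes
            self.tokens.append(("text", text))


def tokenize(html):
    parser = SkeletonParser()
    parser.feed(html)
    parser.close()
    return parser.tokens


def skeleton_key(token):
    # Assumed skeleton: tags keep their name, all text collapses to "#text".
    kind, value = token
    return value if kind == "tag" else "#text"


def align(a, b):
    """Return index pairs (i, j) of one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # score[i][j] holds the LCS length of a[i:] and b[j:].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                score[i][j] = score[i + 1][j + 1] + 1
            else:
                score[i][j] = max(score[i + 1][j], score[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif score[i + 1][j] >= score[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def extract(template_html, candidate_html):
    """Copy candidate text aligned with a "{field}" placeholder in the template."""
    template = tokenize(template_html)
    candidate = tokenize(candidate_html)
    pairs = align([skeleton_key(t) for t in template],
                  [skeleton_key(t) for t in candidate])
    fields = {}
    for i, j in pairs:
        kind, value = template[i]
        if kind == "text" and value.startswith("{") and value.endswith("}"):
            fields[value[1:-1]] = candidate[j][1]
    return fields


if __name__ == "__main__":
    template = "<div><h1>{title}</h1><p>{author}</p></div>"
    page = "<div><span>New</span><h1>Text Mining</h1><p>Y Qian</p></div>"
    print(extract(template, page))
```

In this sketch the extra `<span>` element in the candidate page is skipped by the alignment, so the placeholders still line up with the heading and paragraph text. A real system would likely need gap penalties and handling for repeated records, which the abstract does not describe.
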
Keywords/Search Tags:Resume generation, Information extraction, Support vector machine, Pairwise sequence alignment, Name disambiguation