Scholar Resume Automatic Generation Based On Text Mining

Posted on:2012-02-02

Degree:Master

Type:Thesis

Country:China

Candidate:Y Qian

Full Text:PDF

GTID:2218330368488107

Subject:Computer application technology

Abstract/Summary:

A user's resume information is the foundation of a social network, so generating resumes automatically greatly eases the construction and promotion of a social website. Taking the academic social website LinkScholar as its application background, and using papers in PDF format and academic journal websites as information sources, this thesis obtains information about domestic scholars through information extraction, applies name disambiguation, and generates scholars' resumes.

For information extraction from Chinese papers in PDF format, this thesis analyzes a large number of Chinese scientific papers and summarizes the properties of their metadata: no inclusion relations, exclusivity, repeatability, sequential order, and determinacy of components. It defines simple metadata and complex metadata, and proposes combining dictionary matching with a support vector machine model to extract the metadata of Chinese scientific papers. Experimental results show that the comprehensive performance index of this model exceeds 96%, which is superior to the Conditional Random Fields model and the Hidden Markov Model.

For information extraction from academic journal websites, this thesis proposes the concepts of the homologous HTML document and the HTML skeleton sequence and, building on these two concepts, introduces the pairwise sequence alignment algorithm to Web information extraction. The algorithm computes the maximal common segment of a candidate sequence and a template sequence, and the information items are then extracted according to the tags of the template sequence.
This method takes full advantage of the relation between structure and data in HTML documents; it does not require constructing a large sample database, is simple to implement, and is highly general.

For author name disambiguation, this thesis surveys the related literature in detail, summarizes the advantages and disadvantages of existing methods, and proposes a name disambiguation method based on a genetic clustering algorithm. The algorithm transforms citation clustering into a multi-peak combinatorial optimization problem and solves it iteratively with a genetic algorithm. Each optimal solution represents the set of citations belonging to one of the authors who share a name, which achieves name disambiguation.

Finally, this thesis designs the system architecture for resume generation, formulates rules for information fusion, and, on the basis of information extraction and name disambiguation, generates effective, comprehensive, and accurate scholar resumes. The approach has been applied in the LinkScholar system.
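
The template-based extraction from journal web pages described above can be illustrated with a small sketch. The abstract does not say which alignment algorithm or scoring scheme the thesis uses, so this example substitutes a plain Needleman-Wunsch global alignment with unit scores. The names `skeleton`, `align`, `extract`, the `#text` placeholder token, and the sample pages are all hypothetical and chosen only for illustration. Each page is reduced to a sequence of tag tokens, the template and candidate sequences are aligned, and the text of every candidate node that lines up with a labelled template node is reported under that label.

```python
from html.parser import HTMLParser


class SkeletonParser(HTMLParser):
    """Collects (token, text) pairs: tags become '<tag>' or '</tag>',
    non-empty text nodes become the placeholder token '#text'."""

    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        self.items.append((f"<{tag}>", None))

    def handle_endtag(self, tag):
        self.items.append((f"</{tag}>", None))

    def handle_data(self, data):
        if data.strip():
            self.items.append(("#text", data.strip()))


def skeleton(html):
    """Return the tag/text skeleton of an HTML string (illustrative definition)."""
    parser = SkeletonParser()
    parser.feed(html)
    parser.close()
    return parser.items


def align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of two token lists.
    Returns index pairs (i, j) for positions where a[i] == b[j] are aligned."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover matched positions.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            if a[i - 1] == b[j - 1]:
                pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


def extract(template_html, candidate_html):
    """Map each template text label to the candidate text aligned with it."""
    template = skeleton(template_html)
    candidate = skeleton(candidate_html)
    pairs = align([tok for tok, _ in template], [tok for tok, _ in candidate])
    fields = {}
    for ti, ci in pairs:
        if template[ti][0] == "#text":
            fields[template[ti][1]] = candidate[ci][1]
    return fields


# Hypothetical template: text nodes hold the field labels.
TEMPLATE = "<div><h1>TITLE</h1><p>AUTHOR</p></div>"
# Hypothetical candidate page with one extra element the template lacks.
CANDIDATE = "<div><h1>Resume Generation</h1><span>new</span><p>A. Author</p></div>"

print(extract(TEMPLATE, CANDIDATE))
```

The extra `<span>` element in the candidate is absorbed as a gap, which is why a single template can serve pages whose structure differs slightly. A real system would likely need attribute-aware tokens and tuned gap penalties; those choices are outside what the abstract states.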

Keywords/Search Tags:

Resume generation, Information extraction, Support vector machine, Pairwise sequence alignment, Name disambiguation

Related items

1	Biological Sequence Alignment Problem
2	Research On Pairwise Sequence Alignment Algorithms
3	The Application Of ACO And Coding Method In Sequence Analysis
4	Word Sense Disambiguation Based On Semantic And Lexical Information
5	Multiword Expressions: Extraction And Applications
6	Research And System Design Of Structured Data Extraction Method Of Resume
7	The Research Of Dynamic Web Pages Information Extraction Algorithm Based On Sequence Alignment
8	Protein Structure Classification Algorithms Based On Sequence Similarity
9	Researches On Some Problems In Nonparallel Hyperplanes Support Vector Machine And Feature Extraction
10	Application Of Linear Space Algorithm For Pairwise Sequence Alignment Based On Parallel-Computing