
Identification Of The Semi-Structured Text

Posted on: 2010-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Jiang
Full Text: PDF
GTID: 2178360278965689
Subject: Computer application technology
Abstract/Summary:
In daily life, the resume is a very important kind of text: it records its author's basic information, experience, and similar details. Resumes are used very widely in today's society, so fast and efficient extraction of the information they contain has become an urgent need. This thesis studies how to extract resume information quickly and effectively. First, because of the huge quantity of semi-structured text, text analysis has to rely on computers rather than on manual work. Second, accurate results can be obtained by combining the features of semi-structured text with a range of text analysis techniques, such as rule matching, relation analysis, and statistics.

The main task of this thesis is an in-depth study of effective information extraction algorithms for Chinese resumes. The main research results are as follows. First, the thesis identifies the characteristics of the Chinese resume. Second, it gives effective information extraction algorithms for the various parts of the Chinese resume. Third, it gives an information extraction model for Chinese resumes. Fourth, it reports experimental results based on 1500 Chinese resumes.

The thesis is structured as follows. Chapter one introduces the background and significance of the subject. Chapter two introduces semi-structured text and defines the key terms. Chapter three introduces automatic text classification techniques. Chapter four gives the characteristics of resume text and the information extraction model. Chapter five gives the experimental results and their analysis. Chapter six summarizes the work and the remaining problems.
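As a rough illustration of the rule matching mentioned in the abstract, the sketch below pulls labeled fields out of a short resume-like text with regular expressions. The abstract does not give the thesis's actual rules, so the field names, the label lists, and the function names here are all hypothetical, and the sketch assumes each field sits on its own line in the form "label: value".

```python
import re

# Hypothetical label-to-field mapping; the thesis does not list its actual rules.
FIELD_LABELS = {
    "name": ["姓名", "Name"],
    "phone": ["电话", "Phone"],
    "email": ["邮箱", "Email"],
}


def build_pattern(labels):
    """Compile a pattern matching any of the labels followed by a colon and a value."""
    alternatives = "|".join(re.escape(label) for label in labels)
    # Label, optional spaces, ASCII or full-width colon, then the value up to end of line.
    return re.compile(
        rf"^\s*(?:{alternatives})\s*[:：]\s*(.+?)\s*$",
        re.MULTILINE,
    )


PATTERNS = {field: build_pattern(labels) for field, labels in FIELD_LABELS.items()}


def extract_fields(text):
    """Return a dict of the first value found for each known field."""
    result = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            result[field] = match.group(1)
    return result


sample = "姓名：张三\n电话: 13800000000\nEmail: zhangsan@example.com\n"
fields = extract_fields(sample)
print(fields)
```

A fixed label table like this only covers resumes whose fields are explicitly labeled. Free-form sections such as work experience would need the other techniques the abstract names, such as relation analysis and statistics.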
Keywords/Search Tags: semi-structured text, elements, items, categories, collections, regular expression matching, statistics, segmentation