
Research on Resume and SRS Information Extraction Method

Posted on: 2011-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Mu
GTID: 2178360308452431
Subject: Computer application technology
Abstract/Summary:
Some free texts are unstructured in content yet, in essence, follow certain rules of document organization. Resumes, medical records and software requirement specifications (SRSs), for example, are written to meet particular writing standards and therefore have analyzable document structures. This thesis focuses on two tasks: extracting information from resumes, and extracting functional requirement information from the requirement specifications of software product lines (SPLs). It proposes an ontology-based information extraction framework in which ontology concepts are divided into two kinds, entity concepts and event concepts, and it defines both kinds of concept together with the relationships that may exist between them. Entity concepts, event concepts, their relationships and their instances are all used in the extraction process. The ontology is introduced to keep the structure consistent, so that data from different sources present a unified view and the extracted results are more accurate. For resume information extraction, the thesis derives an ontology model of resumes from the analysis of a large number of resume texts. The resume ontology is composed of entity concepts and event concepts, the latter comprising education events, work events and reward events, and the relationships between these concepts are also defined in the ontology.
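The composition of event concepts from entity concepts, and the later step of assembling extracted entities into events, can be illustrated with a small data model. The Python sketch below is an assumption, not the thesis's implementation: the role names (`School`, `Degree`, `Period`, `Company`, `Position`, `Award`, `Major`) and the helpers `EventConcept` and `assemble_events` are hypothetical, and entities are grouped with a simple adjacency heuristic rather than with any rule set from the thesis.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EventConcept:
    """An event concept: a named composite of entity-concept roles."""
    name: str
    required: tuple       # entity concepts that must all be present
    optional: tuple = ()  # entity concepts that may also be attached


@dataclass
class EventInstance:
    concept: str
    slots: dict = field(default_factory=dict)  # entity concept -> extracted text


# Hypothetical resume ontology fragment; role names are illustrative only.
EDUCATION = EventConcept("EducationEvent", ("School", "Degree"), ("Period", "Major"))
WORK = EventConcept("WorkEvent", ("Company", "Position"), ("Period",))
REWARD = EventConcept("RewardEvent", ("Award",), ("Period",))
RESUME_EVENTS = (EDUCATION, WORK, REWARD)


def _match(slots, event_concepts):
    """Return the first event whose required roles are all filled, else []."""
    for ev in event_concepts:
        if all(role in slots for role in ev.required):
            allowed = set(ev.required) | set(ev.optional)
            kept = {k: v for k, v in slots.items() if k in allowed}
            return [EventInstance(ev.name, kept)]
    return []


def assemble_events(entities, event_concepts=RESUME_EVENTS):
    """Group extracted (entity concept, text) pairs into event instances.

    Assumption: the entities of one event are adjacent in the text, so a new
    event starts whenever an already-filled role appears again.
    """
    events, current = [], {}
    for concept, text in entities:
        if concept in current:
            events.extend(_match(current, event_concepts))
            current = {}
        current[concept] = text
    events.extend(_match(current, event_concepts))
    return events
```

Given a period, school, major and degree followed by a second period, a company and a position, the sketch yields one `EducationEvent` and one `WorkEvent`. The adjacency heuristic breaks down when consecutive events repeat no role; the ontology's concept relationships would be needed to resolve such cases.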
In the implementation, the resume ontology is used at each step of extraction. In the text pre-processing step, instances of concepts are used to improve the accuracy of Chinese word segmentation. In the entity extraction step, entity concepts and the relationships between them are used to generate extraction rules. After entity extraction, the event concepts of the ontology and their component entities are combined with the extracted entity information to identify and extract the event information in resumes.
In the area of SPL functional requirements, this thesis focuses on SRSs that conform to the IEEE-STD-830 standard. Because requirement analysis in an SPL differs from general requirement analysis, the thesis proposes an extended functional requirement framework (EFRF) to capture the variability of SPL requirements. In this framework each function is a variation point, and 10 semantic cases are summarized for a variation point. Based on the EFRF definition, the thesis builds an EFRF ontology whose entity concepts are the 10 semantic cases and separators, and whose event concept corresponds to the variation point, which consists of the relevant entity concepts. The extraction is implemented with the dependency analysis of the Stanford Parser and the named entity (NE) component of the GATE framework. Combining the concepts, instances and inter-concept relationships of the EFRF ontology, the thesis puts forward a series of transformation rules to extract both entity information and event information.
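For the resume pre-processing step, the abstract does not say how concept instances enter the word segmenter. One common approach, assumed here purely for illustration, is to add ontology instances such as school and major names to the lexicon of a dictionary-based forward maximum matching segmenter. The function `forward_max_match` and the sample lexicons are hypothetical.

```python
def forward_max_match(text, lexicon, max_len=None):
    """Greedy forward maximum matching over a word lexicon (a set of strings).

    At each position the longest lexicon word is taken; characters not
    covered by any lexicon entry are emitted one at a time.
    """
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a lexicon hit;
        # a single character is always accepted, so the loop always advances.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical general lexicon and ontology instances (school and major names).
BASE_LEXICON = {"毕业", "于", "北京", "大学", "专业"}
ONTOLOGY_INSTANCES = {"北京大学", "计算机科学与技术"}
```

On the sentence `毕业于北京大学计算机科学与技术专业`, the base lexicon alone splits the major name into single characters, while `BASE_LEXICON | ONTOLOGY_INSTANCES` keeps `北京大学` and `计算机科学与技术` as single tokens that later entity rules can match directly.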
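The transformation rules for SRS sentences are not given in the abstract. The sketch below assumes they map dependency relations from the Stanford Parser onto EFRF semantic cases and group the results by governing verb into one variation point per function. Relation labels follow the collapsed Stanford Dependencies style (`nsubj`, `dobj`, `prep_to`); the case names `Agent`, `Object`, `Target`, `Source` and `Manner` are hypothetical stand-ins, since the ten EFRF cases are not listed here, and the triples are written by hand rather than produced by a parser call.

```python
from dataclasses import dataclass, field

# Hypothetical rules: dependency label -> semantic case. The real EFRF defines
# 10 semantic cases that are not enumerated in the abstract.
CASE_RULES = {
    "nsubj": "Agent",
    "dobj": "Object",
    "prep_to": "Target",
    "prep_from": "Source",
    "advmod": "Manner",
}


@dataclass
class VariationPoint:
    function: str                              # governing verb of the requirement
    cases: dict = field(default_factory=dict)  # semantic case -> filler word


def apply_rules(dependencies, rules=CASE_RULES):
    """Build one variation point per governor from (gov, rel, dep) triples.

    A triple contributes a case only when its relation has a rule; relations
    such as det or aux are ignored.
    """
    points = {}
    for governor, relation, dependent in dependencies:
        case = rules.get(relation)
        if case is None:
            continue
        point = points.setdefault(governor, VariationPoint(governor))
        point.cases[case] = dependent
    return list(points.values())
```

For the requirement "The system shall send the report to the manager", hand-written triples such as `("send", "nsubj", "system")`, `("send", "dobj", "report")` and `("send", "prep_to", "manager")` produce a single variation point for `send` with `Agent`, `Object` and `Target` filled, while `det` and `aux` triples are skipped.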
Keywords/Search Tags: information extraction, ontology, resume, SPL, functional requirement, rule-based