Linguistic validation of automatic subtopic segmentation

Posted on:2005-08-08

Degree:Ph.D

Type:Dissertation

University:Boston University

Candidate:Saidi, Aisha F

GTID:1458390008490433

Subject:Language

Abstract/Summary:

This study evaluates a technique for automatically segmenting medical history paragraphs by subtopic, with a view to building subtopic language models that improve speech recognition of the medical history sections of medical reports. The technique uses a Hidden Markov Model segmenting tool to mark the boundaries of hypothesized subtopic segments within each text. Because the tool assumes that its input texts share a similar topic structure, it can be applied to medical histories, which generally have a three-part structure.

The data for this study is a set of medical histories extracted from 2,700 orthopedic medical reports. The study is carried out in four broad steps. First, a group of linguists independently mark the subtopic structure of a test set of medical histories; histories upon which there is significant agreement become the standard by which the automatic segmenter is evaluated. Next, the automatic segmenter is trained on a large set of histories. Then, using the statistical information built from the training data, the automatic segmenter marks subtopic segments in the test data. Finally, the automatic segmentation of the test data is graded against the evaluation standard developed by the expert subjects.

Two types of subtopic segmentation are explored in this work. The first type, linear subtopic segmentation, assumes that each of the three subtopics in a medical history is a continuous chunk of text within the paragraph, uninterrupted by other subtopics. Despite the relatively homogeneous structure of medical histories, this model is found to be linguistically unrealistic, and the automatic segmenter performs poorly when graded against the evaluation standard. The second type, non-linear subtopic segmentation, allows each sentence to be assigned to a subtopic regardless of order. Because of the variability of the data, the tool is unable to successfully distinguish three subtopics in the histories. However, the automatic segmentation of two non-linear subtopics for each medical history is successful, with a high rate of accuracy compared to the human standard.
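
The contrast between linear and non-linear segmentation can be illustrated with a small sketch. The abstract does not describe the internals of the segmenting tool, so everything below is an assumption made for illustration only: the toy emission probabilities, the hypothetical helpers `viterbi`, `linear_transitions`, `nonlinear_transitions`, and `agreement`, and the use of simple sentence-level agreement as the grading measure. Under those assumptions, a left-to-right transition matrix forces each subtopic to be one contiguous chunk, while a fully connected matrix lets any sentence take any subtopic.

```python
import math

NEG_INF = float("-inf")


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(emission_logp, trans_logp, start_logp):
    """Most likely subtopic sequence for a list of sentences.

    emission_logp[t][s]: log P(sentence t | subtopic s)
    trans_logp[a][b]:    log P(next subtopic b | current subtopic a)
    start_logp[s]:       log P(first sentence has subtopic s)
    """
    n_states = len(start_logp)
    score = [start_logp[s] + emission_logp[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at sentence t
    for t in range(1, len(emission_logp)):
        prev, score, pointers = score, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda a: prev[a] + trans_logp[a][s])
            pointers.append(best)
            score.append(prev[best] + trans_logp[best][s] + emission_logp[t][s])
        back.append(pointers)
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def linear_transitions(n):
    """Left-to-right model: stay in a subtopic or advance to the next one."""
    rows = []
    for a in range(n):
        row = [0.0] * n
        if a == n - 1:
            row[a] = 1.0
        else:
            row[a], row[a + 1] = 0.5, 0.5
        rows.append([log(p) for p in row])
    return rows


def nonlinear_transitions(n):
    """Fully connected model: any subtopic may follow any other."""
    return [[log(1.0 / n)] * n for _ in range(n)]


def agreement(predicted, reference):
    """Fraction of sentences whose label matches the reference labels."""
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


# Toy history of four sentences over three subtopics (invented numbers).
# The third sentence returns to subtopic 0, which a linear model cannot express.
emissions = [[log(p) for p in row] for row in [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
]]
human_labels = [0, 1, 0, 2]  # stand-in for the linguists' standard

linear_path = viterbi(emissions, linear_transitions(3), [log(1.0), NEG_INF, NEG_INF])
nonlinear_path = viterbi(emissions, nonlinear_transitions(3), [log(1.0 / 3)] * 3)

print("linear:    ", linear_path, agreement(linear_path, human_labels))
print("non-linear:", nonlinear_path, agreement(nonlinear_path, human_labels))
```

In this toy case the linear model must mislabel the sentence that returns to an earlier subtopic, which is the kind of mismatch the abstract calls linguistically unrealistic. The sketch says nothing about why three non-linear subtopics proved hard to distinguish in the real data.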

Keywords/Search Tags:

Subtopic, Automatic, Medical, Standard

Related items

1	The System Design Of Electronic Medical Record And The Exchange Of Electronic Medical Record
2	Research On Subtopic Mining For Diversified Information Retrieval
3	Research On Construction Of Electronic Medical Records And Correlative Technology Based On Standard HL7
4	Study On Medical Image Displaying And Processing Based On Dicom Standard
5	Coding And Decoding: The Research About Image Construction Of The Subtopic In People’s Livelihood News
6	An Automatic Approach Towards Constructing Chinese Medical Terminology Resource
7	Research On The Construction Of Data Standard System Framework Of Health And Medical Wearable Device From The Perspective Of Ecosystem
8	Research And Application Of Automatic Test Case Generation Method For Data Oriented Standard
9	Research And Implementation Of Low-power Standard Cell Library For Implantable Medical CPU
10	Research On Medical Images System Based On DICOM Standard