Biomedical Named Entity Recognition And Classification Of Biomedical Literature

Posted on: 2014-05-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z F Dou
Full Text: PDF
GTID: 1268330398998885
Subject: Computer application technology
Abstract/Summary:
In recent years, with the growth of biomedical literature, it has become increasingly important to develop automatic text mining tools for tasks such as classifying large volumes of biomedical literature, recognizing named entities of interest in text, and extracting the relationships between those entities. Biomedical named entity recognition is the foundation of all biomedical text mining and the primary step in transforming unstructured data into structured data. This dissertation focuses on the key technologies in biomedical named entity recognition and the classification of biomedical literature. The author's major contributions are outlined as follows:

1. A feature selection method based on an improved binary particle swarm optimizer is studied. The binary particle swarm optimizer is a discrete particle swarm optimizer. Unlike the traditional real-valued particle swarm optimizer, each component of its solution is 1 or 0 instead of a real number. The feature selection algorithm based on the improved binary particle swarm optimizer evolves by round angle and searches the multi-dimensional space for the best binary solution of the fitness function until it obtains the best weight vector of features. Features with weight 1 are selected and features with weight 0 are removed.

2. A feature selection method based on a membrane particle swarm optimizer is studied. Utilizing the hierarchical structure and message passing mechanism of the membrane system, the membrane particle swarm optimizer assigns a particle swarm optimizer to every sub-region. Unlike the traditional particle swarm optimizer, this dissertation proposes a local velocity and a global velocity. All particle swarms in the external regions search for the local best solution using the local velocity, and all particle swarms in the innermost region search for the global best solution using the global velocity. The best solution in an external region is passed to the adjacent inner region, and the worst solution in an inner region is passed to the adjacent external region. The worst solution in the innermost region is passed to its adjacent external region. Once solution passing stops or the iteration limit is reached, the algorithm stops and the best solution in the innermost region is taken as the output. We use the membrane particle swarm optimizer to search for the best solution of the fitness function and obtain the best weight vector of features. According to the values in the best weight vector, features with weight below a threshold value are removed and features with weight above the threshold value are selected, in order to eliminate redundant features.

3. Parameter estimation for the conditional random field model is studied. To address the overfitting problem in traditional parameter estimation for conditional random fields, we propose an improved particle swarm optimizer and apply it to estimate the parameters of conditional random fields. In the improved particle swarm optimizer, the aggregation degree of the particle swarm is used to control premature local convergence, the relative change ratio of the log-likelihood between iterations is used as the stopping criterion, and the inertia factor and learning factors are set as linear variables to control the search scope. Compared with the traditional particle swarm optimizer, this algorithm has better global search ability in the early stage and better local search ability in the later stage. Once the relative change ratio of the log-likelihood between iterations falls below a threshold or the iteration limit is reached, the iteration stops. We set the log-likelihood estimate of the conditional random field as the objective function, train the conditional random field using the improved particle swarm optimizer, and search for the parameters that maximize the objective function.

4. Biomedical named entity recognition in biomedical literature based on conditional random fields is studied. To address the label bias problem of Markov models, we use conditional random fields with rich features to recognize biomedical named entities. We first select features using the improved binary particle swarm optimizer, then train the conditional random field using the improved particle swarm optimizer, then recognize biomedical named entities using the trained conditional random field with rich feature sets, and finally label all biomedical named entities in the biomedical literature.

5. Classification of biomedical literature based on an extenics classifier is studied. To classify large volumes of biomedical literature automatically, we propose a novel classification method named the extenics classifier. In the extenics classifier, a single document is represented by the vector space model and each category model is represented by an extenics matrix. The extenics similarities between the document and all category models are calculated, and the document is assigned to the category with the maximum extenics similarity. In order to maximize the distance between category models, the extenics matrix is trained using the improved particle swarm optimizer.
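The 0/1 feature selection described in contribution 1 can be illustrated with a minimal Python sketch. It uses the standard sigmoid-based binary particle swarm optimizer as a stand-in, since the abstract does not spell out the update rule of the dissertation's improved variant. The function names, the parameter values (`w`, `c1`, `c2`, `v_max`), and the toy fitness function are all assumptions made for illustration only.

```python
import math
import random


def binary_pso_select(fitness, n_features, n_particles=20, n_iters=60,
                      w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Search for a 0/1 feature mask that maximizes `fitness`.

    Standard sigmoid-based binary PSO, used as a stand-in: this is NOT
    the dissertation's improved variant, and all defaults are assumed.
    """
    rng = random.Random(seed)
    # Each position is a 0/1 weight vector; velocities stay real-valued.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                # The sigmoid of the velocity is the probability of a 1.
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def toy_fitness(mask, informative=(0, 2, 5), penalty=0.1):
    """Hypothetical fitness: reward informative features, penalize size."""
    hits = sum(mask[i] for i in informative)
    return hits - penalty * sum(mask)


mask, score = binary_pso_select(toy_fitness, n_features=8)
# Features with weight 1 are kept; features with weight 0 are dropped.
selected = [i for i, bit in enumerate(mask) if bit == 1]
print(selected, round(score, 2))
```

In a real setting the toy fitness would presumably be replaced by a measure of recognition quality on held-out data, for example the F-score of a model trained on the selected features.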
Keywords/Search Tags: Conditional Random Fields, Particle Swarm Optimizer, Membrane System, Extenics Classifier, Entity Recognition