Conditional Random Fields Model Based Musical Domain Named Entity Recognition

Posted on: 2013-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: L C Hao
GTID: 2268330392969574
Subject: Computer technology
Abstract/Summary:
With the explosive growth in the amount of music, there is an urgent need for ways to automate the extraction, identification, and classification of music-related text. The first key task of information extraction is named entity recognition (NER). NER plays an important role in practical applications of natural language processing and is a basic tool for information extraction and other natural language processing tasks. Musical domain NER is therefore an important and meaningful research topic.

This thesis studies named entity recognition in the music domain based on conditional random fields (CRF). It targets four kinds of named entities: artist names, band names, song names, and album names. The biggest advantage of conditional random fields is their flexibility in handling arbitrary, non-independent input features, and they also handle the label bias problem well. For these reasons, this thesis uses the CRF model for a named entity recognition system in the music domain.

The first task in named entity recognition is building a corpus. The corpus in this thesis was collected mainly from Sohu Music, Sina Music, and other music portal websites using web mining. After page cleaning, segmentation, labeling, classification, and other preprocessing steps, corpus collection is complete. No curated corpus exists for the music domain, so the labeling work in this thesis is particularly complicated. Data preparation also requires collecting a variety of music dictionaries, which are used during preprocessing to add attribute columns to the feature files and to perform dictionary feature matching.

As with named entity recognition in other domains, the selection of feature functions and feature templates has a great influence on recognition results in the music domain, and it is also one of the main difficulties of named entity recognition. In this thesis, basic features, prefix and suffix features, dictionary features, and composite features were extracted for artist names, band names, album names, and song names. The thesis describes the process of feature selection and feature template construction, and compares system performance under different feature sets.

This thesis also gives the framework of a named entity recognition system for the music domain. The experiments obtain relatively high accuracy, and the system is also compared with other models. The experimental results show that CRF can be applied well to named entity recognition in the music field and has certain advantages over the other models compared.
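
The dictionary matching and prefix/suffix features described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The dictionary contents, the function names `dictionary_column` and `feature_rows`, the affix length, and the one-token-per-row column layout are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy dictionaries; the thesis's real music dictionaries are not given.
DICTIONARIES: Dict[str, Set[Tuple[str, ...]]] = {
    "ARTIST": {("Jay", "Chou")},
    "ALBUM": {("Fantasy",)},
}


def dictionary_column(
    tokens: List[str],
    dictionaries: Dict[str, Set[Tuple[str, ...]]],
    max_len: int = 4,
) -> List[str]:
    """Tag each token with the dictionary it matches ("O" if none).

    Uses greedy longest-match over token spans of up to max_len tokens.
    """
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            hit = next(
                (name for name, entries in dictionaries.items() if span in entries),
                None,
            )
            if hit is not None:
                tags[i:i + n] = [hit] * n
                matched = n
                break
        i += matched if matched else 1
    return tags


def feature_rows(
    tokens: List[str],
    dictionaries: Dict[str, Set[Tuple[str, ...]]],
    affix_len: int = 2,
) -> List[List[str]]:
    """Build one row per token: token, prefix, suffix, dictionary attribute."""
    dict_tags = dictionary_column(tokens, dictionaries)
    return [
        [tok, tok[:affix_len], tok[-affix_len:], tag]
        for tok, tag in zip(tokens, dict_tags)
    ]


if __name__ == "__main__":
    for row in feature_rows(["Jay", "Chou", "released", "Fantasy"], DICTIONARIES):
        print("\t".join(row))
```

Each row would then receive a gold label column during annotation, and a feature template would combine the columns of neighboring rows into composite features.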
Keywords/Search Tags:name entity, CRF, music, feature extraction, label