
The Research On Chinese Entity Set Automatic Expansion Technology

Posted on: 2015-01-01 | Degree: Master | Type: Thesis
Country: China | Candidate: X Liu
GTID: 2298330422983431 | Subject: Computer technology
Abstract:
Research on automatic expansion of Chinese entity sets has developed from the traditional task, restricted to closed categories and specific domains, into automatic expansion over open categories and open domains. The needs of scientific research and of practical applications place further requirements on entity set expansion. The currently popular methods are template-based. Because seeds can be polysemous and semantically vague, and template methods rely only on contextual information, their results contain a great deal of noise. Most importantly, template methods use only the contextual features of the seeds and do not incorporate their semantic features. We therefore need an entity set expansion method that is efficient, has low complexity, and incorporates semantic information, so that more entities of a given semantic class can be obtained quickly and precisely from a large corpus to meet the needs of scientific research and applications. The research work of this thesis includes the following.

Building on the template approach, we filter the candidate set using a rectangular coordinate system, the area of a quadrangle, and context (language-environment) similarity, with the aim of obtaining high-quality data. Experimental results show that the algorithm has low complexity and is more effective.

Because they do not incorporate semantic information, traditional methods cannot fully describe the characteristics of the seeds. In this thesis, we use the entry labels of Baidupedia (Baidu Baike) as semantic information for the seeds and combine them with similarity measures to filter the candidate set, aiming to obtain a high-quality candidate set. Experimental results show that the algorithm has low complexity and achieves high precision, recall, and F-score.

Finally, we summarize the existing research work and point out directions for future research.
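The abstract does not list the templates the thesis uses. As a minimal sketch of template-based candidate extraction, assuming the sentences are already word-segmented and assuming a single hypothetical template (a Chinese enumeration whose items are joined by the enumeration comma "、"), candidates are the non-seed items that appear in an enumeration together with at least one seed. The names `enumerations` and `extract_candidates` are illustrative, not taken from the thesis.

```python
from collections import Counter

ENUM_SEP = "、"  # Chinese enumeration comma used by the hypothetical template


def enumerations(tokens):
    """Return maximal runs t1 、 t2 、 ... 、 tk (k >= 2) from a segmented sentence."""
    groups, current = [], []
    for i, tok in enumerate(tokens):
        if tok == ENUM_SEP:
            continue
        if i > 0 and tokens[i - 1] == ENUM_SEP and current:
            current.append(tok)  # token continues the current enumeration
        else:
            if len(current) >= 2:
                groups.append(current)  # close the previous enumeration
            current = [tok]  # token may start a new enumeration
    if len(current) >= 2:
        groups.append(current)
    return groups


def extract_candidates(sentences, seeds):
    """Count non-seed items that share an enumeration with at least one seed."""
    seeds = set(seeds)
    counts = Counter()
    for tokens in sentences:
        for group in enumerations(tokens):
            if seeds & set(group):
                counts.update(item for item in group if item not in seeds)
    return counts


sentences = [
    ["他", "去过", "北京", "、", "上海", "、", "广州", "。"],
    ["南京", "、", "杭州", "很", "美", "。"],
]
print(extract_candidates(sentences, {"北京", "上海"}))  # Counter({'广州': 1})
```

The second sentence contributes nothing because its enumeration contains no seed. Raw co-occurrence counts like these still carry the noise the abstract attributes to purely contextual templates, which is why a filtering step follows.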
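Context (language-environment) similarity filtering can be sketched as follows. This is an assumed formulation, since the abstract does not define the measure: each entity is represented by a bag of the words within a fixed window around its occurrences, the seeds' vectors are summed into one profile, and candidates whose cosine similarity to that profile falls below a threshold are dropped. The window size and threshold are hypothetical parameters. The quadrangle-area step in the rectangular coordinate system is not shown because the abstract gives no details of it.

```python
import math
from collections import Counter


def context_vector(entity, sentences, window=2):
    """Bag of words found within `window` tokens of each occurrence of `entity`."""
    vec = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == entity:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                vec.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return vec


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def filter_by_context(candidates, seeds, sentences, threshold=0.5, window=2):
    """Keep candidates whose context vector is close to the summed seed contexts."""
    seed_profile = Counter()
    for seed in seeds:
        seed_profile.update(context_vector(seed, sentences, window))
    kept = {}
    for cand in candidates:
        score = cosine(context_vector(cand, sentences, window), seed_profile)
        if score >= threshold:
            kept[cand] = score
    return kept


sentences = [
    ["我", "住在", "北京", "市"],
    ["我", "住在", "上海", "市"],
    ["我", "住在", "广州", "市"],
    ["我", "吃", "苹果", "了"],
]
print(filter_by_context(["广州", "苹果"], ["北京", "上海"], sentences))
```

Here 广州 shares every context word with the seeds and scores about 1.0, while 苹果 shares only 我 and scores 1/3, so it is filtered out at the default threshold of 0.5.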
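Combining semantic information with similarity can be sketched in the same spirit. Assumptions: the Baidupedia (Baidu Baike) entry labels for each entity are already available as a dictionary (no crawling or API is shown); the seeds' semantic profile is the set of labels shared by at least half of the seeds; label overlap is measured with Jaccard similarity; and the final score is a hypothetical weighted sum of a context-similarity score and the label score. The weight `alpha`, the `threshold`, the `min_share` rule, and all labels and scores below are illustrative, not values from the thesis.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two label collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def seed_label_profile(seeds, labels, min_share=0.5):
    """Labels carried by at least `min_share` of the seeds (hypothetical rule)."""
    if not seeds:
        return set()
    counts = Counter(label for seed in seeds for label in set(labels.get(seed, ())))
    return {label for label, n in counts.items() if n / len(seeds) >= min_share}


def semantic_filter(context_scores, seeds, labels, alpha=0.5, threshold=0.5):
    """Combine a context score and an entry-label score; keep candidates above threshold."""
    profile = seed_label_profile(seeds, labels)
    kept = {}
    for cand, ctx_score in context_scores.items():
        label_score = jaccard(labels.get(cand, ()), profile)
        score = alpha * ctx_score + (1 - alpha) * label_score
        if score >= threshold:
            kept[cand] = score
    return kept


# Hypothetical entry labels; in practice they would be collected from Baidupedia entries.
labels = {
    "北京": ["城市", "首都", "地理"],
    "上海": ["城市", "地理"],
    "广州": ["城市", "地理"],
    "苹果": ["水果", "植物"],
}
context_scores = {"广州": 0.9, "苹果": 0.4}  # hypothetical context-similarity scores
print(semantic_filter(context_scores, ["北京", "上海"], labels))
```

With these made-up inputs the seed profile is {城市, 首都, 地理}; 广州 scores 0.5 × 0.9 + 0.5 × 2/3 ≈ 0.78 and is kept, while 苹果, whose labels share nothing with the profile, scores 0.2 and is dropped. A weighted sum keeps both signals, so a candidate that shares some contexts with the seeds but carries unrelated entry labels (for example, a polysemous word used in another sense) is pushed below the threshold.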
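The precision, recall, and F-score reported in the experiments are standard set-based measures. A minimal sketch, assuming the expanded set is compared against a gold-standard set of entities and that "F-score" means the balanced F1 (the harmonic mean of precision and recall):

```python
def evaluate(predicted, gold):
    """Set-based precision, recall and balanced F1 of an expanded entity set."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


p, r, f = evaluate({"广州", "深圳", "苹果", "香蕉"}, {"广州", "深圳", "天津"})
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.500 R=0.667 F1=0.571
```

With 4 predicted entities of which 2 are correct, against a gold set of 3, precision is 0.5, recall is 2/3, and F1 is 4/7 ≈ 0.571.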
Keywords: Chinese entity set automatic expansion, semantic characteristic, rectangular coordinate system, similarity of language environment, Baidupedia