
The Research On Chinese Term Recognition Of Patents Based On Word-Role Tagging

Posted on: 2016-09-01 | Degree: Master | Type: Thesis
Country: China | Candidate: J B Han
GTID: 2309330461961715 | Subject: Library and Information Science
Abstract/Summary:
Patents are widely used in science and technology, manufacturing, economics, law and other areas, so mining patents in depth to provide services based on them is of real significance. A patent term is a word with a clearly delimited meaning within the patents of a given field; it adequately and completely reflects the main objects those patents describe. Mining patent terms and processing their semantics can therefore strongly support in-depth services built on patent documents.
Current research on term recognition can be grouped into three approaches: ① term recognition based on statistics; ② term recognition based on matching linguistic rules; ③ methods that combine statistics and linguistic rules. Conditional Random Fields (CRFs), a relatively mature learning algorithm, offer many advantages for term recognition and are therefore widely used in this field. Within a text corpus, CRFs can exploit not only the intrinsic features of each object (here, a character) but also the object's contextual features in the specific text, which are called "longitudinal features".
This paper uses CRFs as the learning algorithm to learn character features from Chinese patent documents in the field of ferrous metallurgy, and then applies the resulting tagging model to automatically assign role labels to the characters of the test corpus; together, these two stages accomplish patent term recognition. The paper comprises the following core tasks:
(1) Normalized management of the test corpus. Because no suitable Chinese term table exists for ferrous metallurgy, "nested terms" and "long terms" create obstacles when role labels are assigned to characters. To overcome them, this research first extracted the patent terms from a text corpus in which terms had already been recognized, then further prepared and organized them into a patent term collection. This collection is both a comprehensive summary of Chinese patent terms in ferrous metallurgy and a reliable reference for tagging character role labels.
(2) New feature items for machine learning. To take into account both the subject features and the content-structure features of the source corpus, this research builds on established work and introduces two new feature items: ① a chemical element feature and ② a character frequency feature. The chemical element feature distinguishes characters that belong to the chemical element group from those that do not; the character frequency feature distinguishes term characters from non-term characters. The results show that these two features effectively improve the overall performance of the tagging models.
(3) Building the character role tagging models. Based on the feature item settings, this paper builds five feature templates to test how different groups of feature items affect the character role tagging models. Using the latest version, CRF++ 0.58, as the processing platform, the research applies CRFs to the test corpus to build five tagging models, one for each feature template. It then compares the five models and their tagging results using the classical evaluation indexes (accuracy, recall rate and F-value) together with an additional index, the recall rate of character role tagging.
(4) Summarizing the influence of different feature items and empirical rules for setting them. Based on the test results of the five tagging models, and starting from the semantic attributes of the source text, this paper analyzes how different feature items influence the models' ability to recognize terms. Finally, it summarizes general principles for setting feature items and suggests ways to improve the experiments in future research.
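Task (1) uses a term collection as the reference for labeling each character with its role in a term. The sketch below shows one way such labeling could work. It assumes a B/M/E/S/O label set (begin, middle, end, single-character term, outside), which the abstract does not specify, and resolves nested terms by preferring the longest term at each position; the function name `role_tag` is hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical role label set (an assumption, not taken from the thesis):
# B = first char of a term, M = middle char, E = last char,
# S = single-character term, O = character outside any term.


def role_tag(sentence: str, terms: Iterable[str]) -> List[Tuple[str, str]]:
    """Assign a role label to every character of `sentence`.

    Terms are matched greedily, longest first, so a long term that
    contains a nested shorter term is labeled as one unit.
    """
    # Longest terms first so that nested sub-terms lose to their parent term.
    lexicon = sorted({t for t in terms if t}, key=len, reverse=True)
    labels = ["O"] * len(sentence)
    i = 0
    while i < len(sentence):
        match = next((t for t in lexicon if sentence.startswith(t, i)), None)
        if match is None:
            i += 1  # no term starts here; keep the "O" label
            continue
        n = len(match)
        if n == 1:
            labels[i] = "S"
        else:
            labels[i] = "B"
            for j in range(i + 1, i + n - 1):
                labels[j] = "M"
            labels[i + n - 1] = "E"
        i += n  # continue after the matched term (no overlapping matches)
    return list(zip(sentence, labels))
```

For example, with the term collection ["高炉", "高炉炼铁"], the sentence "高炉炼铁工艺" is labeled B M M E O O, because the longer term 高炉炼铁 takes precedence over the nested term 高炉.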
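The two feature items of task (2) can be stored as extra columns next to each character in CRF++'s training format, where each line holds one token, whitespace-separated columns hold its features, the last column is the answer tag, and a blank line ends a sentence. This is a minimal sketch under stated assumptions: the element list is a small illustrative subset, and the frequency feature is a hypothetical three-way bucket of how often a character appears in the term collection. The abstract gives neither definition, and all names and thresholds here are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Illustrative subset of single-character Chinese element names (Fe, C, Si,
# Mn, P, S, Cr, Ni, Mo, V, Ti, Al, O, N). This list is an assumption; the
# thesis' actual element inventory is not given in the abstract.
ELEMENTS = set("铁碳硅锰磷硫铬镍钼钒钛铝氧氮")


def element_feature(ch: str) -> str:
    """Return 'Y' if the character is a chemical element name, else 'N'."""
    return "Y" if ch in ELEMENTS else "N"


def char_freq_table(terms: Iterable[str]) -> Counter:
    """Count how often each character occurs across the term collection."""
    return Counter(ch for term in terms for ch in term)


def freq_feature(ch: str, freq: Counter, high: int = 3) -> str:
    """Bucket a character's term-collection frequency (hypothetical scheme).

    'H' = occurs at least `high` times, 'M' = occurs at least once,
    'L' = never occurs in any term.
    """
    n = freq[ch]  # Counter returns 0 for unseen characters
    if n >= high:
        return "H"
    if n > 0:
        return "M"
    return "L"


def to_crfpp(sentences: Iterable[List[Tuple[str, str]]], freq: Counter) -> str:
    """Serialize (character, role label) sequences in CRF++ training format.

    Columns: character, element flag, frequency bucket, role label (CRF++
    treats the last column as the answer tag). A blank line ends a sentence.
    """
    lines: List[str] = []
    for sentence in sentences:
        for ch, label in sentence:
            lines.append("\t".join([ch, element_feature(ch), freq_feature(ch, freq), label]))
        lines.append("")  # sentence boundary
    return "\n".join(lines) + "\n"
```

With the term collection ["铁水", "铁矿", "铁合金"], the character 铁 occurs three times and receives the columns Y (element) and H (frequent), while 温 receives N and L.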
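Task (3) compares feature templates built from different groups of feature items. In CRF++, a template line such as `U00:%x[-1,0]` reads column 0 one row before the current token, combined features join several `%x` macros with `/`, and a line containing only `B` adds label-bigram features. The hypothetical helper below generates templates for chosen column subsets. It assumes column 0 holds the character and later columns hold feature values, and the variants shown are illustrations, not the thesis' actual five templates.

```python
from typing import Dict, List, Sequence


def crfpp_template(columns: Sequence[int], window: int = 2, char_pairs: bool = True) -> str:
    """Build a CRF++ feature template over the given input columns.

    Each unigram line "Unn:%x[row,col]" reads column `col` at relative
    position `row` (from -window to +window). If column 0 (the character)
    is included, adjacent-character pairs are added as combined features.
    A final "B" line asks CRF++ for label-bigram features.
    """
    lines: List[str] = []
    k = 0  # running feature ID so every template line is unique
    for col in columns:
        for off in range(-window, window + 1):
            lines.append(f"U{k:02d}:%x[{off},{col}]")
            k += 1
    if char_pairs and 0 in columns:
        for off in range(-window, window):
            lines.append(f"U{k:02d}:%x[{off},0]/%x[{off + 1},0]")
            k += 1
    lines.append("")   # blank separator line
    lines.append("B")  # transition features between adjacent labels
    return "\n".join(lines) + "\n"


# Hypothetical feature-group variants (not the thesis' five templates),
# assuming column 0 = character, 1 = element flag, 2 = frequency bucket.
TEMPLATES: Dict[str, str] = {
    "chars": crfpp_template([0]),
    "chars+element": crfpp_template([0, 1]),
    "chars+freq": crfpp_template([0, 2]),
    "chars+element+freq": crfpp_template([0, 1, 2]),
}
```

Each template would be written to a file and passed to CRF++'s command-line tools, for example `crf_learn template train.data model` for training and `crf_test -m model test.data` for tagging, so that models differing only in their feature groups can be compared.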
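The evaluation in task (3) can be computed at the term level by converting label sequences back into term spans. This sketch assumes B/M/E/S/O role labels, treats the abstract's "accuracy" as term-level precision P, computes the F-value as F1 = 2PR / (P + R), and interprets the "recall rate of character role tagging" as per-label recall over characters. These interpretations are assumptions, not definitions taken from the thesis, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def spans(labels: Sequence[str]) -> Set[Tuple[int, int]]:
    """Extract (start, end) term spans, end exclusive, from B/M/E/S/O labels.

    Ill-formed fragments (e.g. an E with no preceding B) are ignored.
    """
    result: Set[Tuple[int, int]] = set()
    start = None
    for i, lab in enumerate(labels):
        if lab == "S":
            result.add((i, i + 1))
            start = None
        elif lab == "B":
            start = i
        elif lab == "E" and start is not None:
            result.add((start, i + 1))
            start = None
        elif lab == "O":
            start = None
        # "M" keeps any currently open span open
    return result


def prf(gold: List[Sequence[str]], pred: List[Sequence[str]]) -> Tuple[float, float, float]:
    """Term-level precision, recall and F1 over a list of sentences."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = spans(g), spans(p)
        tp += len(gs & ps)  # a term counts only if both boundaries match
        n_gold += len(gs)
        n_pred += len(ps)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def label_recall(gold: List[Sequence[str]], pred: List[Sequence[str]]) -> Dict[str, float]:
    """Per-label recall: share of gold characters with label L predicted as L."""
    hit: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for g, p in zip(gold, pred):
        for gl, pl in zip(g, p):
            total[gl] = total.get(gl, 0) + 1
            if gl == pl:
                hit[gl] = hit.get(gl, 0) + 1
    return {lab: hit.get(lab, 0) / n for lab, n in total.items()}
```

For a gold sequence B E O S and a predicted sequence B E O O, the model finds one of the two gold terms and makes no false predictions, giving P = 1.0, R = 0.5 and F1 ≈ 0.667; the per-label recall shows that the single-character label S was never recovered.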
Keywords/Search Tags: Natural Language Processing, Term Recognition, Word-Role Tagging, Patent, Conditional Random Fields