Research On Patent Classification Technology Based On Latent Semantic Analysis

Posted on: 2012-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Cai
Full Text: PDF
GTID: 2248330371958303
Subject: Computer application technology
Abstract/Summary:
Patent classification speeds up the retrieval of patent documents and simplifies their management, so it plays an important role. In recent years, advances in Natural Language Processing and Information Retrieval have provided new methods for the patent classification task, and choosing an approach suited to the particular characteristics of this task is the key to improving the performance of a classification system.

Research has shown that data sparsity is a persistent obstacle to patent classification performance. In addition, the patent class system is a multilayer tree structure, and samples under the same parent node are very similar to each other, which makes classification more difficult. To address these characteristics, this thesis presents an automatic patent classification technique based on Latent Semantic Analysis (LSA). The technique uses Singular Value Decomposition to mine the latent relationships between the original features and the documents, mapping co-occurring or interrelated features into the same semantic space. By reducing dimensionality, it projects the original high-dimensional space into a low-dimensional semantic space that preserves the most significant semantic relations between features and documents while suppressing irrelevant noise as far as possible. The resulting k-dimensional space retains rich semantic features, which makes the technique an effective way to address data sparsity.

To suit the classification task specifically, this thesis also presents an optimized LSA method based on class information. It strengthens the degree of co-occurrence among features within the same class, so that a more accurate latent semantic space is obtained and samples of the same class become more similar, which can improve patent classification performance.

In this thesis, an automatic patent classification system based on LSA is built for US patent data on the platform of the NTCIR-8 Patent Classification evaluation. Taking a classification system based on shared nearest neighbor as a reference, the author runs experiments on the main issues in the core technology of the patent classification task, analyzes the results in detail, and finally builds a reliable system.
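
The SVD-based projection described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation. The function names `lsa_project` and `cosine`, the toy term-by-document counts, and the choice k = 2 are all assumptions made for illustration, and the class-information optimization is not shown because the abstract does not specify it.

```python
import numpy as np


def lsa_project(term_doc, k):
    """Rank-k LSA of a term-by-document matrix (hypothetical helper).

    Returns the documents' k-dimensional coordinates and a function that
    projects a new term vector into the same space.
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    # Column j of diag(s_k) @ Vt[:k] is document j in the latent space.
    doc_vectors = (s_k[:, None] * Vt[:k]).T  # shape: (n_docs, k)

    def fold_in(term_vector):
        # Projecting onto the top-k left singular vectors gives the same
        # coordinates as doc_vectors for any training document.
        return U_k.T @ term_vector

    return doc_vectors, fold_in


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy counts (assumed data): rows are terms, columns are documents.
A = np.array([
    [2, 0, 1, 0, 0],  # engine
    [0, 2, 1, 0, 0],  # motor
    [1, 1, 0, 0, 0],  # piston
    [0, 0, 0, 2, 1],  # protein
    [0, 0, 0, 1, 2],  # enzyme
], dtype=float)

docs, fold_in = lsa_project(A, k=2)

# Documents 0 and 1 share only one term in the raw (sparse) space.
print("raw cosine, doc0 vs doc1:", cosine(A[:, 0], A[:, 1]))
# In the latent space they fall along the same "mechanical" direction.
print("LSA cosine, doc0 vs doc1:", cosine(docs[0], docs[1]))
# A mechanical document and a biochemical document stay unrelated.
print("LSA cosine, doc0 vs doc3:", cosine(docs[0], docs[3]))
```

In this toy matrix, documents 0 and 1 overlap only on "piston", so their raw cosine similarity is low, while the rank-2 projection places them on the same latent axis. That is the sense in which the reduced space can mitigate sparsity. A real system would start from weighted features rather than raw counts and would choose k empirically.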
Keywords/Search Tags:LSA, Shared Nearest Neighbor, BM25, Patent Classification