
Design And Development Of Patent Extraction And Analysis System

Posted on: 2009-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: W Q Wang
GTID: 2178360242983001
Subject: Computer application technology
Abstract/Summary:
Patent information is an important source of competitive intelligence and contains a wealth of innovation knowledge. When fully utilized, it helps to improve existing technologies and to discover new ones. However, the number of patents is increasing rapidly, and it is difficult to collect and analyse patent information by hand.

Although there is a great deal of research and software related to patent extraction and analysis, most of it focuses only on English invention patents and is therefore limited in both the languages and the patent types it covers. In China, there is no patent extraction software. Research on patent analysis has only taken its first steps, and analysis of unstructured patent data is not provided. This thesis concentrates on Chinese patents and presents techniques for patent extraction and analysis.

Patent information is hidden behind search forms, and general-purpose crawlers retrieve content only from the publicly indexable Web, ignoring the data behind such forms. Accordingly, a wrapper was employed to implement the patent information extraction system. The rule base plays a crucial role in the wrapper model. In this thesis it was created by parsing web pages by hand; guided by the rule base, the WebBrowser control of VC# then parses web pages and extracts data automatically. Network instability can cause incomplete information and an uncontrollable extraction process. These problems were addressed by measures such as timed web page updates, the document-completed event, and multithreading. In addition, a thread dispatch center was designed to deal with deadlock in the system when threads run out of control.

Text mining is an effective means of patent analysis, so patent analysis based on text mining was studied in this thesis. The text of invention patents is usually verbose and hard to understand. Word processing techniques were therefore studied first, including new-word identification based on statistics and rules, and new-word explanation. Words are processed frequently during this step, so an efficient data structure is required. To improve storage efficiency, Hash + Index + map (set) was selected as the data structure. Patent specifications and patent claims were then processed into structured form. The features of the patents were extracted based on groups of semantically similar words, and these features were used to estimate the similarity of patents. Finally, on the premise that "documents that share a large number of concepts are similar", a hierarchical clustering algorithm and a dynamic SOM model were adopted to cluster the patents.

Finally, the Patent Extraction and Analysis System was developed, and some practical examples, such as lighters, are presented.
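The similarity and clustering step described in the abstract can be illustrated with a small sketch. The abstract does not give the actual similarity measure, linkage rule, or synonym groups, so everything below is an assumption made for illustration: the `CONCEPT_OF` table is a hypothetical stand-in for the groups of semantically similar words, Jaccard overlap stands in for the similarity estimate, and single-link agglomerative merging with a fixed threshold stands in for the hierarchical clustering algorithm. The dynamic SOM model is not shown.

```python
from itertools import combinations

# Hypothetical synonym groups: each word maps to one concept label.
CONCEPT_OF = {
    "flame": "fire", "ignite": "fire",
    "gas": "fuel", "butane": "fuel",
    "wheel": "striker", "flint": "striker",
    "screen": "display",
}

def concepts(words):
    """Map a patent's words to the set of concept groups they belong to."""
    return {CONCEPT_OF[w] for w in words if w in CONCEPT_OF}

def similarity(a, b):
    """Jaccard overlap of two concept sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(docs, threshold):
    """Single-link agglomerative clustering over a dict of concept sets.

    Repeatedly merges the two most similar clusters and stops when the
    best remaining pair falls below `threshold`.
    """
    clusters = [[name] for name in docs]

    def link(c1, c2):
        # Single link: similarity of the closest pair of members.
        return max(similarity(docs[x], docs[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        if link(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # i < j, so index i is unaffected
    return clusters

# Toy input: two lighter patents worded differently, one unrelated patent.
patents = {
    "lighter_a": concepts(["flame", "butane", "flint"]),
    "lighter_b": concepts(["ignite", "gas", "wheel"]),
    "monitor": concepts(["screen"]),
}
result = cluster(patents, threshold=0.5)
```

In this toy input the two lighter patents share no literal words but map to the same three concepts, so they are merged, while the unrelated patent stays in its own cluster. Comparing concept groups rather than raw words is what lets differently worded patents be recognised as similar.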
Keywords/Search Tags:Patent Extraction, Wrapper, Patent Analysis, Text Clustering, Chinese Patent