Research Of Patent Document Analysis And Retrieval Based On Latent Semantic Analysis

Posted on: 2011-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Xu
GTID: 2178360302474640
Subject: Computer application technology
Abstract/Summary:
Patent documents record important research results; they cover a wide range of subjects and describe technical details at length. They are a significant source of the latest technical information worldwide, and effective analysis of patent documents can improve an enterprise's competitiveness in the market.

Building on a review of existing patent analysis techniques, this thesis studies how to apply text mining to the analysis of Chinese patent documents. Latent Semantic Analysis (LSA) and a self-organizing map (SOM) network are used to cluster patent documents, and a corresponding software platform was developed.

No patent text corpus is currently available in China, so the thesis describes a procedure for automatically downloading patent full texts from a website. Because of the special nature of patent documents, traditional Chinese word segmentation techniques perform poorly on them. The thesis therefore designs a new algorithm for identifying new words in patents to improve the segmentation results.

Text mining is an effective means of patent analysis. Traditional clustering methods work only on low-dimensional objects and give poor results on high-dimensional objects such as text. The thesis uses Latent Semantic Analysis to reduce the dimensionality of patent documents while preserving the structure of the original semantic space, and then clusters the documents with the SOM algorithm. Experimental results showed that clustering the dimension-reduced text takes less time than clustering the original text, and that the clustering results were good.

Conventional patent search websites search patent abstracts rather than full texts. A full-text patent search engine was therefore developed based on Lucene. Patent full texts are indexed with an inverted index structure to speed up retrieval. The system ranks search results by the relevance between each document and the query terms, which reduces the number of patent documents a user has to view and improves efficiency...
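
The inverted index and relevance ranking mentioned above can be illustrated with a minimal Python sketch. This is not the thesis's Lucene-based implementation and does not reproduce Lucene's scoring formula; the function names `build_index` and `search`, the sample documents, the whitespace tokenizer (which would not work for unsegmented Chinese text), and the simple TF-IDF weighting are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Assumption: whitespace tokenization; Chinese text would need
        # a word segmenter before this step.
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query):
    """Return doc ids ranked by a simple TF-IDF score, highest first."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term occurs in no document
        # Rarer terms get a larger weight (illustrative IDF variant).
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += (1 + math.log(tf)) * idf
    # Sort by descending score; break ties by doc id for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical sample documents, not taken from the thesis.
docs = {
    "p1": "semantic clustering of patent text",
    "p2": "patent retrieval with inverted index and patent ranking",
    "p3": "self organizing map for clustering",
}
index = build_index(docs)
results = search(index, len(docs), "patent retrieval")
```

Because the index maps each term directly to the documents that contain it, a query touches only the postings of its own terms instead of scanning every full text, which is the speed-up the abstract attributes to the inverted index.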
Keywords/Search Tags: Patent analysis, text mining, latent semantic analysis, patent clustering, patent retrieval