Research On Semantic Analysis Methods For Patent Information Processing

Posted on: 2020-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Lin
GTID: 2428330590450949
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, China has received an extremely large number of patent applications, and the number is growing quickly. One report shows that in 2017 alone, domestic invention patent applications reached 1.382 million, a year-on-year increase of 14.2%, and 744,000 cases were concluded. The backlog of unprocessed patent documents is therefore large and still growing. Manual indexing and classification of patent information requires many people with professional training, so the workload is huge and progress is slow. Manual indexing and classification are also prone to consistency errors, which cause missed documents, incomplete results, and noise in patent retrieval.

Patent text is semi-structured data that is difficult to normalize with existing data-structure methods. Patent documents have two dimensions, technical and legal. Extracting the required technical features from them and analyzing the technical content they describe is the central problem of semantic analysis of technical language. Traditional text mining methods based on word-frequency statistics adapt poorly to the complex structure of patent documents, so their analysis results are not accurate enough. Semantic analysis of technical language, as represented by patent documents, is therefore needed to accurately locate and extract the technical and product features in patent documents.

This thesis addresses the semantic analysis needs of patent texts, concentrating its research and experiments on the accurate extraction of patent language features. By constructing a patent domain ontology, it aims to capture as much information in the patent field as possible, improve the recall and precision of patent document retrieval, and reduce the workload of manual indexing and retrieval. To this end, the main research work is as follows:

1. Text terminology extraction based on dependency tree-CRF (Conditional Random Field). Feature extraction based on a dependency tree and a CRF is a feature selection method grounded in semantic analysis. Traditional text keyword mining algorithms mainly rely on a feature vector model that counts how often words occur in a document, so they tend to overlook low-frequency words that are key technical features. To address this, a text feature extraction algorithm based on dependency tree-CRF is proposed: it tags the part of speech of every word in the text and then automatically extracts terminology according to a feature template.

2. Hierarchical relationship extraction based on an improved K-MEANS clustering algorithm. When K-MEANS is used to obtain hierarchical relationships between terms, the class labels cannot be determined automatically. To solve this, a K-MEANS algorithm based on scientific statistics and hierarchical clustering is proposed that obtains the best class labels automatically.

The two improved methods were tested separately. The text feature extraction method based on dependency tree-CRF can be applied to any part of the patent text. Compared with the traditional K-MEANS algorithm, the improved K-MEANS clustering algorithm based on scientific statistics and hierarchical clustering effectively marks the cluster labels within the hierarchy and more readily yields the complete hierarchical relationship.
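The abstract does not give the actual feature template, so the following is only a minimal sketch of what a dependency-aware template for CRF term extraction could look like, assuming each sentence has already been word-segmented, POS-tagged and dependency-parsed. The `Token` type, the feature names (`w0`, `p0`, `dep`, `head_pos`, ...) and the BIO label set (`B-TERM`, `I-TERM`, `O`) are hypothetical, and no CRF is trained here: the hard-coded labels merely stand in for what a trained linear-chain CRF would predict. A list of per-token feature dictionaries is the kind of input CRF toolkits such as sklearn-crfsuite accept. An English sentence is used for readability.

```python
from typing import Dict, List, NamedTuple

# Hypothetical feature template; the thesis's actual template is not given in the abstract.


class Token(NamedTuple):
    word: str
    pos: str      # part-of-speech tag from an upstream tagger
    head: int     # index of the dependency head in the sentence, -1 for the root
    deprel: str   # dependency relation to the head


def token_features(sent: List[Token], i: int) -> Dict[str, str]:
    """Word/POS in a +/-1 window plus dependency features for token i."""
    tok = sent[i]
    head_pos = sent[tok.head].pos if tok.head >= 0 else "ROOT"
    feats = {"w0": tok.word.lower(), "p0": tok.pos, "dep": tok.deprel, "head_pos": head_pos}
    for off in (-1, 1):
        j = i + off
        inside = 0 <= j < len(sent)
        feats[f"w{off:+d}"] = sent[j].word.lower() if inside else "<PAD>"
        feats[f"p{off:+d}"] = sent[j].pos if inside else "<PAD>"
    # Conjunction feature: the token's POS paired with its head's POS.
    feats["p0|head_pos"] = f"{tok.pos}|{head_pos}"
    return feats


def sentence_features(sent: List[Token]) -> List[Dict[str, str]]:
    return [token_features(sent, i) for i in range(len(sent))]


def bio_to_terms(words: List[str], labels: List[str]) -> List[str]:
    """Turn BIO labels (B-TERM begins a term, I-TERM continues it) into term strings."""
    terms: List[str] = []
    current: List[str] = []
    for word, lab in zip(words, labels):
        if lab == "I-TERM" and current:
            current.append(word)              # continue the open term
            continue
        if current:                           # any other label closes the open term
            terms.append(" ".join(current))
            current = []
        if lab in ("B-TERM", "I-TERM"):       # B-TERM (or a stray I-TERM) opens a new term
            current = [word]
    if current:
        terms.append(" ".join(current))
    return terms


sent = [
    Token("The", "DT", 2, "det"),
    Token("lithium", "NN", 2, "compound"),
    Token("battery", "NN", 3, "nsubj"),
    Token("comprises", "VBZ", -1, "root"),
    Token("a", "DT", 6, "det"),
    Token("graphene", "NN", 6, "compound"),
    Token("anode", "NN", 3, "obj"),
]
X = sentence_features(sent)  # one feature dict per token: the CRF's training/decoding input
# Stand-in for labels a trained CRF would predict (not real model output).
predicted = ["O", "B-TERM", "I-TERM", "O", "O", "B-TERM", "I-TERM"]
print(bio_to_terms([t.word for t in sent], predicted))  # ['lithium battery', 'graphene anode']
```

Because the dependency features (`dep`, `head_pos`) describe a word's grammatical role rather than how often it occurs, a rare technical term can still receive a term label, which is the weakness of frequency-based keyword mining that the abstract points out.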
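The abstract does not say which statistic drives the improved K-MEANS, so the sketch below illustrates the general idea under stated assumptions rather than reproducing the thesis's algorithm. Centroid-linkage agglomerative clustering chooses the number of clusters by cutting just before the largest jump in merge distance (a common heuristic swapped in here), its cluster means seed K-means, and each final cluster is named after the member term closest to its centroid, giving parent-to-children term groups. The function names and the 2-D term vectors are hypothetical; real term vectors would come from some embedding or co-occurrence model.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers; the largest-gap cut stands in for the thesis's unspecified statistic.
Vector = Tuple[float, ...]


def mean(points: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean (centroid) of a non-empty list of vectors."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def choose_k_by_hierarchy(points: List[Vector]) -> List[List[int]]:
    """Centroid-linkage agglomerative clustering, cut just before the largest jump
    in merge distance. Returns clusters of point indices; len(result) is the chosen k."""
    clusters = [[i] for i in range(len(points))]
    if len(points) < 3:
        return clusters  # too few points to compare merge distances
    history = []  # (merge distance, clusters as they were just before that merge)
    while len(clusters) > 1:
        cents = [mean([points[i] for i in c]) for c in clusters]
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(cents[a], cents[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((d, [list(c) for c in clusters]))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    gaps = [history[i + 1][0] - history[i][0] for i in range(len(history) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__) + 1
    return history[cut][1]


def kmeans(points: List[Vector], centroids: List[Vector],
           max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: recompute centroids, keeping the old one if a cluster empties.
        new = [mean([p for p, lab in zip(points, labels) if lab == k]) if k in labels
               else centroids[k] for k in range(len(centroids))]
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def build_term_hierarchy(terms: List[str], points: List[Vector]) -> Dict[str, List[str]]:
    init = choose_k_by_hierarchy(points)
    labels, centroids = kmeans(points, [mean([points[i] for i in c]) for c in init])
    hierarchy: Dict[str, List[str]] = {}
    for k, c in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == k]
        if not members:
            continue
        # Name the cluster after the member term closest to its centroid.
        head = min(members, key=lambda i: math.dist(points[i], c))
        hierarchy[terms[head]] = [terms[i] for i in members]
    return hierarchy


terms = ["battery", "cell", "accumulator", "motor", "engine", "turbine"]
vectors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # toy 2-D term vectors
print(build_term_hierarchy(terms, vectors))
# {'battery': ['battery', 'cell', 'accumulator'], 'motor': ['motor', 'engine', 'turbine']}
```

Seeding K-means from the hierarchical clusters removes the need to fix k in advance and makes the result deterministic; the naive agglomerative pass shown here is roughly cubic in the number of terms, so a library implementation would be preferable at scale.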
Keywords/Search Tags: Semantic analysis, Part-of-speech tagging, Dependency tree, CRF, K-MEANS