
Semantic Annotation For Documents In Professional Domain Based On NLP

Posted on: 2020-09-30  Degree: Master  Type: Thesis
Country: China  Candidate: J N Liu  Full Text: PDF
GTID: 2428330596476769  Subject: Engineering
Abstract/Summary:
With the popularity of computers and the rapid development of the information society, the number of resources on the Internet has grown geometrically. Documents, including books and computer files in various formats, remain a mainstream means of knowledge representation. How to acquire needed knowledge quickly and efficiently from large amounts of unstructured documents has become a current research hotspot. With the rise of the Semantic Web, research on knowledge acquisition no longer targets only the document itself but has begun to turn to the semantics of its content. Many semantic annotation methods exist, but almost all rely on existing ontology libraries (WordNet, etc.) and annotate only with general concepts, lacking domain support.

To address these shortcomings, this thesis takes the film domain as its research object, proposes a domain semantic annotation method based on natural language processing, and implements a prototype system. The method consists of two parts: the construction of a domain ontology library using natural language processing methods, and semantic annotation based on the self-built domain ontology library. The construction of the domain ontology library is divided into three steps: semantic model construction, triple extraction, and standardized representation. Semantic model construction builds the domain ontology database from a corpus and supports all subsequent work. Triple extraction mainly combines dependency parsing with a rule-based method: on the basis of the syntactic analysis, patterns and rules are specified according to the requirements to extract the triples. The standardized representation uses OWL, the most common method, to represent the ontology. The semantic annotation method based on the self-built domain ontology library mainly involves the calculation of semantic similarity and the formulation of annotation rules. After analyzing the advantages and disadvantages of existing semantic similarity calculation methods, this thesis proposes a semantic similarity calculation method that incorporates co-occurrence frequency, and combines it with the database's index retrieval function and related rules to annotate the corpus with both instances and concepts.

The experimental data come from Douban and Baidu Encyclopedia. The experimental results show that, compared with the classical semantic annotation method, the proposed method improves both the labeling rate and the F-score, and its annotations of the corpus are more detailed.
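The rule-based triple extraction step described above can be illustrated with a minimal sketch. This is not the thesis's implementation: the `Token` structure, the `extract_triples` function, the single subject-verb-object rule, and the Universal Dependencies style labels `nsubj` and `obj` are all assumptions made for illustration, and the dependency parse is written by hand rather than produced by a parser.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    """One word of a dependency-parsed sentence (hypothetical structure)."""
    index: int   # 1-based position in the sentence
    text: str
    head: int    # index of the head token, 0 for the root
    deprel: str  # dependency relation to the head (assumed UD-style labels)


def extract_triples(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Apply one illustrative rule: a head word with both an `nsubj` and an
    `obj` dependent yields a (subject, predicate, object) triple."""
    triples = []
    for head in tokens:
        subjects = [t.text for t in tokens
                    if t.head == head.index and t.deprel == "nsubj"]
        objects = [t.text for t in tokens
                   if t.head == head.index and t.deprel == "obj"]
        for subj in subjects:
            for obj in objects:
                triples.append((subj, head.text, obj))
    return triples


# Hand-written parse of the illustrative sentence "Nolan directed Inception".
sentence = [
    Token(1, "Nolan", 2, "nsubj"),
    Token(2, "directed", 0, "root"),
    Token(3, "Inception", 2, "obj"),
]

print(extract_triples(sentence))  # [('Nolan', 'directed', 'Inception')]
```

A real system would presumably need many more patterns (passives, copulas, prepositional attachments), and the extracted triples would then be serialized in OWL; neither is shown here.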
Keywords/Search Tags: Semantic Web, Semantic Annotation, Natural Language Processing, Triple Extraction, Semantic Similarity