
Constructing Ontology For Unstructured Chinese Text

Posted on: 2018-02-14 | Degree: Master | Type: Thesis
Country: China | Candidate: C Zhai
GTID: 2348330542470289 | Subject: Software engineering
Abstract/Summary:
The Internet has become the main channel through which people obtain information. However, most of the massive amount of information on the Internet is unstructured and heterogeneous, lacks explicit semantics, and contains uncertainty. This makes information management, search, extraction, and maintenance very inconvenient, and at the system-construction level it hinders information sharing, software reuse, knowledge representation, and rule construction. It is therefore an important research topic to use computers to acquire, understand, and finally extract knowledge from the domains people are interested in, to organize that knowledge into a complete knowledge management system, and to support reasoning and forecasting within that system.

The goal of a domain ontology is to describe concepts at the semantic and knowledge-engineering levels, to model the relationships between concepts, and to present them through visualization. A domain ontology describes concepts with rigorous logic and a high degree of abstraction. Combined closely with search engines and the Semantic Web, it can uncover implicit meanings and concepts that are not explicitly expressed, as well as latent relationships between concepts. This helps people understand knowledge and organize information resources in a comprehensive, multi-dimensional, and dynamic way, and promotes knowledge construction and analysis in the target domain.

Based on an analysis of ontology research in China and abroad, this thesis studies methods for acquiring the information from which ontology concepts are built, the construction and organization of ontologies, the logical representation of and reasoning over ontologies, and an ontology for the "nuclear safety" domain. The specific work is as follows:

(1) Crawling unstructured text. A Python-based crawler with regular expressions written for the target pages was used to collect a large amount of nuclear-domain data from the Internet.

(2) Improving Chinese word segmentation. After studying and comparing a large number of Chinese word segmentation algorithms, this thesis proposes two strategies for improving segmentation: recognizing unregistered (out-of-vocabulary) words in unstructured text on the basis of a statistical segmentation algorithm, and fusing in a commercial corpus. The first strategy exploits the statistical stability of phrase and sentence structure. Selection rules for repeated phrases are defined according to the length of a candidate new word and how often the phrase repeats, and a suffix-array-based algorithm for recognizing and extracting repeated phrases then pulls phrases that are not yet in the dictionary out of large volumes of unstructured text. The thesis also uses word vectors to compute the semantic similarity of two words directly.

(3) Representing and visualizing the ontology. The OWL-DL language and description logic are used to represent and formalize the nuclear-domain ontology, and an ontology description tool is used to visualize the ontology and to provide query functions over the nuclear domain.

This thesis explores the application of ontology technology to the field of "nuclear safety". It offers a useful method for intelligence gathering and analysis in this field and enriches the research methods available to it.
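Item (1) names only the tools, Python and regular expressions. The following is a minimal sketch of the parsing step, assuming pages whose article text sits inside `<p>` tags. The patterns, the helpers `fetch`, `extract_paragraphs` and `keep_relevant`, and the single-keyword filter are hypothetical illustrations, not the crawler used in the thesis; only the standard library (`urllib.request`, `re`, `html`) is used.

```python
import html
import re
import urllib.request

# Hypothetical patterns: assume body text lives in <p> or <p ...> tags.
# "(?:\s[^>]*)?" stops the pattern from also matching tags such as <pre>.
PARAGRAPH_RE = re.compile(r"<p(?:\s[^>]*)?>(.*?)</p>", re.S | re.I)
TAG_RE = re.compile(r"<[^>]+>")


def fetch(url, encoding="utf-8"):
    """Download one page and decode it (requires network access)."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode(encoding, errors="replace")


def extract_paragraphs(page):
    """Return the plain text of every <p> element, with inner tags and entities removed."""
    paragraphs = []
    for raw in PARAGRAPH_RE.findall(page):
        text = html.unescape(TAG_RE.sub("", raw)).strip()
        if text:  # skip empty or whitespace-only paragraphs
            paragraphs.append(text)
    return paragraphs


def keep_relevant(paragraphs, keywords=("核",)):
    """Keep paragraphs mentioning at least one domain keyword (hypothetical filter)."""
    return [p for p in paragraphs if any(k in p for k in keywords)]
```

For example, `extract_paragraphs('<p>核电站<b>安全</b>检查</p><pre>x</pre>')` returns `['核电站安全检查']`. Regular expressions are brittle when a site changes its markup, so an HTML parser is usually more robust; the regex version is shown because it matches the approach the abstract describes.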
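The repeated-phrase strategy in item (2) can be sketched with a suffix array: all suffixes that share a prefix of length L form one contiguous block in sorted order, and the neighbouring longest-common-prefix (LCP) values inside that block are at least L, so the block size is that phrase's frequency. The abstract does not give the thesis's actual rules, so the thresholds (`min_len`, `max_len`, `min_freq`), the restriction to CJK Unified Ideographs (U+4E00 to U+9FFF), the substring-reduction filter, and all function names below are assumptions. The naive O(n² log n) construction only suits small inputs.

```python
import re
from collections import Counter

CJK_RE = re.compile(r"[\u4e00-\u9fff]+")  # assumption: a word is made of CJK ideographs only


def suffix_array(text):
    """Start offsets of all suffixes in lexicographic order (naive build)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """lcp[i] = length of the common prefix of suffixes sa[i-1] and sa[i]; lcp[0] = 0."""
    lcp = [0] * len(sa)
    for i in range(1, len(sa)):
        a, b, k = sa[i - 1], sa[i], 0
        while a + k < len(text) and b + k < len(text) and text[a + k] == text[b + k]:
            k += 1
        lcp[i] = k
    return lcp


def repeated_phrases(text, min_len=2, max_len=6, min_freq=2):
    """Count every substring of length min_len..max_len occurring at least min_freq times."""
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    counts = Counter()
    for length in range(min_len, max_len + 1):
        run_start = 0  # first suffix-array index of the current shared-prefix block
        for i in range(1, len(sa) + 1):
            if i < len(sa) and lcp[i] >= length:
                continue  # sa[i] still shares the same length-`length` prefix
            phrase = text[sa[run_start]:sa[run_start] + length]
            if i - run_start >= min_freq and len(phrase) == length:
                counts[phrase] = i - run_start
            run_start = i
    return counts


def new_word_candidates(text, dictionary, min_len=2, max_len=6, min_freq=2):
    """Repeated pure-CJK phrases missing from `dictionary`, minus redundant fragments."""
    counts = repeated_phrases(text, min_len, max_len, min_freq)
    kept = {p: f for p, f in counts.items() if CJK_RE.fullmatch(p) and p not in dictionary}
    # Drop a phrase if a longer kept phrase contains it with the same frequency:
    # it then never occurs on its own and is only a fragment of the longer one.
    return {
        p: f for p, f in kept.items()
        if not any(p != q and p in q and f == g for q, g in kept.items())
    }
```

On `乏燃料运输，乏燃料贮存，乏燃料处理` with an empty dictionary this keeps `{'乏燃料': 3}`: `乏燃` and `燃料` also occur three times, but only inside the longer phrase, so the reduction step discards them, and phrases spanning the full-width comma are rejected by the CJK filter.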
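Item (2) also compares two words through their word vectors. The abstract does not say which embedding model or similarity measure was used; cosine similarity, cos(u, v) = u·v / (|u| |v|), is the usual choice and is assumed here. The three-dimensional vectors are invented toy values standing in for embeddings trained on the crawled corpus (for example with word2vec), and `cosine_similarity` and `most_similar` are hypothetical helper names.

```python
import math

# Toy vectors invented for illustration only; real ones would come from a
# model trained on the segmented nuclear-domain corpus.
VECTORS = {
    "核电站": [0.9, 0.1, 0.3],
    "核电厂": [0.8, 0.2, 0.3],
    "天气": [0.1, 0.9, 0.0],
}


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, top_n=3):
    """Rank the other words in `vectors` by cosine similarity to `word`."""
    target = vectors[word]
    scores = [(other, cosine_similarity(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]
```

With these toy values, `most_similar("核电站", VECTORS)` ranks `核电厂` well above `天气`. Cosine similarity ignores vector length, which is why it is preferred over raw dot products when word frequencies inflate some vectors' norms.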
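Item (3) relies on OWL-DL and an ontology tool for querying. One inference an OWL-DL reasoner performs is that subclass relations are transitive: if A ⊑ B and B ⊑ C, then A ⊑ C, so an individual of type A is also an instance of C. The pure-Python toy below reproduces only that single inference over a handful of triples, to make the idea of a class-hierarchy query concrete. It is not an OWL-DL reasoner and not the tool used in the thesis, and every class and individual name in it is hypothetical.

```python
from collections import defaultdict

# Hypothetical nuclear-domain fragment as (subject, predicate, object) triples.
# "subClassOf" plays the role of rdfs:subClassOf and "type" the role of rdf:type.
TRIPLES = [
    ("PressurizedWaterReactor", "subClassOf", "NuclearReactor"),
    ("NuclearReactor", "subClassOf", "NuclearFacility"),
    ("SpentFuelPool", "subClassOf", "NuclearFacility"),
    ("ReactorA", "type", "PressurizedWaterReactor"),
    ("PoolB", "type", "SpentFuelPool"),
]


def superclasses(cls, triples):
    """Every class that `cls` is a subclass of, following subClassOf transitively."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == "subClassOf":
            parents[s].add(o)
    found, stack = set(), [cls]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in found:  # guards against cycles and repeated visits
                found.add(parent)
                stack.append(parent)
    return found


def instances_of(cls, triples):
    """Individuals typed as `cls` directly or through any of its subclasses."""
    return {
        s for s, p, o in triples
        if p == "type" and (o == cls or cls in superclasses(o, triples))
    }
```

Querying `instances_of("NuclearFacility", TRIPLES)` returns both `ReactorA` and `PoolB`, even though neither is asserted to be a `NuclearFacility` directly; a real OWL-DL reasoner draws many more inferences (property restrictions, disjointness, consistency checks) than this sketch covers.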
Keywords/Search Tags: Domain ontology, Chinese word segmentation, Word vector, Visualization, Nuclear safety