
Web Text Mining Based On Description Logic

Posted on: 2020-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: H Fu
Full Text: PDF
GTID: 2428330572478659
Subject: Computer application technology
Abstract/Summary:
With the development of Artificial Intelligence (AI) in recent years, Description Logics (DLs), one of its underlying technologies, have become an active research topic. Description logic has produced results not only in AI but has also been applied in agriculture, astronomy, genetic engineering, information security, energy management, earth science, machinery, and other fields. In particular, the OWL 2 standard makes up for deficiencies of the original OWL standard and has promoted substantial development of the Web Ontology Language. Meanwhile, the Web itself is growing rapidly: according to statistical reports from the China Internet Network Information Center (CNNIC), the number of websites in China had reached 5.44 million as of June 2018. This scale puts considerable pressure on accurate search and on the discovery of latent semantics in Web text content. To address the problem of processing the latent relationships between data on the Web, this paper introduces description logic into the Web text mining process as a means of knowledge representation.

The Web text mining process is divided into three steps: Web data preprocessing, Web text mining, and result evaluation. This paper focuses on Web text mining and result evaluation. The complexity of Web pages is reflected in their unstructured data form. In the preprocessing stage, simple data processing techniques can be used to remove audio, image, and video content and retain only the text data. The paper introduces two kinds of Web text mining techniques, clustering and classification, together with their similarity calculations, and evaluates results on a specified HTML text set with the commonly used F-Score. For description logic reasoning, the paper introduces an ontology-concept-based classification algorithm that uses Pellet and relies on the strong expressive power of description logic. In addition, the paper proposes a hierarchical clustering method based on HTML paths, the Path HP algorithm, which can realize Web text clustering.

Relevant theories and technologies were collected through literature research, then compared, analyzed, and organized in order to find technical breakthroughs. Since data in XML format plays a very important role in Web knowledge management and storage, HTML text is converted into XML format during knowledge base construction. Traditional clustering methods explain their clusters only weakly, or do not explain the clustering results at all. Using description logic to represent knowledge during Web mining makes it possible to relate tag data to files, and ultimately brings the benefit of reducing data dimensionality and correlation within clusters. In the experiment, XML Schema is selected to describe the structure of Web text, which is then represented in the ALCIF description logic and stored in the knowledge base as the carrier of Web text information, reducing the number of texts that stand in inclusion relations. Finally, the k-means++ algorithm is used for clustering, and the clustering result is plotted with a Python toolkit. The experiments show that description logic can reduce the dimensionality of Web text data and discover latent semantic relationships, which improves both the efficiency of clustering over the description logic knowledge base and the interpretability of the clustering results.
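
The abstract names the F-Score as the evaluation measure but does not say which variant is used. As a minimal sketch, assuming the common class-size-weighted, best-match F1 for clustering results, the calculation could look as follows. The function names `f_score` and `clustering_f_score` and the sample labels are hypothetical illustrations and are not taken from the thesis.

```python
from collections import Counter


def f_score(precision, recall, beta=1.0):
    """F-beta score of a precision/recall pair; 0.0 when both are zero."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def clustering_f_score(true_labels, cluster_ids):
    """Weighted best-match F1 of a clustering against reference classes.

    Assumption (not stated in the abstract): each reference class is matched
    to the cluster that gives it the highest F1, and the per-class scores are
    averaged with weights proportional to class size.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have equal length")
    n = len(true_labels)
    if n == 0:
        return 0.0

    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    # Number of documents shared by each (class, cluster) pair.
    overlap = Counter(zip(true_labels, cluster_ids))

    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_both = overlap[(cls, clu)]
            precision = n_both / n_clu  # share of the cluster that is cls
            recall = n_both / n_cls     # share of cls captured by the cluster
            best = max(best, f_score(precision, recall))
        total += (n_cls / n) * best
    return total


if __name__ == "__main__":
    # Hypothetical labels for six HTML documents and their cluster ids.
    truth = ["news", "news", "news", "shop", "shop", "shop"]
    clusters = [0, 0, 1, 1, 1, 1]
    print(round(clustering_f_score(truth, clusters), 4))
```

The score depends only on how documents are grouped, not on the cluster id values themselves, so it can be applied directly to the output of any clustering step, k-means++ included.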
Keywords/Search Tags:Description logic, Web documents, Mining, Knowledge base