Research On Some Key Issues For Semantic Web Usage Mining

Posted on: 2010-11-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Sun
Full Text: PDF
GTID: 1488302750450294
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the volume of Web data is growing explosively. Web usage mining applies data mining techniques to discover usage patterns from large amounts of Web data. Most data on the Web is unstructured or semi-structured, so intelligent software agents cannot understand and process it, and the results of traditional Web usage mining often fall short of what is desired. The Semantic Web is an extension of the current Web in which information is given well-defined meaning. It helps computers process Web information automatically and can improve the results of Web usage mining. Semantic Web usage mining has therefore become one of the frontier areas of Web mining.

On the one hand, Semantic Web usage mining extracts usage knowledge from current Web data to promote the construction of the Semantic Web; on the other hand, it improves the results and efficiency of traditional Web usage mining by making use of Semantic Web data. This dissertation reviews past progress and important achievements in Semantic Web usage mining research and illustrates its significance for the development of Web techniques. It surveys the tasks of Semantic Web usage mining and points out the main problems of current research in the field. From the perspective of (semi-)automatic construction of semantic usage knowledge and of mining Semantic Web usage, the dissertation addresses key issues in log ontology learning and log ontology mining, and makes the following contributions:

(1) A hierarchical architecture for log ontology. The event is adopted as the core concept for describing user visiting behavior. From top to bottom, formal definitions of the core log ontology, the application log ontology and the semantic log are given according to the semantics of user visits. Compared with related work, this architecture makes it easier to express the semantics of usage knowledge at different levels, and it can improve the results and efficiency of subsequent Semantic Web usage mining.

(2) An approach to application log ontology learning based on Web content mining and Web usage mining. The main elements of the application log ontology are determined in turn by four steps: extracting atomic application events, learning the taxonomy of atomic application events, mining complex application events, and learning the non-taxonomic domain relations among application events. Based on the top-level architecture of the log ontology, user requests are mapped to content application events or service application events according to the goal of the user's visit. The taxonomy of content application events is discovered by swarm intelligence clustering of Web documents, and the taxonomy of service application events is discovered by classifying the semantics of the request parameters along the visiting path. By constructing a transaction space based on the part-whole relation between events, the non-taxonomic domain relations are mined through hierarchical association rules. The experimental results show that both the precision and the recall of this method are better than those of the main ontology learning tools in the Web usage domain.

(3) A hybrid DatalogSHIQ log ontology knowledge system, on which an approach for discovering frequent Web access patterns is built. DatalogSHIQ, extended from AL-log, supports a richer description logic language and hybrid Datalog rules. It adopts the most general Datalog safeness condition to strengthen expressivity. Application access rules are used to represent the dynamic semantics of Web usage information, which compensates for the weakness of log ontologies in expressing dynamic knowledge. An atom refinement operator based on DatalogSHIQ is proposed to generate more expressive candidate patterns. An ILP algorithm based on coverage testing against observations is developed to select the frequent Web access patterns from the candidates. Compared with related work, this method extends the ability to reason about complex concepts and independent roles; its results are richer and can satisfy the needs of practical applications.

(4) An approach for discovering frequent Web access patterns and association rules with DL-safe rules. Based on log ontologies, the hybrid rule language DL-safeL is defined to describe application access rules in disjunctive form. Based on a trie, a node explanation algorithm is presented that directly generates the frequent Web access patterns and association rules by computing admissible predicates. The method makes use of the optimization principle of disjunctive databases and checks whether a pattern is semantically free or taxonomically redundant, which avoids the performance bottleneck caused by excessive logical reasoning. Compared with SEMINTEC, the experimental results show that this method supports coverage testing against both application rules and observations, and supports patterns with Datalog atoms without increasing the computational complexity.
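
To make the idea of trie-based frequent access pattern discovery in (4) concrete, the following is a minimal sketch under stated assumptions. It is not the dissertation's node explanation algorithm: it uses no ontology, no DL-safe rules and no admissible predicates. It only counts contiguous event sequences from user sessions in a plain prefix tree and keeps those whose support reaches a threshold. The names `TrieNode`, `build_trie`, `frequent_patterns`, `max_len` and `min_support`, and the sample sessions, are hypothetical and chosen for illustration.

```python
class TrieNode:
    """One node per event sequence; support = number of sessions containing it."""

    def __init__(self):
        self.children = {}
        self.support = 0
        self.last_session = -1  # last session id counted, so each session counts once


def build_trie(sessions, max_len):
    """Insert every contiguous subsequence of length <= max_len into a trie."""
    root = TrieNode()
    for sid, session in enumerate(sessions):
        for start in range(len(session)):
            node = root
            for event in session[start:start + max_len]:
                node = node.children.setdefault(event, TrieNode())
                if node.last_session != sid:
                    node.last_session = sid
                    node.support += 1
    return root


def frequent_patterns(root, min_support):
    """Walk the trie and collect patterns whose support is >= min_support.

    A pattern can never be more frequent than its prefix, so an infrequent
    node is not expanded any further.
    """
    result = {}
    stack = [((), root)]
    while stack:
        prefix, node = stack.pop()
        for event, child in node.children.items():
            if child.support >= min_support:
                pattern = prefix + (event,)
                result[pattern] = child.support
                stack.append((pattern, child))
    return result


if __name__ == "__main__":
    # Hypothetical sessions: each is an ordered list of application events.
    sessions = [
        ["home", "list", "item"],
        ["home", "list", "cart"],
        ["list", "item"],
    ]
    trie = build_trie(sessions, max_len=3)
    for pattern, support in sorted(frequent_patterns(trie, min_support=2).items()):
        print(" -> ".join(pattern), support)
```

In this simplified setting the pruning rests only on the fact that support cannot grow as a pattern gets longer; the semantic-freeness and taxonomy-redundancy checks described in (4) would require the log ontology and are not modeled here.
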
Keywords/Search Tags: Semantic Web mining, hierarchy architecture of log ontology, application log ontology learning, log ontology mining, frequent Web access pattern discovery