Research On CSCW And Data Mining In The Internet Environment

Posted on: 2005-08-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Zhang
Full Text: PDF
GTID: 1118360212984598
Subject: Computer software and theory
Abstract/Summary:
More and more information has accumulated as the Internet has grown. However, the Internet is a complicated autonomic system characterized by low reliability, high latency, extreme heterogeneity, and huge volumes of data. How to overcome these deficiencies while exploiting the Internet's advantages is therefore a valuable research subject. In this thesis we explore a systematic way to solve two practical problems in the Internet environment, information exchange and information usage, using CSCW and data mining technologies. The work relates to three research areas: workflow management systems, real-time collaborative editing, and frequent pattern mining. Our primary contributions are as follows:

1. The Knowledge-based Multi-Agent Workflow model (KMAW) and its extension are proposed

Workflow management systems are an important research area in CSCW. In this thesis, we adopt a knowledge-based, target-driven pattern to compose the KMAW model with agent technologies. The model is more expressive than traditional process models: it can realize the normal workflow patterns as well as advanced workflow features such as correctness validation, exception handling, and dynamic processes. The model is also highly flexible, which makes it a suitable framework for cooperative work and integration on the Internet. Furthermore, we combine data mining technologies with the KMAW model to obtain an extended model, MKMAW, which can mine knowledge automatically and supports dynamic decision-making and dynamic process optimization.

2. The Athena cooperative platform is designed and developed based on the KMAW model

The Athena cooperative platform is based on web services and asynchronous messages, and is designed for cooperative work among multiple heterogeneous systems on the Internet. We also design the Athena script language and the engine that executes it. We develop an XML mapping engine and related visual tools to resolve the problem of heterogeneous data. A UDDI server is designed as part of the Athena cooperative platform and is extended to allow binding between web services and event templates. A visual design tool for process building is developed, which generates Athena script automatically. The Athena cooperative platform helps reduce the risk and cost of integrating heterogeneous systems on the Internet. The system has been applied in a National 863 project, and interim results have been obtained.

3. The concept of the operation group and its relevant algorithms in real-time collaborative editing are proposed

Real-time collaborative editing is an important area in CSCW. In the Internet environment it is usually based on operational transformation methods because of the Internet's high latency. We first describe a conflict that arises when traditional operational transformation algorithms are applied to the Replace operation. We then introduce a new concept, the operation group, into the collaborative editing area. T-Group and S-Group are two important types of operation group. To preserve T-Groups during operational transformation, new algorithms are developed based on the REDUCE approach, and explanations and examples supporting the correctness of the proposed algorithms are given. We also give a strategy for preserving S-Groups in the REDUCE approach. Finally, further discussion of compound operation groups and locking schemes is presented.

4. Two novel frequent pattern mining algorithms based on the FP-tree are proposed

Association rule mining and frequent pattern mining are effective ways to obtain knowledge from huge volumes of data. Some existing algorithms are weak in robustness and scalability when processing large real-world data from the Internet. Research has found that the distinct features of different datasets greatly influence the efficiency of specific frequent pattern mining methods. It is therefore possible to build a robust algorithm by methodically combining different algorithms, each applied according to the data distribution of the current dataset. We first propose the Naive Depth First Search algorithm (NDFS), which is based on the FP-tree and is very efficient on dense datasets. We then propose the Self-Adaptive Frequent Pattern mining algorithm (SAFP), which combines NDFS with FP-growth through a dynamic mining strategy on conditional FP-trees. Experiments demonstrate that SAFP is more robust and efficient than both NDFS and FP-growth on different datasets.
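
As background for the FP-tree based algorithms mentioned above, the following is a minimal Python sketch of the standard FP-growth procedure (building an FP-tree, then recursively mining conditional FP-trees). It is not the thesis's NDFS or SAFP algorithm, whose details are not given in this abstract. The names `Node`, `build_fp_tree`, `fp_growth`, and `mine`, as well as the sample transactions, are assumptions made purely for illustration.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, a count, a parent link, and children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns the header table (item -> nodes holding it) and item supports.
    """
    support = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}

    root = Node(None, None)
    header = defaultdict(list)
    for items, count in weighted_transactions:
        # Keep frequent items only, ordered by descending support
        # (ties broken by item name) so common prefixes share nodes.
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    header, frequent = build_fp_tree(weighted_transactions, min_support)
    result = {}
    for item, nodes in header.items():
        itemset = suffix | {item}
        result[itemset] = frequent[item]
        # Conditional pattern base: the prefix path of every node for `item`.
        conditional = []
        for node in nodes:
            path = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        # Mine the conditional FP-tree built from that pattern base.
        result.update(fp_growth(conditional, min_support, itemset))
    return result


def mine(transactions, min_support):
    """Convenience wrapper: every transaction has weight 1."""
    return fp_growth([(t, 1) for t in transactions], min_support)


if __name__ == "__main__":
    sample = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a"]]
    for itemset, count in sorted(mine(sample, 2).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

A self-adaptive strategy of the kind the abstract describes would, presumably, choose between this recursive step and a different search procedure for each conditional FP-tree; the selection criterion is not stated here, so the sketch does not attempt it.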
Keywords/Search Tags: Internet, CSCW, Data Mining, Workflow, Real-time Collaborative Editing, Operational Transformation, Operation Group, Frequent Pattern Mining, FP-tree