Research And Application Of Character Sequence Pattern Mining Algorithm

Posted on: 2017-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: J J Li
GTID: 2348330512454548
Subject: Computer Science and Technology
Abstract/Summary:
Data mining is an interdisciplinary research field. It draws on recent results from database technology, artificial intelligence, machine learning, statistics, knowledge engineering, object-oriented methods, information retrieval, high-performance computing, and data visualization. After more than 20 years of careful study by many scholars, and especially in the twenty-first century, many new concepts and algorithms have emerged.

In computer laboratory teaching, a teacher instructs a large number of students in one lab section. Within the limited class time, the teacher can inspect students' screens to find out whether any of them are engaged in activities unrelated to the coursework, such as playing games or browsing the internet. However, it is difficult for a teacher to monitor every student's behavior during an experiment from all angles, which places higher demands on experimental teaching. It is therefore necessary to mine and analyze students' keystroke sequences in order to find valuable patterns and rules.

Sequential pattern mining is an important area of data mining research; its objects are mainly biological sequences, time sequences, and event sequences. In this thesis, the sequence of keys that students press on the lab machines is abstracted as a character sequence in order to study keystroke operation patterns in depth. The main work includes the following aspects:

1) Data transformation, which consists of three tasks. Task 1: convert keyboard codes into the corresponding key characters. Task 2: count the distinct users and assign each one a user ID. Task 3: encode each user's different time periods as timestamps. These tasks are carried out by a program written in C.
2) A study of the SPADE algorithm, which reveals some disadvantages: computing user support requires counting over the ID_list, and when a large number of candidate sequences are generated, the ID_lists must be joined repeatedly, which takes a lot of time and reduces the efficiency of the algorithm.
3) To address these shortcomings, the thesis proposes an improved SPADE algorithm based on a vertical compressed bitmap, namely the SPADE-VBmap algorithm. The main idea of the algorithm is introduced through a detailed analysis of an example.
4) Finally, the SPADE-VBmap algorithm is validated. The experimental results show that the bitmap-based algorithm reduces the time complexity, improves efficiency, and achieves the intended purpose.
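The abstract does not give the details of SPADE-VBmap, so the following is only a minimal Python sketch of the general idea of counting support with vertical bitmaps instead of joining ID_lists. It assumes one integer bitmap per (character, sequence) pair, with bit i set when the character occurs at position i, in the style of bitmap miners such as SPAM. It omits any compression, and all function names and the sample data are hypothetical.

```python
from collections import defaultdict

def build_bitmaps(sequences):
    """Map each character to {sid: bitmap}; bit i is set if it occurs at position i."""
    bitmaps = defaultdict(dict)
    for sid, seq in sequences.items():
        for pos, ch in enumerate(seq):
            bitmaps[ch][sid] = bitmaps[ch].get(sid, 0) | (1 << pos)
    return bitmaps

def after_first(bits):
    """Mask selecting every position strictly after the lowest set bit of `bits`."""
    if bits == 0:
        return 0
    lowest = bits & -bits          # isolate the lowest set bit
    return ~((lowest << 1) - 1)    # all higher bits (Python ints are unbounded)

def extend(prefix_bitmap, item_bitmap):
    """Sequence extension: keep positions of the item that follow the prefix."""
    out = {}
    for sid, bits in prefix_bitmap.items():
        joined = after_first(bits) & item_bitmap.get(sid, 0)
        if joined:                 # the extended pattern occurs in this sequence
            out[sid] = joined
    return out

def pattern_support(pattern, bitmaps):
    """Number of sequences containing the non-empty `pattern` as a subsequence."""
    current = dict(bitmaps.get(pattern[0], {}))
    for ch in pattern[1:]:
        current = extend(current, bitmaps.get(ch, {}))
    return len(current)

# Hypothetical keystroke sequences keyed by user ID.
sequences = {1: "abcab", 2: "acb", 3: "bca"}
bitmaps = build_bitmaps(sequences)
print(pattern_support("ab", bitmaps))
```

In this sketch each extension costs one bitwise AND per sequence rather than a merge of two sorted ID_lists, which is the kind of saving a bitmap representation is usually meant to provide.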
Keywords/Search Tags: data mining, sequential pattern mining, character sequence, SPADE algorithm, SPADE-VBmap algorithm