We live in an era of complex and rapidly changing data. Data from many fields keeps growing and is subtly changing the pace and manner of our lives. Among these data, time series are an important type because they record how quantities evolve over time, so time series analysis has become important as well. However, high dimensionality and disorder make such analysis difficult, which has led scholars to apply data mining methods to uncover the important information embedded in the data. Among data mining algorithms, time series similarity search holds an important position and has attracted many researchers. Over time, its methods and results have been applied in many areas of real life, such as bond investment strategy, earthquake forecasting, and medical treatment.

In time series similarity search, morphological characteristics are the main feature of a sequence. They reflect the overall trend of the whole series and also capture its changes in detail. Distance measures based on morphological characteristics can quantify differences in shape, and they have had a far-reaching influence on similarity search techniques.

Based on an extensive study of the domestic and international literature on morphology-based time series similarity search, this thesis summarizes in detail the current state of the algorithms, analyzes and describes their various stages, points out the existing problems, and proposes corresponding solutions. The main research work includes:

(1) Arithmetic coding is applied to time series similarity search.
First, the method uses key point detection to extract the mean and slope features of the time series, and represents them with numeric symbols as a mean sequence and a morphology sequence that adequately capture the information in the original series. Then arithmetic coding converts the symbolic sequences into coded sequences, which yields a pattern representation in terms of probability intervals. Finally, a hierarchical search algorithm based on Euclidean distance computes the similarity in a coarse-to-fine manner, matching the overall trend first and then fitting the details.

(2) A composite search based on symbolic aggregate approximation (SAX) and angular point bending values. This method uses angular points to obtain the mean and angular point values of a sequence, and represents the sequence by two-tuples composed of the mean value and the angular point value. On this basis, a high-quality composite distance algorithm is used for similarity search, so that the result set achieves similarity in both numeric value and morphology.
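
To illustrate the probability-interval representation mentioned in item (1), the following is a minimal sketch of how standard arithmetic coding maps a symbolic sequence to a sub-interval of [0, 1). It is not the thesis's implementation: the three-letter slope alphabet, the frequency-based symbol probabilities, and the function names `symbol_ranges` and `encode_interval` are assumptions made only for this example.

```python
from collections import Counter
from fractions import Fraction


def symbol_ranges(symbols):
    """Give each symbol a sub-interval of [0, 1) proportional to its frequency."""
    counts = Counter(symbols)
    total = sum(counts.values())
    ranges, low = {}, Fraction(0)
    for sym in sorted(counts):
        width = Fraction(counts[sym], total)
        ranges[sym] = (low, low + width)
        low += width
    return ranges


def encode_interval(sequence, ranges):
    """Narrow [0, 1) once per symbol; the final interval identifies the sequence."""
    low, high = Fraction(0), Fraction(1)
    for sym in sequence:
        span = high - low
        sym_low, sym_high = ranges[sym]
        low, high = low + span * sym_low, low + span * sym_high
    return low, high


# Hypothetical morphology sequence: 'u' = rising, 'f' = flat, 'd' = falling.
seq = "uufd"
ranges = symbol_ranges(seq)
low, high = encode_interval(seq, ranges)
print(f"[{low}, {high})")  # interval width equals the product of symbol probabilities
```

In this scheme, sequences that share a prefix fall into nested intervals: the interval of `"uu"` contains the interval of `"uufd"`. This nesting is one reason an interval representation can support matching from coarse to fine.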