
Study On Content-based Music Recognition

Posted on: 2010-08-20    Degree: Master    Type: Thesis
Country: China    Candidate: Q Tang    Full Text: PDF
GTID: 2178360272995845    Subject: Computer application technology
Abstract/Summary:
Nowadays, multimedia content such as images, audio and video has become a key component of modern information exchange, and the demand for multimedia search in information retrieval keeps growing. Multimedia retrieval differs clearly from traditional information retrieval: multimedia data are structured far more loosely than plain text, and both the input and the output of a multimedia retrieval system are typically multi-modal and based on fuzzy queries. Driven by the increasingly wide use of image, audio and video resources, research on efficient multimedia retrieval has developed rapidly, music retrieval in particular.

However, the most widely used music retrieval methods still depend heavily on manually annotated text such as the song title, composer and artist. With the rapid growth of music resources, these text-based methods can no longer satisfy the demand for music retrieval because of their limited accuracy and the presence of wrong tags. This motivates content-based music recognition: by analysing the melodic features of a piece we obtain acoustic characteristics of the audio data, and from these characteristics we can compute the similarity between two songs, or between a sung query and a song, in order to find the right music.

Query by singing is one branch of content-based music recognition. Here the uncertainties of singing must be taken into account: mistakes in rhythm or tune can easily lead to wrong retrieval results. The amateurishness and randomness of singing therefore call for a suitable representation of the tune together with a fuzzy matching mechanism to keep retrieval accurate.

Because of the way audio is encoded, the raw audio data contain only the sampling frequency and coding information; they are a non-semantic, unstructured binary stream without any semantic description of content or structure, which greatly increases the difficulty of realizing content-based music retrieval. A typical content-based music retrieval pipeline includes automatic music classification, singing input, feature extraction and similarity comparison, all of which rely on the acoustic features of the source music and of the singing itself. The research work of this thesis covers the following steps.

1) Study on Feature Extraction

This part consists of time-domain feature extraction and frequency-domain feature extraction.

Time-domain features are mainly used to locate the vocal segments and to discard interfering data that would reduce the efficiency of the retrieval system. In this thesis the start point and end point of each vocal segment are detected using the short-term zero-crossing rate and the short-term energy. In experiments on 310 songs, about 70 percent of the pure-music segments were filtered out and about 80 percent of the vocal segments were detected. In particular, nearly every climax segment of a song can be extracted completely; this is the part singers remember most reliably and sing most often, which helps the system achieve better results.
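As a rough illustration of this time-domain step (a minimal sketch, not the thesis implementation: the frame length, hop size and thresholds below are assumptions chosen for a 16 kHz signal), endpoint detection with short-term energy and zero-crossing rate could look like this:

```python
import numpy as np

def short_time_features(x, frame_len=400, hop=160):
    """Compute short-term energy and zero-crossing rate per frame.

    x: mono audio samples as a 1-D float array (400-sample frames = 25 ms at 16 kHz).
    """
    n_frames = max(0, 1 + (len(x) - frame_len) // hop)
    energy = np.empty(n_frames)
    zcr = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop: i * hop + frame_len]
        energy[i] = np.sum(frame ** 2)                         # short-term energy
        zcr[i] = np.mean(np.abs(np.diff(np.sign(frame))) > 0)  # fraction of sign changes
    return energy, zcr

def detect_voice_segments(energy, zcr, e_thresh=None, z_thresh=None):
    """Mark frames as vocal when energy is high and ZCR is moderate.

    The thresholds are illustrative defaults; the thesis derives its own decision rule.
    """
    if e_thresh is None:
        e_thresh = 0.5 * np.mean(energy)
    if z_thresh is None:
        z_thresh = 1.5 * np.mean(zcr)
    voiced = (energy > e_thresh) & (zcr < z_thresh)
    # Collapse consecutive vocal frames into (start_frame, end_frame) segments.
    segments, start = [], None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i
        elif not v and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(voiced)))
    return segments
```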
Frequency-domain features are mainly used to extract the melody, which is the most important acoustic characteristic of music. Based on short-time framing and a frequency-component model, the original time-domain audio data are mapped to their frequency-domain representation: each frame is weighted with a Hamming window and transformed with the discrete Fourier transform, from which the pitch characteristics of the tune are extracted. Acoustic studies have shown that this treatment of the audio signal gives good results.

2) Study on Acoustic Characteristic Expression

Acoustic characteristics such as instrumentation, style, pitch, rhythm and note duration can differ greatly between the original music and a sung query, so it is difficult to match similarity directly on the raw characteristics extracted from the music and from the singing. We therefore need a representation that is not only efficient to match but also easy to store and organize; how to express the acoustic characteristics effectively and relate the singing to the music is crucial. We express the characteristics of a piece as a digit sequence based on its melody, and use a quantitative interval-descriptor formula to fuzzily quantize the extracted fundamental frequency. In this way, renditions whose basic pitch contours are similar but whose keys or tunings differ become comparable, which improves the tolerance to fuzziness and lays a good foundation for the fuzzy matching step.
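As a minimal sketch of such a representation (the thesis's actual interval-descriptor formula is not reproduced here; the semitone mapping, reference frequency and step size below are assumptions for illustration), a fundamental-frequency contour can be turned into a key-independent interval digit sequence as follows:

```python
import numpy as np

def f0_to_interval_digits(f0_hz, ref_hz=440.0, step=1.0):
    """Convert a fundamental-frequency contour into a quantized interval sequence.

    f0_hz : per-frame fundamental frequency estimates (0 for unvoiced frames).
    step  : quantization step in semitones (coarser steps give fuzzier matching).
    Returns frame-to-frame pitch intervals, so the sequence is transposition-independent.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0 > 0
    # Map frequency onto a fractional semitone scale relative to ref_hz.
    semitones = np.full_like(f0, np.nan)
    semitones[voiced] = 12.0 * np.log2(f0[voiced] / ref_hz)
    # Quantize to the chosen step size, then take successive differences.
    quantized = np.round(semitones[voiced] / step)
    intervals = np.diff(quantized).astype(int)
    return intervals

# Example: the same rising contour sung in two different keys gives the same digits.
a = f0_to_interval_digits([262, 294, 330, 349, 392])   # roughly C4 D4 E4 F4 G4
b = f0_to_interval_digits([294, 330, 370, 392, 440])   # roughly D4 E4 F#4 G4 A4
print(a, b)  # both [2, 2, 1, 2]
```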
3) Study on Fuzzy Match

This part is the focus of the thesis, since the choice of matching strategy directly affects the performance of the whole system. We use an improved dynamic programming algorithm to solve the tune similarity matching problem. Consider, for example, the two short tune sequences 123456 and 234567: melodically they are highly similar (one is simply a transposition of the other), yet the traditional dynamic programming algorithm does not rate them as similar. The improved dynamic programming algorithm presented in this thesis handles this case well. Finally, for each candidate song we count the optimal matching sequences of the sung query, use this count to vote for the best candidate, and return the result to the user.
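The improved recurrence itself is not reproduced here; as a minimal sketch of the underlying idea (comparing the sequences in interval space so that transposed tunes such as 123456 and 234567 score as identical), an edit-distance-style dynamic program might look like this, with illustrative costs rather than the thesis's actual scoring scheme:

```python
import numpy as np

def interval_edit_distance(query, target, sub_cost=1.0, indel_cost=1.0):
    """Edit-distance DP between two pitch-digit sequences, compared in interval space.

    Working on note-to-note intervals (successive differences) rather than absolute
    digits makes the score invariant to transposition, so '123456' and '234567' match.
    """
    q = np.diff([int(c) for c in query])
    t = np.diff([int(c) for c in target])
    m, n = len(q), len(t)
    dp = np.zeros((m + 1, n + 1))
    dp[:, 0] = np.arange(m + 1) * indel_cost   # delete all of the query
    dp[0, :] = np.arange(n + 1) * indel_cost   # insert all of the target
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0.0 if q[i - 1] == t[j - 1] else sub_cost
            dp[i, j] = min(dp[i - 1, j - 1] + cost,     # match / substitute
                           dp[i - 1, j] + indel_cost,   # skip a query note
                           dp[i, j - 1] + indel_cost)   # skip a target note
    return dp[m, n]

print(interval_edit_distance("123456", "234567"))  # 0.0 -> treated as the same tune
print(interval_edit_distance("123456", "135246"))  # > 0  -> different contour
```

The sketch only shows the distance computation; in the thesis, the number of optimal matches that a sung query achieves in each song then drives the vote for the final candidate.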
In summary, according to the practical requirements of melody-based multimedia retrieval, this thesis builds its extraction algorithm on the frequency-component model of audio in order to improve the accuracy of melody extraction; this model addresses the problem that human singing cannot be compared with the source music directly. The thesis also presents a melody similarity matching algorithm and a corresponding music retrieval voting algorithm, which together achieve similarity matching between the sung query and source music segments as well as efficient selection of the candidate songs. These results ensure both the accuracy of the system's retrieval results and real-time feedback, and the experiments demonstrate good effectiveness.

Furthermore, the feasibility and effectiveness of the melody similarity matching algorithm and of the matching strategy proposed in this thesis have also been verified. The work presented here lays a good foundation, and has positive significance, for further in-depth study in the field of melody-based music retrieval.

Keywords/Search Tags: Query by singing, Melody, Fundamental frequency, Dynamic Programming, Fuzzy match