As in the timbre classification task, we used the cortical model augmented with Gaussian kernels. To optimize the model to the test data, we employed a variation of the Gaussian kernel that performs an optimized feature embedding on every data dimension. We define an objective function that maximizes the correlation between the human perceptual distances and the distances in the embedded space, normalized by the variances of the embedded-space distances and of the human-perceived distances.
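The embedded kernel and correlation objective can be sketched as follows. This is a hypothetical reconstruction, not the paper's exact definition: the per-dimension weight vector `w`, the function names, and the use of a plain weighted Euclidean distance are all assumptions.

```python
import numpy as np

def embedded_gaussian_kernel(x, y, w):
    """Gaussian kernel with a per-dimension weighting w (hypothetical
    stand-in for the paper's optimized feature embedding): each
    dimension is scaled by w before the usual squared-distance
    Gaussian is applied."""
    diff = w * (x - y)                      # per-dimension embedding
    return np.exp(-0.5 * np.dot(diff, diff))

def correlation_objective(w, features, human_dist):
    """Pearson correlation between human perceptual distances and
    distances in the embedded feature space (higher is better)."""
    n = len(features)
    model_dist = np.array([
        np.linalg.norm(w * (features[i] - features[j]))
        for i in range(n) for j in range(i + 1, n)
    ])
    target = np.array([
        human_dist[i, j] for i in range(n) for j in range(i + 1, n)
    ])
    return np.corrcoef(model_dist, target)[0, 1]
```

In a gradient-ascent setting, `w` would be updated in the direction that increases `correlation_objective`; the normalization by both variances is exactly what the Pearson correlation provides.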
We used a gradient ascent algorithm to learn the embedding parameters that optimize the objective function. The correlation analysis employed the same dataset used for the human psychophysical experiment described above. Each note was 0. The absolute value of the model output was derived for each note and averaged over its duration, following a procedure similar to that of the timbre classification described above.
The cortical features obtained for the three notes A3, D4, and G4 were averaged for each instrument i. Similarly, the perceived human distances between instruments i and j were obtained by averaging the (i,j)th and (j,i)th entries of the human distance matrix over all three notes to obtain D(i,j). Finally, the human and model similarity matrices were compared using Pearson's correlation metric. To avoid overestimating the correlation between the two matrices (their symmetric entries would otherwise appear twice in the correlation), we correlated only the upper triangle of each matrix.
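The symmetrization and upper-triangle correlation steps can be sketched as follows; the function names are illustrative, but the logic (average the two symmetric entries, then correlate only the strictly upper triangle) follows the procedure described above.

```python
import numpy as np

def symmetrize(M):
    """Average the (i,j)th and (j,i)th entries of a raw distance matrix."""
    return 0.5 * (M + M.T)

def upper_triangle_correlation(human_D, model_D):
    """Pearson correlation over the strictly upper triangles of two
    symmetric distance matrices, so each instrument pair enters the
    correlation exactly once."""
    iu = np.triu_indices_from(human_D, k=1)   # entries above the diagonal
    return np.corrcoef(human_D[iu], model_D[iu])[0, 1]
```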
As is the case with any classification problem in high-dimensional spaces, all of the analyses above had to be performed on a reduced number of features, which we obtained using tensor singular value decomposition (TSVD), as described earlier.
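A mode-by-mode truncation in the spirit of TSVD can be sketched as follows. This is a generic HOSVD-style reduction, not necessarily the paper's exact decomposition; the function name and rank choices are illustrative.

```python
import numpy as np

def tsvd_reduce(T, ranks):
    """Reduce a data tensor mode by mode: unfold along each mode,
    take the leading left singular vectors, and project the tensor
    onto them (HOSVD-style sketch)."""
    core = T
    for mode, k in enumerate(ranks):
        # mode-n unfolding: bring `mode` to the front, flatten the rest
        M = np.moveaxis(core, mode, 0).reshape(core.shape[mode], -1)
        U, _, _ = np.linalg.svd(M, full_matrices=False)
        Uk = U[:, :k]                         # top-k basis for this mode
        core = np.moveaxis(
            np.tensordot(Uk.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core
```

With full ranks the projection is an orthogonal change of basis, so the Frobenius norm (data energy) is preserved; truncating the ranks discards the least-energetic directions along each mode.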
This step is necessary to avoid the curse of dimensionality (the predictive power of a classifier degrades as the dimensionality increases). The analysis comparing the correlation between the cortical model and human judgments of timbre similarity is shown in Figure 8.
The analysis led to the choice of a reduced dimensionality that is near optimal. It is important to note that our tests were not fine-grained enough to determine the exact point of optimality. Moreover, this choice is only valid with regard to the data at hand and the classifier used in this study, namely a support vector machine. If one were to choose a different classifier, the optimal reduced dimensionality might be different.
It is merely a number that reflects the tradeoff between keeping a rich dimensionality that captures the diversity of the data and reducing the dimensionality to fit the predictive power of the classifier. To further emphasize this point, we ran a second analysis contrasting the system performance with the full cortical model (joint spectro-temporal modulations) against a model with separable modulations, while keeping the dimensionality of the reduced space fixed.
This experiment (Figure 8, red curve) confirmed that the original space indeed biases the system performance, irrespective of the size of the reduced data. Results from Table 2 are overlaid in the same figure for ease of comparison. The auditory spectrum was obtained by analyzing the input waveform with the cochlear filters described above and integrating over the time dimension. Unlike a simple Fourier analysis of the signal, the cochlear filtering stage operates on a logarithmic frequency axis with highly asymmetric filters.
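The contrast with a plain Fourier analysis can be sketched as follows: the power spectrum is pooled by filters whose centre frequencies are spaced logarithmically, with asymmetric triangular shapes standing in for true cochlear filters. All parameter values here (channel count, frequency range, slope ratios) are illustrative assumptions, not the model's actual settings.

```python
import numpy as np

def auditory_spectrum(waveform, sr, n_channels=32, fmin=100.0, fmax=4000.0):
    """Rough sketch of an 'auditory spectrum': the signal's power
    spectrum pooled by log-spaced, asymmetric triangular filters,
    then returned as one value per channel."""
    spec = np.abs(np.fft.rfft(waveform)) ** 2
    freqs = np.fft.rfftfreq(len(waveform), d=1.0 / sr)
    centres = np.geomspace(fmin, fmax, n_channels)   # logarithmic axis
    out = np.zeros(n_channels)
    for c, fc in enumerate(centres):
        lo, hi = fc / 1.5, fc * 1.2         # shallow low side, steep high side
        w = np.zeros_like(freqs)
        rising = (freqs >= lo) & (freqs < fc)
        falling = (freqs >= fc) & (freqs <= hi)
        w[rising] = (freqs[rising] - lo) / (fc - lo)
        w[falling] = (hi - freqs[falling]) / (hi - fc)
        out[c] = np.sum(w * spec)
    return centres, out
```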
For an input spectrogram, the response of each rate filter (RF) and scale filter (SF) was obtained separately as follows.
Unlike the analysis in Equation 1, the spectral and temporal modulations were derived separately using one-dimensional complex-valued filters, applied either along the time axis or along the frequency axis. The dimensionality was then reduced using tensor singular value decomposition. For this experiment, we used the rate and scale responses from Equations 7-8 and integrated the output over time and frequency for each note. The resulting rate and scale responses were then stacked together to form the feature vector.
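One way to realize such a separable analysis is to apply, along a single axis of the (time x frequency) spectrogram, a complex analytic filter built as a Gaussian bump over positive modulation frequencies. This is a sketch under stated assumptions (Gaussian passband shape, FFT-domain implementation), not the paper's exact filters.

```python
import numpy as np

def modulation_filter_1d(spectrogram, centre, bandwidth, axis):
    """Apply a one-dimensional complex-valued modulation filter along
    one axis of a 2-D (time x frequency) spectrogram. The transfer
    function is a Gaussian bump at `centre` (cycles per sample along
    that axis), kept on positive modulation frequencies only, which
    makes the impulse response complex analytic."""
    n = spectrogram.shape[axis]
    freqs = np.fft.fftfreq(n)
    H = np.exp(-0.5 * ((freqs - centre) / bandwidth) ** 2)
    H[freqs < 0] = 0.0                    # analytic: positive side only
    S = np.fft.fft(spectrogram, axis=axis)
    shape = [1, 1]
    shape[axis] = n                       # broadcast H along the other axis
    return np.fft.ifft(S * H.reshape(shape), axis=axis)
```

A rate filter corresponds to `axis=0` (along time) and a scale filter to `axis=1` (along frequency); taking the magnitude of the complex output then gives the modulation energy used as a feature.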
In this experiment, we aimed to make the dimensionality of the cortical model comparable to that of the separable model by undersampling the rate, scale, and frequency axes. The auditory spectrogram was downsampled along the frequency axis by a factor of 2. This auditory spectrogram representation was then analyzed by 6 spectral filters with characteristic scales [0.
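The frequency-axis downsampling step amounts to keeping every other channel; a minimal sketch (the spectrogram shape is an illustrative assumption):

```python
import numpy as np

# Stand-in auditory spectrogram: 100 time frames x 128 frequency channels.
spec = np.random.rand(100, 128)

# Downsample along the frequency axis by a factor of 2:
# keep every other frequency channel.
spec_ds = spec[:, ::2]
```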
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

PLoS Comput Biol. Published online Nov 1. Frederic E. Theunissen, Editor. Received Mar 23; Accepted Sep. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
Abstract
Timbre is the attribute of sound that allows humans and other animals to distinguish among different sound sources.

Author Summary
Music is a complex acoustic experience that we often take for granted.

Introduction
A fundamental role of auditory perception is to infer the likely source of a sound; for instance, to identify an animal in a dark forest, or to recognize a familiar voice on the phone.
Results

Cortical processing of complex musical sounds
Responses in primary auditory cortex (A1) exhibit rich selectivity that extends beyond the tonotopy observed in the auditory nerve.

Figure 1.
Neurophysiological receptive fields.

Figure 2. Schematic of the timbre recognition model.

Table 1. Classification performance for the different models.

The cortical model
Despite the encouraging results obtained using cortical receptive fields, the classification based on neurophysiological recordings was hampered by various shortcomings, including recording noise and other experimental constraints.
Figure 3. Spectro-temporal modulation profiles highlighting timbre differences between piano and violin notes.

Musical timbre classification
Several computational models were compared in the same classification task (analysis of the database of musical instruments) as described earlier with real neurophysiological data.
Music in Our Ears: The Biological Bases of Musical Timbre Perception
Figure 4. The confusion matrix for instrument classification using the auditory spectrum.

Figure 5. The average KL divergence between support vectors of instruments belonging to different broad classes.

Figure 6. Human listeners' judgments of musical timbre similarity.

Comparison with standard classification algorithms
Spectral features have been extensively used for tasks of musical timbre classification of isolated notes, solo performances, or even multi-instrument recordings.
Psychophysics of timbre judgments
Given the ability of the cortical model to capture the diversity of musical timbre across a wide range of instruments in a classification task, we next explored how well the cortical representation, from both real and model neurons, captures human perceptual judgments of distance in the musical timbre space.

Figure 7. Human vs. model musical timbre similarity.

Table 2. Correlation coefficients for different feature sets.

Feature set              | L2 on features | L2 on reduced features | Gaussian kernels on reduced features
Fourier-based spectrum   | -              | -                      | 0.
Figure 8. Correlation between human and model similarity matrices as a function of reduced feature dimensionality.

Discussion
This study demonstrates that the perception of musical timbre could be effectively based on the patterns of neural activation that sounds evoke at the level of primary auditory cortex.
Procedure
Subjective similarity ratings were collected.

Multidimensional scaling (MDS) and acoustical correlates
To compare the results with previous studies, we ran an MDS analysis on the dissimilarity matrix obtained from human judgments.

Auditory model
The cortical model comprises two main stages: an early stage mimicking peripheral processing up to the level of the midbrain, and a central stage capturing processing in primary auditory cortex (A1).
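The MDS analysis mentioned above can be sketched with the classical (Torgerson) metric variant, which embeds items so that pairwise Euclidean distances approximate a dissimilarity matrix. The paper's analysis may use a different MDS flavour (e.g. non-metric); this is a textbook sketch.

```python
import numpy as np

def classical_mds(D, n_dims=2):
    """Classical (metric) MDS: double-centre the squared dissimilarity
    matrix to obtain a Gram matrix, then embed using its leading
    eigenvectors scaled by the square roots of the eigenvalues."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_dims]      # largest eigenvalues first
    L = np.sqrt(np.clip(vals[order], 0.0, None))
    return vecs[:, order] * L
```

When the dissimilarities are exactly Euclidean distances among points in `n_dims` dimensions, this recovers the configuration up to rotation and reflection.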
Cortical receptive fields
The data used here were collected in the context of a number of studies, and full details of the experimental paradigm are described in those publications.

Timbre classification
To test the cortical representation's ability to discriminate between different musical instruments, we augmented the basic auditory model with a statistical clustering model based on support vector machines (SVMs).
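The SVM stage can be sketched as follows on synthetic stand-in data; the two Gaussian clouds are a hypothetical miniature of reduced cortical feature vectors for two instruments, not anything from the actual dataset.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical stand-in data: each "note" is a reduced cortical
# feature vector (8-D here), labelled by instrument (0 or 1).
rng = np.random.default_rng(0)
piano = rng.normal(loc=0.0, scale=0.5, size=(40, 8))
violin = rng.normal(loc=1.5, scale=0.5, size=(40, 8))
X = np.vstack([piano, violin])
y = np.array([0] * 40 + [1] * 40)

# Gaussian (RBF) kernel SVM, in line with the Gaussian kernels
# used throughout this study.
clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X, y)
accuracy = clf.score(X, y)
```

The fitted model exposes its support vectors (`clf.support_vectors_`), which is what the support-vector distribution analysis below operates on.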
The kernel used here is a Gaussian kernel.

Analysis of support vector distribution
To better understand the mapping of the different notes in the high-dimensional space used to classify them, we performed a closer analysis of the support vectors for each instrument pair i and j.
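One plausible way to quantify the separation between the support-vector sets of two instruments (as summarized by the average KL divergence in Figure 5) is to fit a multivariate Gaussian to each set and compute a symmetrized KL divergence. The Gaussian fit and symmetrization are assumptions of this sketch; the paper's exact estimator is not reproduced here.

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL divergence KL(N0 || N1) between two multivariate
    Gaussians, in nats (closed-form expression)."""
    k = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0)
                  + diff @ inv1 @ diff
                  - k
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def symmetric_kl(sv_i, sv_j):
    """Symmetrized KL between Gaussian fits to two sets of support
    vectors (rows are vectors), one set per instrument."""
    mu_i, mu_j = sv_i.mean(0), sv_j.mean(0)
    cov_i = np.cov(sv_i, rowvar=False)
    cov_j = np.cov(sv_j, rowvar=False)
    return 0.5 * (gaussian_kl(mu_i, cov_i, mu_j, cov_j)
                  + gaussian_kl(mu_j, cov_j, mu_i, cov_i))
```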
Dataset
We used the RWC music database for testing the model.

Dimensionality reduction of cortical features
As is the case with any classification problem in high-dimensional spaces, all of the analyses above had to be performed on a reduced number of features, which we obtained using tensor singular value decomposition (TSVD), as described earlier.
Control experiments
(i) Auditory spectrum analysis. The auditory spectrum was obtained by analyzing the input waveform with the cochlear filters described above and integrating over the time dimension.

References
1. Handel S. Listening: An introduction to the perception of auditory events.
2. ANSI. Psychoacoustical Terminology.
3. Helmholtz H. On the Sensations of Tone. New York: Dover Publications.
4. J Acoust Soc Am 63.
5. J Acoust Soc Am.
6. Patterson RD. The sound of a sinusoid: Time-interval models.