9th INFORMS Computing Society Conference: The Next Wave in Computing, Optimization, and Decision Technologies

TWO INVITED SESSIONS ON
Music, Computation and Artificial Intelligence

Organizer: Elaine Chew
University of Southern California

Computational algorithms for processing, analyzing and understanding music are fast gaining importance due to the rise of digital music applications ranging from music information retrieval to computer gaming. These two special sessions are devoted to recent advances at the intersection of music, computation and artificial intelligence. The first session presents computational algorithms for rhythm classification and key finding. The second session offers automatic methods for score following, musical accompaniment and improvisation. Operations researchers will find that the work represented employs modeling techniques near and dear to the heart of the community.

SESSION 1 (TB02). Music, Computation and AI I -- Algorithms for Rhythm and Key Recognition
Thursday, Jan 6, 2005, 10:30AM-12:00PM

    Dance Music Classification Using Inner Metric Analysis
    Elaine Chew, Anja Volk & Chia-Ying Lee
    University of Southern California Viterbi School of Engineering
    Integrated Media Systems Center, Los Angeles, CA 90089, USA
        This paper introduces a method for music genre classification using a computational model for Inner Metric Analysis. Prior classification methods focusing on temporal features utilize tempo (speed) and meter (periodicity) patterns and are unable to distinguish between pieces in the same tempo and meter. Inner Metric Analysis reveals not only the periodicity patterns in the music, but also the accent patterns peculiar to each musical genre. These accent patterns tend to correspond to perceptual groupings of the notes. We propose an algorithm that uses Inner Metric Analysis to map note onset information to an accent profile that can then be compared to template profiles generated from rhythm patterns typical of each genre. The music is classified as being from the genre whose accent profile is most highly correlated with the sample profile. The method has a computational complexity of O(n^2), where n is the length of the query excerpt. We report and analyze the results of the algorithm when applied to Latin American dance music and national anthems that are in the same meter (4/4) and have similar tempo ranges. We evaluate the efficacy of the algorithm when using two variants on the model for Inner Metric Analysis: the metric weight model and the spectral weight model. We find that the correct genre is either the top rank choice or a close second rank choice in almost 80% of the test pieces.
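
        The correlation step described in the abstract can be illustrated with a minimal sketch. It assumes the accent profiles have already been computed and share one length. The function name rank_genres, the genre labels, and all profile values are hypothetical and are not taken from the paper. The Inner Metric Analysis itself is not implemented here.

```python
import numpy as np

def rank_genres(sample_profile, templates):
    """Rank genres by Pearson correlation between the sample accent
    profile and each genre's template profile (highest first)."""
    sample = np.asarray(sample_profile, dtype=float)
    scores = {}
    for genre, template in templates.items():
        template = np.asarray(template, dtype=float)
        scores[genre] = float(np.corrcoef(sample, template)[0, 1])
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical accent profiles for one 4/4 bar on an eighth-note grid.
templates = {
    "tango":  [5, 1, 2, 1, 4, 1, 2, 3],
    "rumba":  [5, 1, 1, 4, 1, 1, 4, 1],
    "anthem": [5, 1, 3, 1, 4, 1, 3, 1],
}
sample = [6, 1, 1, 5, 2, 1, 4, 1]  # hypothetical query profile

ranking = rank_genres(sample, templates)
top_choice, second_choice = ranking[0][0], ranking[1][0]
```

        Returning the full ranking rather than a single label mirrors the evaluation in the abstract, which counts both top rank and close second rank choices.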

    A Model for Key-finding from Audio
    Ozgur Izmirli
    Connecticut College, Computer Science Department
    Center for Arts and Technology, New London, CT 06320, USA
        Tonality is the system by which listeners organize all pitches around the most stable pitch, called the tonic, while listening to tonal music. The problem of key-finding involves the determination of the tonic and the mode of the musical scale. Key-finding has applications in many areas of music processing including automated music analysis and music information retrieval. In these contexts, extracting information from raw audio data has high practical value. We discuss a method for key-finding from polyphonic audio recordings. The aim is to locally determine the key and track its evolution in tonal space over the duration of the input. Templates are formed using diatonic collections of monophonic instrument sounds. Spectra of the instrument sounds are summed to obtain the templates. Each template corresponds to a specific tonic-mode pair and is represented by a distinct position in tonal space. Our model calculates cumulative spectra over segments of the audio input and finds the correlation between the spectra and the templates to determine the relative distances to each key. When the musical passage is in a single key or when there are short modulations, this method performs well. In the presence of modulations, however, the input is segmented by detecting the boundaries of the modulations before the method is applied.
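
        The template-correlation idea can be sketched under simplifying assumptions. This sketch swaps in 12-bin pitch-class (chroma) vectors and binary scale templates in place of the summed instrument spectra described in the abstract, so it is not the author's model. The names key_templates, find_key and track_keys, the choice of the harmonic minor scale for minor keys, and the fixed window length are all assumptions made for illustration.

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Scale degrees in semitones above the tonic (minor = harmonic minor, an assumption).
SCALES = {"major": [0, 2, 4, 5, 7, 9, 11], "minor": [0, 2, 3, 5, 7, 8, 11]}

def key_templates():
    """Build one binary 12-bin template per (tonic, mode) pair."""
    templates = {}
    for mode, degrees in SCALES.items():
        for tonic in range(12):
            template = np.zeros(12)
            template[[(tonic + d) % 12 for d in degrees]] = 1.0
            templates[(NOTE_NAMES[tonic], mode)] = template
    return templates

def find_key(chroma_frames, templates):
    """Sum the frames of one segment and return the best-correlated key."""
    cumulative = np.sum(chroma_frames, axis=0)
    return max(templates,
               key=lambda k: np.corrcoef(cumulative, templates[k])[0, 1])

def track_keys(chroma_frames, window, templates):
    """Estimate one key per fixed-length segment of the input."""
    frames = np.asarray(chroma_frames, dtype=float)
    return [find_key(frames[i:i + window], templates)
            for i in range(0, len(frames), window)]
```

        A segment with no energy at all would give an undefined correlation, so a practical version would need to skip silent segments. Detecting modulation boundaries, as the abstract describes, is also left out of this sketch.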

    Applying the Spiral Array Key-finding Algorithm to Polyphonic Audio
    Ching-Hua Chuan & Elaine Chew
    University of Southern California Viterbi School of Engineering
    Los Angeles, CA 90089, USA
        Key is one of the most important features of tonal music, an indispensable factor in music analysis. We present a method to find the key in polyphonic music in audio format. Using Fast Fourier Transform, we map audio signals to pitch classes with weights corresponding to frequency strength. The Spiral Array, proposed by Chew in 2000, is a geometric model for tonality that represents tonal entities of different hierarchical levels in the same 3D space. We map the pitch classes derived from the audio to positions in the Spiral Array. The spatial representations of the pitch classes, weighted by their saliency, generate a center of effect in the interior of the structure. Key finding is achieved by a nearest neighbor search for the closest key representation in the Spiral Array space. This key-finding algorithm has been previously tested only on symbolic input; input derived from audio is necessarily noisier than symbolic input. We report results of recent computational tests using audio files of the first movements of Mozart's symphonies and compare the Spiral Array's key finding results to those obtained using Krumhansl and Schmuckler's probe tone profile method.
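
        The center of effect and nearest neighbor search can be sketched as follows. This is a simplified illustration, not the authors' implementation: it covers major keys only, takes pitches as positions on the line of fifths (C = 0, G = 1, F = -1) rather than deriving them from audio, and uses illustrative values for the radius R, the height step H and the weights W. All function names are hypothetical.

```python
import math

R = 1.0                        # spiral radius (illustrative)
H = math.sqrt(2.0 / 15.0)      # rise per step along the line of fifths (illustrative)
W = (0.536, 0.274, 0.19)       # weights for chord and key centers (illustrative)

def pitch(k):
    """Position of the pitch k fifths above C on the spiral."""
    return (R * math.sin(k * math.pi / 2), R * math.cos(k * math.pi / 2), k * H)

def combine(points, weights):
    """Weighted sum of 3D points."""
    return tuple(sum(w * p[i] for p, w in zip(points, weights)) for i in range(3))

def major_chord(k):
    """Chord center: weighted root, fifth and major third."""
    return combine([pitch(k), pitch(k + 1), pitch(k + 4)], W)

def major_key(k):
    """Key center: weighted tonic, dominant and subdominant chords."""
    return combine([major_chord(k), major_chord(k + 1), major_chord(k - 1)], W)

def center_of_effect(weighted_pitches):
    """Weighted average position of the pitches ({fifths index: weight})."""
    total = sum(weighted_pitches.values())
    points = [pitch(k) for k in weighted_pitches]
    weights = [w / total for w in weighted_pitches.values()]
    return combine(points, weights)

def nearest_major_key(weighted_pitches, candidates=range(-6, 7)):
    """Return the fifths index of the major key closest to the center of effect."""
    ce = center_of_effect(weighted_pitches)
    return min(candidates, key=lambda k: math.dist(ce, major_key(k)))

# C, G and E with equal weight; the search should land on C major (index 0).
key_index = nearest_major_key({0: 1.0, 1: 1.0, 4: 1.0})
```

        With audio input, the weights would come from the frequency strengths mentioned in the abstract, and the 12 pitch classes would first need to be assigned positions on the spiral.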

SESSION 2 (TC02). Music, Computation and AI II -- Score Following, Accompaniment and Improvisation
Thursday, Jan 6, 2005, 1:30PM-3:00PM

    Graphical Modeling, Dynamic Programming, Polyphonic Score Matching, and Musical Accompaniment Systems
    Christopher Raphael
    Indiana University, School of Informatics
    Bloomington, IN 47408, USA
        I present a probabilistic graphical model for polyphonic score matching involving several layers of hidden variables. Two of these layers are modeled as a jointly Gaussian process accounting for time varying tempo and note-by-note variation (as in agogic accents). This process is related to a hidden "label" process that gives the corresponding score note for every frame of audio (the observable data). A variant of dynamic programming makes possible the near-global identification of the most likely configuration of unobserved variables. This search is over both discrete and parametrically represented continuous variables. I demonstrate an application to musical accompaniment systems in which a full orchestral accompaniment is generated on-the-fly to follow a live soloist.
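
        The role of dynamic programming in score matching can be illustrated with a much simpler model than the one in the abstract. The sketch below is a plain discrete Viterbi alignment with a single hidden "label" layer, no tempo variables and no continuous Gaussian layers. It assumes each audio frame has already been reduced to one detected pitch. The function name align_frames and the probabilities p_stay and p_match are hypothetical.

```python
import math

def align_frames(score, frames, p_stay=0.6, p_match=0.9):
    """Return the most likely score-note index for every frame.

    score  : list of pitches in score order
    frames : list of detected pitches, one per audio frame
    The label may stay on a note or advance to the next one.
    """
    n = len(score)
    log_stay, log_next = math.log(p_stay), math.log(1.0 - p_stay)
    neg_inf = float("-inf")

    def log_emit(note, frame):
        return math.log(p_match if note == frame else 1.0 - p_match)

    # The performance is assumed to start on the first score note.
    best = [neg_inf] * n
    best[0] = log_emit(score[0], frames[0])
    back = []
    for frame in frames[1:]:
        new, ptr = [neg_inf] * n, [0] * n
        for j in range(n):
            stay = best[j] + log_stay
            move = best[j - 1] + log_next if j > 0 else neg_inf
            if move > stay:
                new[j], ptr[j] = move, j - 1
            else:
                new[j], ptr[j] = stay, j
            new[j] += log_emit(score[j], frame)
        best = new
        back.append(ptr)

    # Trace back from the best final state.
    j = max(range(n), key=lambda i: best[i])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return path[::-1]

labels = align_frames([60, 62, 64], [60, 60, 62, 62, 62, 64])
```

        This version needs the whole performance before it traces back, so it works offline. An accompaniment system that follows a live soloist would have to estimate the current position as the frames arrive.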

    Modeling Score Structure for Improved Musical Score Following
    Bryan Pardo
    Northwestern University, Department of Computer Science
    Evanston, IL 60201, USA
        Automated musical accompaniment that reacts naturally to the human performer is a long-standing goal of a number of computer-music researchers. For many styles of music, this requires an agent able to follow a representation of a written score with similar facility to that of a human performer. Algorithms that align a score to a performance are called score followers.
        Standard score followers typically assume the score is adequately represented as a sequence of note onsets. The performance is assumed to be a best effort to play the score exactly as written, from start to end. Many scores contain optional sections (introductions, interludes) and sections which may repeat an un-predetermined number of times. It is not always desirable to force the performer to specify, a priori, how many repeats or optional sections will be played, in order to accommodate the score follower.
        This paper describes an approach to modeling score structure as a directed graph. The resulting score model lets a score follower track performances that take very different paths through the score, without losing its place. To test the improvement of this approach over the standard, I compare score following results between graph-based and sequence-based score followers on a melodic corpus of 98 jazz melodies.
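
        A toy example may help show why a directed graph handles optional and repeated sections. The score, node names and pitches below are hypothetical, and the follower assumes a performance with no wrong or missing notes, which a real score follower could not assume. It simply keeps the set of score positions consistent with what has been played so far.

```python
def build_score_graph():
    """Hypothetical score: optional intro, repeatable section A, ending."""
    notes = {"intro1": 67, "a1": 60, "a2": 62, "a3": 64, "end": 72}
    edges = {
        "start": ["intro1", "a1"],   # the intro may be skipped
        "intro1": ["a1"],
        "a1": ["a2"],
        "a2": ["a3"],
        "a3": ["a1", "end"],         # repeat section A or move on
        "end": [],
    }
    return notes, edges

def follow(performance, notes, edges):
    """Return, for each performed pitch, the set of possible score positions."""
    positions = {"start"}
    trace = []
    for pitch in performance:
        positions = {nxt
                     for pos in positions
                     for nxt in edges[pos]
                     if notes[nxt] == pitch}
        trace.append(positions)
    return trace

notes, edges = build_score_graph()
# The performer skips the intro and plays section A twice.
trace = follow([60, 62, 64, 60, 62, 64, 72], notes, edges)
```

        A follower that treats the score as a single sequence from the intro to the ending would lose its place on the first note of this performance, while the graph version ends at the "end" node.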

    Recurrent Neural Networks for Music Computation and Improvisation
    Judy Franklin
    Smith College, Computer Science Department
    Northampton, MA 01063, USA
        Some researchers in the computational sciences have considered music computation, including music reproduction and generation, as a dynamic system; that is, a feedback process. The key element is that the state of the musical system depends on a history of past states. Recurrent (neural) networks have been deployed as models for learning musical processes. We discuss recurrent networks that have been used for music learning. Over time, more intricate music has been learned as the state of the art in recurrent networks improves. We are currently using the Long Short-Term Memory (LSTM) recurrent network, with new representations for note pitch and duration. We will mention several pitch-oriented tasks that we have used to test these networks. We will also describe and play an example of a long song that two of these networks, one for pitch, and one for duration, have learned. We have continued experimentation with two networks, training them to learn four songs, as a function of the harmonic structure of the songs. We are now experimenting with the re-harmonization of the song structure, so as to generate new songs that may be regarded as improvisations. We will play examples of these generated songs.

Please post and distribute this announcement freely. Last update: December 31, 2004.