University of Southern California USC
Dr. Brian K. Shepard, Pedagogical Technology of Music


Teaching Music Through Advanced Network Videoconferencing
By
Brian K. Shepard

In the past couple of years, a number of major advancements in Internet technology--especially in conjunction with the consortium known as Internet2--have made the teaching of music through Internet videoconferencing not only a possibility, but a reality. The high-bandwidth capabilities of Internet2 finally allow musicians to fully participate in the benefits of the Internet Revolution. At the University of Oklahoma School of Music we are actively exploring this arena and creating applications to take advantage of these exciting new developments. We are now using videoconferencing to bring some of the world's finest teachers to our students and to allow our faculty to increase the range and effectiveness of their teaching. Along the way, we have made a number of discoveries and observations that we hope will be of benefit to others wishing to enter this exciting arena.

So that no one gets the wrong idea, let me make emphatically clear that we are in no way attempting to eliminate or replace live, in-person music teaching. Instead, we envision this technology as a supplement to the traditional music-teaching environment. All too often when one party is out of town, the only solution is to not have a lesson since the cost and time involved in bringing the parties together are prohibitive. In those cases, this technology can provide a bridge that allows the teaching and music-making process to continue.

Before going any further, I believe a little background information on Internet2 is in order. Internet2 is a consortium of more than 180 universities working in partnership with industry and government to develop and deploy advanced network applications and technologies to accelerate the creation of tomorrow's Internet. Internet2 is recreating the partnership among academia, industry and government that fostered today's Internet in its infancy. Part of that development is in the arena of high-quality videoconferencing. As anyone who has spent any time on the Internet knows, the quality of audio and video is still rather primitive. Audio streams are usually thin and weak sounding, not to mention monaural, and the video quality is even worse. The typical video file on the Internet opens in a small window on the computer monitor and features grainy and jerky motion. The incredible bandwidth of Internet2, however, allows for real-time, bi-directional, full-motion, broadcast-quality video on a television monitor with CD-quality, stereo audio. With that quality comes the potential for finally using Internet videoconferencing for music applications.

In our initial explorations at the University of Oklahoma, we have identified a number of musical uses for this advanced network videoconferencing capability. Although we have had the most experience and success with the teaching of private lessons and master classes, other applications appear just as viable including rehearsal preparation, distribution of live and pre-recorded music and concerts, multi-venue recording projects, and shared research.

As we began our Internet2 project, we identified four major goals. First, we want to bring world-class artists and teachers to our students. Second, we want to extend our own professors' teaching around the country and around the world. Third, we want to expand the OU School of Music's outreach within the state of Oklahoma. And, finally, we want to increase the research and collaborative opportunities for our faculty.

A few concerns have also been identified as we continue to explore this technology. We strongly believe that technology should not be used just for the sake of using it. Technology must provide us with a viable and effective music-making and teaching tool, and it should make our jobs better and/or easier.

As mentioned earlier, the "traditional" Internet--if there is such a thing--comes nowhere close to meeting the needs of those of us in the fine arts. The hardware and software are really designed for the delivery of static files like web pages and email messages. Music, on the other hand, requires the dynamic delivery of huge amounts of data. When you combine the data flow with the need for it to be truly interactive, the inadequacies of the current Internet become apparent. Therefore, in order to have effective musical collaborations and teaching experiences, a number of issues must be resolved and requirements met.

First, in order to teach and perform music, you must have high-quality, true-fidelity audio. In the Internet2 world, people often talk about digital video as the "killer app," that particular application that represents the ultimate usage of the Internet. While network engineers may be correct in their assertion, musicians usually find that audio is even more critical. Therefore, it is absolutely essential that the audio be of the highest quality. Both ends of the videoconference should have the clearest and most accurate representation of the sound from the other end. The audio should also be in stereo to resemble the live listening environment as closely as possible.

A particularly thorny issue related to videoconference audio is echo. Echo occurs when the sound created at one end of a videoconference comes out of the speakers at the other end, is picked up by the microphone(s) at that end and then returned to the originator with a slight delay. The echo can be quite disconcerting, since a person hears their own sound a fraction of a second later than when they created it. Many videoconferencing systems come with some sort of echo-cancellation capability. However, since videoconference systems are typically designed for talking, the echo-cancellation is tailored for the narrow frequency range of the human speaking voice. When applied to music, the result is a rather muted and lifeless sound, since one of the components of echo-cancellation is a dampening of upper frequencies where much of the echo occurs. Unfortunately, that is also the range where the brilliance and sparkle of most musical instruments occurs. The sensitive, high-quality microphones that are used to capture the true sound of musical instruments further compound the problem with echo, since these microphones also tend to capture the sound coming in on the speakers from the other end of the videoconference. Until a better quality echo-cancellation device is created, the only practical solution is to use highly directional microphones placed carefully to avoid picking up the audio signal from the speakers.

Although audio is critically important, you also need high-quality, full-motion video. The video image must be a clear and stable picture with good color accuracy to capture all the subtle movements involved in making music. The video needs to be completely synchronized with the audio--so-called "lip synch"--so that the person watching has the sensation of seeing and hearing the other musician as they would normally watch and listen to them.

Another requirement for this technology to be useful in the music world is real-time interactivity. The participants at both ends should be able to fully interact with each other as though they were in the same room. In order for that to happen, both the video and audio need to be bi-directional with virtually no delay between ends of the videoconference. This area still has some room for improvement. Even under the best circumstances, you will typically experience around a tenth to a quarter of a second delay between ends with today's equipment. Ironically, the delays are not so much from the distance between units, but rather from the encoding/decoding process that converts the audio and video into Internet-ready packets. In doing tests, I've found that the travel time from Oklahoma to either coast is usually in the neighborhood of 30 to 40 milliseconds (0.03 to 0.04 seconds). This amount of delay is virtually imperceptible to our eyes and ears. In contrast, the encoding/decoding process usually adds delays of between 100 and 250 milliseconds (0.1 to 0.25 seconds). This amount of delay is enough to preclude playing duets and is slightly perceptible even when talking. When echo is present, the delay is even worse, since the sound travels to one end and then back, thus doubling the delay time.
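The delay figures above can be added up in a short sketch. This is only an illustration: it assumes the travel time and the encoding/decoding delay simply add together, and that an echo's return trip costs the same as the outbound trip. The overall figure of a tenth to a quarter of a second quoted above may already include travel time, so treat the totals as rough upper estimates.

```python
# Delay figures quoted in the text, in milliseconds.
NETWORK_MS = (30, 40)    # one-way travel time, Oklahoma to either coast
CODEC_MS = (100, 250)    # delay added by the encoding/decoding process


def one_way_delay(network_ms, codec_ms):
    """Total delay from one end of the videoconference to the other.

    Assumption: travel time and codec delay are simply additive.
    """
    return network_ms + codec_ms


def echo_delay(network_ms, codec_ms):
    """Delay before performers hear their own echo (out and back again).

    Assumption: the return trip takes as long as the outbound trip.
    """
    return 2 * one_way_delay(network_ms, codec_ms)


best = one_way_delay(NETWORK_MS[0], CODEC_MS[0])
worst = one_way_delay(NETWORK_MS[1], CODEC_MS[1])
print(f"one-way delay: {best} to {worst} ms")
print(f"echo delay: {echo_delay(NETWORK_MS[0], CODEC_MS[0])} "
      f"to {echo_delay(NETWORK_MS[1], CODEC_MS[1])} ms")
```

Even in the best case, the encoding/decoding step accounts for most of the total, which is why distance matters so much less than the codec.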

Having equipment and software that are easy both to set up and to operate is another important issue. Teachers trying to work with students on musical elements don't need the additional burden of running a difficult computer program or complicated equipment. In an ideal world, the equipment will be such that a person can run it themselves without the need for additional operators, camerapersons, etc. Multiple operators not only increase the complexity of the videoconference, but the cost as well.

Cost effectiveness is also an important consideration with this technology. Although there are certainly substantial costs involved in purchasing the equipment and streamlining the network, those are usually one-time costs. If those costs are amortized over the number of times the equipment is used, the per-session costs are drastically reduced. When a faculty member's time is factored in, the cost effectiveness looks even better, since we typically spend many more hours in travel than in the actual meeting. For a true comparison, then, we need to look at the cost of each individual session compared to the costs of travel, time, lodging, meals, etc. that would be involved in a face-to-face meeting. It is also important to keep in mind that this equipment typically has multiple uses in addition to videoconferencing, such as video streaming and video encoding. Thus, for the same amount of money, you get a machine capable of several valuable tasks.
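The comparison described here can be sketched in a few lines. Everything in the example is illustrative: the $20,000 equipment figure is the upper end of the codec price range quoted later in this article, while the session count, fares, lodging, meals, travel hours and hourly value are hypothetical numbers chosen only to show the arithmetic. The sketch also ignores network costs and the equipment needed at the other end.

```python
import math


def per_session_cost(equipment_cost, sessions):
    """One-time equipment cost spread evenly over the sessions it is used for."""
    if sessions <= 0:
        raise ValueError("sessions must be positive")
    return equipment_cost / sessions


def trip_cost(fares, lodging, meals, travel_hours, hourly_value):
    """Cost of one face-to-face meeting, including the traveler's time."""
    return fares + lodging + meals + travel_hours * hourly_value


def break_even_sessions(equipment_cost, cost_per_trip):
    """Smallest number of replaced trips that pays for the equipment."""
    return math.ceil(equipment_cost / cost_per_trip)


EQUIPMENT = 20_000  # upper end of the quoted codec price range

# Hypothetical trip: all of these values are assumptions for illustration.
one_trip = trip_cost(fares=400, lodging=150, meals=60,
                     travel_hours=10, hourly_value=50)

print(f"per-session cost over 100 sessions: ${per_session_cost(EQUIPMENT, 100):.2f}")
print(f"cost of one trip: ${one_trip}")
print(f"trips needed to break even: {break_even_sessions(EQUIPMENT, one_trip)}")
```

With these made-up numbers the equipment pays for itself after 19 replaced trips; your own figures will of course differ.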

Finally, in order for all of this to work, we need collaborative partners around the country and around the world with whom we can work. Since the current generation of equipment is rather proprietary, partners need to have the same type of equipment at each end. Most of the manufacturers of Internet videoconferencing equipment with whom I have spoken have expressed their desire to make their systems compatible with other manufacturers' systems. So as that proprietary necessity diminishes, the field should open up for many more users. Then all we need are partners with similar interests that can lead to joint projects.

There are a number of Internet videoconferencing systems currently available. They range from the simple desktop camera systems available at most computer stores to the full-blown, dedicated videoconference systems with ungainly protocol names like H.320, H.323 and MPEG-2. Of all the available systems, though, the only one that provides the quality required by musicians is the MPEG-2 system. The current crop of MPEG-2 codecs (codec is an abbreviation for encoder/decoder) provides CD-quality (44.1 kHz sampling rate, 16-bit sample width) and even DAT-quality (48 kHz, 16-bit) stereo audio along with broadcast-quality video running at 30 frames per second. They also have fairly good audio/video synchronization ("lip synch"), with encoding/decoding delays in the 100 to 250-millisecond range. MPEG-2 codecs typically connect to a 100 Mb/s Ethernet port with a standard "Category 5" Ethernet cable, and their controller software is rather easy to operate. Despite a price of between $8,000 and $20,000, the number of units around the country is growing steadily.
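To give a sense of scale for the audio figures above, the following sketch computes the raw data rate of the two stereo formats. It assumes uncompressed audio; codecs generally compress the audio before sending it, so the stream that actually crosses the network is likely smaller than these numbers.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=2):
    """Raw (uncompressed) audio data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


LINK_BPS = 100_000_000  # the 100 Mb/s Ethernet port mentioned above

cd = pcm_bit_rate(44_100, 16)   # CD quality, stereo
dat = pcm_bit_rate(48_000, 16)  # DAT quality, stereo

for name, rate in (("CD", cd), ("DAT", dat)):
    print(f"{name}: {rate / 1_000_000:.4f} Mb/s, "
          f"{100 * rate / LINK_BPS:.2f}% of the link")
```

Even uncompressed, stereo audio at either quality uses less than 2% of a 100 Mb/s connection, which leaves the great majority of the bandwidth for video.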

MPEG-2 units are not without their weaknesses, though. Perhaps the biggest is the difficulty in initially setting up the hardware and software and optimizing the network for videoconferencing. As I mentioned earlier, MPEG-2 codecs are also rather proprietary, so one brand will usually not work with another. Finally, in an effort to minimize delays and make the videoconference as "real-time" as possible, there is virtually no buffering of the audio and video signal. Buffering is used in most one-way audio and video streams to allow a few seconds of the file to reach the destination and be stored in RAM before the file begins playing. Thus, any glitches that occur during the data transfer are not seen or heard. In a real-time situation, though, that amount of delay would be completely unacceptable. Therefore, without buffering, the video and audio are susceptible to any network delays and interference.
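The buffering trade-off can be illustrated with a toy model. In this sketch, each packet of audio or video is played a fixed amount of time after it was sent, and any packet that takes longer than that to arrive is counted as a glitch. The packet delays are hypothetical values invented for the example, and real systems are considerably more sophisticated than this.

```python
def count_glitches(packet_delays_ms, playout_delay_ms):
    """Count packets that arrive too late to be played on schedule.

    Simplified model (an assumption): every packet is played exactly
    playout_delay_ms after it was sent, so a packet is late whenever
    its network delay exceeds that value.
    """
    return sum(1 for delay in packet_delays_ms if delay > playout_delay_ms)


# Hypothetical one-way delays in milliseconds; two packets hit congestion.
DELAYS_MS = [35, 38, 36, 90, 37, 40, 150, 36]

# Near real-time: almost no buffering, so delayed packets become glitches.
print("40 ms of buffering:", count_glitches(DELAYS_MS, 40), "glitches")

# One-way streaming: a two-second buffer hides the same delays.
print("2000 ms of buffering:", count_glitches(DELAYS_MS, 2000), "glitches")
```

The large buffer hides every hiccup, but only by adding two seconds of delay, which is exactly what a live lesson cannot tolerate.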

In spite of these weaknesses, the MPEG-2 codecs work quite well for teaching music, especially after making a few modifications. I have found it best to have two video monitors at each end--one to see the signal from the other end and the other so you can see yourself. This is especially important if you are using a stationary camera so you can ensure that when you demonstrate something you are actually in the camera's frame. As mentioned earlier, the echo cancellation system that often comes with these units doesn't work well for music and it is best to bypass it or remove it altogether. The units often come with a "boundary" type microphone that is designed to pick up all the voices in a conference room. Unfortunately, in a music setting, it picks up too much of the audio coming from the speakers, and the quality of the microphone is not ideal for musical applications. Therefore, you will want to replace the "boundary" microphone with high-quality, directional, recording studio microphones. Not only will they sound much better, they are also less likely to create echo by picking up audio from the speakers. When using multiple microphones, you will need to have an audio mixer since the MPEG-2 codecs typically only have one stereo audio input. Finally, you will want to have the audio coming from the other end playing through high-quality speakers, rather than the built-in speakers on the television monitor.

The possibility of finally being able to use the Internet for the teaching of music is incredibly exciting. Although the process is not perfect, the potential is phenomenal. As I mentioned at the beginning, we are not trying to eliminate or replace the live, in-person teaching of music; rather, we are working to supplement and augment it by expanding our resources with this technology. I hope I have given you some ideas about using Advanced Network Videoconferencing to increase the scope of your musical offerings. I would be glad to visit with any of you further regarding these details. Please feel free to contact me at brian.shepard@usc.edu. You will also find more information by visiting the Internet2 website at www.internet2.edu. Finally, I encourage you to contact Ms. Ann Doyle, Manager of the Internet2 Arts & Humanities Initiative at adoyle@internet2.edu. As a musician herself, she truly understands the issues related to teaching music via Internet2 and I have found her to be a great help and source of information. She can also explain many of the intricacies of Internet2 and help put you in touch with other potential collaborators.

Brian K. Shepard, DMA
Assistant Professor of Pedagogical Technology
Thornton School of Music
University of Southern California
840 West 34th Street, MUS 308
Los Angeles, CA 90089-0851
Phone: 213.821.4152
Fax: 213.740.3217
Office Location: LPB G103
Email: brian.shepard@usc.edu
www-rcf.usc.edu/~bkshepar/