Immersive Sound for Cinema 9/01/2014 5:00 AM Eastern Author: Larry Blake
The introduction in 2005 of the Digital Cinema Initiatives standard brought with it the largest wholesale change in motion picture presentation since the arrival of widescreen cinema and stereophonic sound in 1953. It differed greatly from the past because picture and sound specifications had already been carefully vetted by committees with an eye toward scalability of the DCPs (Digital Cinema Packages) that are sent to theaters. For the image, this meant 2K resolution was the minimum, with 4K supported; in sound, all theaters were expected to have basic 5.1 systems, although the standard allowed for a total of 14 channels. Two additional channels are reserved for mono mixes for hearing-impaired and visually impaired patrons, the latter being narration on top of the mix. However, it was inevitable that variations would soon occur, and these came first in picture, with various implementations of 3-D. Just as this was starting to sort itself out in 2012, two different immersive sound formats arrived to break the 7.1 barrier that had been the limit for almost all previous DCPs. First, in January 2012, Auro Technologies, in association with Barco Cinema, introduced Auro-3D with the film Red Tails in Auro 11.1, which was shown in about two theaters in the U.S. The development of Auro-3D began seven years earlier, with research that CEO Wilfried Van Baelen had done at his Galaxy Studios in Belgium. The Auro-3D cinema format, in its basic 11.1 iteration, adds a 5.0 height layer (three screen speakers and two upper surround channels) above the standard 5.1 system, plus a top layer comprising a center-ceiling “Voice of God” channel. The system can be expanded to 13.1 by splitting the lower surrounds into four channels, as in 7.1.
Utilizing its proprietary Auro Codec, the additional tracks are encoded in the four least significant bits of a standard 24-bit, 48 kHz mix, so that only one 5.1 or 7.1 printmaster needs to be shipped on DCPs, with the additional height and top channels decoded in the cinema. Auro Technologies has a complete suite of plug-ins to aid mixers, including the Auro-Panner, to place sounds in the 3-D field, and Auro-Matic Pro, which allows upmixing of mono, stereo and 5.1 elements to the 11.1 and 13.1 formats. The second “salvo” in the new format wars occurred in June 2012, when Dolby Laboratories introduced its Atmos format for the Pixar animated film Brave on 14 screens. Dolby had been researching expanded cinema speaker layouts for years, going back to 2002 and We Were Soldiers, which utilized an overhead VOG channel. After years of experimentation with various speaker positions, including screen height as in Auro-3D, Dolby arrived at standards for surround speaker spacing, locations, dispersion and mounting angles. The side surround speakers begin near the screen and fill the first third of the auditorium, where normal surrounds are absent. Timbre matching of surrounds to screen channels is made a reality by employing bass management; this, combined with the placement of surrounds closer to the screen, helps smooth out the transition of sounds off the screen and gives surrounds much-increased power handling. Bass management is not used in all theaters; at Dolby’s screening room in Los Angeles and at the Samuel Goldwyn Theater at the Academy, the existing surrounds were able to go down to 40 Hz, which matches the specified low-end response of screen speakers. The final speakers added in Atmos are two overhead arrays running the length of the theater’s ceiling. Up to 64 speakers are supported by the CP-850 Atmos cinema processor, which went into production in April 2013; before that, theaters were using the studio RMU mastering unit.
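The bit budget behind this arrangement works out cleanly: replacing the four least significant bits of each 24-bit sample leaves 20 bits of audio, a quantization floor still lower than that of 16-bit media. The sketch below illustrates only the general principle of carrying a payload in PCM LSBs; the Auro Codec itself is proprietary and certainly more sophisticated, and these function names are invented for illustration.

```python
# Conceptual sketch only: the actual Auro Codec is proprietary. This
# shows the general principle of carrying extra data in the four least
# significant bits of 24-bit PCM samples.

def embed_lsb(sample_24bit: int, payload_nibble: int) -> int:
    """Replace the 4 LSBs of a 24-bit sample with 4 bits of payload."""
    assert 0 <= sample_24bit < (1 << 24)
    assert 0 <= payload_nibble < 16
    return (sample_24bit & ~0xF) | payload_nibble

def extract_lsb(sample_24bit: int) -> int:
    """Recover the 4-bit payload. A non-Auro processor simply plays the
    sample as-is, hearing the payload as very low-level noise."""
    return sample_24bit & 0xF

carrier = 0x123456                      # one 24-bit PCM sample
encoded = embed_lsb(carrier, 0xA)
assert extract_lsb(encoded) == 0xA      # Auro decoder recovers payload
assert encoded >> 4 == carrier >> 4     # top 20 audio bits untouched
```

This is why the same printmaster can play untouched in a plain 5.1 theater: the embedded data sits below the audible noise floor of the mix.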
OBJECT-BASED AUDIO Where Auro-3D in its current form is channel-based in the classic stereo film manner, with recorded tracks assigned either to specific speakers or arrays of speakers, Atmos is object-based. In object-based cinema audio, sounds are not necessarily dedicated to specific channels for the length of the program; instead, individual files are placed in the three-dimensional space of the theater via metadata containing level, XYZ location coordinates and start/stop times. (X is left-right across the screen, Y is from the screen to the back wall, and Z is height.) Object-based audio (OBA) is of course the foundation of video games, in which the timing and location of sounds vary according to where players are in their worlds. For movies, which occur in a linear fashion, OBA serves two purposes. One, it pinpoints the location of a sound in what otherwise might have been an array (such as a surround theater wall) or a group of speakers (such as behind the screen), or in three-dimensional variations among arrays and speakers. Two, it allows this accurate panning to take place in various theater configurations and sizes: “halfway down the right side wall” scales to the same position regardless of whether the wall contains eight or four speakers. Among the first public demonstrations of OBA for cinema were those given in the early part of the last decade by IOSONO, based on research done at the Fraunhofer Institute in Germany. IOSONO was shown in various venues in Los Angeles from 2008 to 2010, although current IOSONO efforts have primarily been in special venues and corporate events. As of summer 2014, the company is undergoing financial restructuring. While it is possible to mix Atmos exclusively utilizing objects, standard practice entails mixers using “beds,” which are essentially full-length “static object” tracks dedicated to specific channels.
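To make the metadata idea concrete, here is a minimal sketch of what an object descriptor might contain, using normalized coordinates so that the same position scales to any room. The field names and the nearest-speaker mapping are hypothetical illustrations, not any format’s actual schema.

```python
# Hypothetical object metadata, illustrating normalized XYZ coordinates
# that scale to any auditorium. Field names are invented, not the actual
# Atmos or MDA schema.
from dataclasses import dataclass

@dataclass
class AudioObject:
    wav_file: str
    start: float   # seconds from start of program
    stop: float
    gain_db: float
    x: float       # 0.0 = left wall,  1.0 = right wall
    y: float       # 0.0 = screen,     1.0 = back wall
    z: float       # 0.0 = ear level,  1.0 = ceiling

def nearest_speaker(obj: AudioObject, wall_speakers: int) -> int:
    """'Halfway down the side wall' maps to the same relative position
    whether the wall holds four speakers or eight."""
    return round(obj.y * (wall_speakers - 1))

fly_by = AudioObject("jet.wav", 12.0, 14.5, -3.0, x=1.0, y=0.5, z=0.3)
assert nearest_speaker(fly_by, 8) == 4   # eight-speaker wall
assert nearest_speaker(fly_by, 4) == 2   # same spot, four-speaker wall
```

A real renderer would spread the object across several neighboring speakers rather than snap to one, but the scaling principle is the same.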
Thus, 7.1 beds for the dialog, music and effects stems at the final mix involve dedicating 24 of the 128 inputs to Dolby’s RMU. Sound effects and music beds are frequently expanded to 9.1, with the stereo overheads as two arrays. In this example, up to 104 object tracks can be recorded as mono .wav files containing XYZ coordinates and other object metadata. While most mixes may never need 128 simultaneous objects, object tracks (like beds) are dedicated to specific stems, to easily allow the creation of M&Es, not to mention facilitating archiving. When creating the MXF-wrapped file that is in effect the printmaster of an Atmos mix, only the audio actually used in the mix is included; the silence between events on all tracks, objects and beds alike, is deleted to save space. During rendering in the theater, the objects are triggered in sync and placed in their proper locations according to the metadata.
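The channel arithmetic in this example, and the silence-stripping idea, can be sketched as follows; the event-triple representation is an assumption for illustration, not the actual Atmos MXF layout.

```python
# The channel budget from the example above: three 7.1 beds on a
# 128-input system leave 104 inputs for object tracks.
BEDS = {"dialog": 8, "music": 8, "effects": 8}   # 7.1 = 8 channels each
TOTAL_INPUTS = 128
object_tracks = TOTAL_INPUTS - sum(BEDS.values())
assert object_tracks == 104

# Silence-stripping sketch: a track is stored as (start, stop, audio)
# events, and the gaps between events are simply never written to the
# file. The theater renderer re-triggers each event in sync from its
# timestamp.
def stored_seconds(events):
    """Total audio actually stored: only the spans where events play."""
    return sum(stop - start for start, stop, _ in events)

# A two-hour object track holding two brief events stores ~7 s of audio
# rather than 7,200 s of mostly silence.
events = [(100.0, 102.5, "whoosh.wav"), (4000.0, 4004.5, "door.wav")]
assert stored_seconds(events) == 7.0
```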
The “scalability” of Dolby’s Atmos specifications applies not only to the surround and overhead arrays, but also to the screen, specifically the left-center and right-center speakers that have recently been largely absent from film mixes, save for certain Sony Dynamic Digital Sound (SDDS) mixes that used all channels in that format. (The configuration of course began in the Fifties with Cinerama, and later continued in Todd-AO.) Dolby has been strongly recommending Lc and Rc speakers where the screen is wider than 40 feet, and reports that a large percentage of Atmos installations have five screen channels. The presence of those speakers makes itself known by smoothing out pans across the wide screens common in today’s stadium-seating cinemas, especially for patrons seated close. While pans are smoothed even with standard three-channel screen mixes, further benefit can be had when mixers create stems with five screen channels or create static objects, assigning elements to the narrow-width Lc and Rc speakers. This is generally regarded as very useful for dialog and effects panning, and for increasing the resolution of the primary screen “proscenium.” And just as three-speaker mixes spread out naturally to five, phantom images are created for Lc and Rc objects when those speakers are not present.
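A simple constant-power pan law shows how an object placed at the Lc position either feeds a discrete speaker or collapses to an equal-level phantom image between L and C when only three screen speakers exist. The coordinates and pan law here are illustrative assumptions, not Dolby’s actual renderer.

```python
# Illustrative constant-power pan across screen speakers at normalized
# positions (0 = L, 0.5 = C, 1 = R). Not an actual cinema renderer.
import math

def screen_gains(x: float, speakers: dict) -> dict:
    """Pan between the two speakers bracketing position x."""
    names = sorted(speakers, key=lambda n: speakers[n])
    for a, b in zip(names, names[1:]):
        xa, xb = speakers[a], speakers[b]
        if xa <= x <= xb:
            t = (x - xa) / (xb - xa)
            return {a: math.cos(t * math.pi / 2),
                    b: math.sin(t * math.pi / 2)}
    return {names[0]: 1.0}

three = {"L": 0.0, "C": 0.5, "R": 1.0}
five = {"L": 0.0, "Lc": 0.25, "C": 0.5, "Rc": 0.75, "R": 1.0}

# An object at the Lc position (x = 0.25):
assert screen_gains(0.25, five)["Lc"] == 1.0   # discrete Lc speaker
g = screen_gains(0.25, three)                  # phantom image, L + C
assert abs(g["L"] - g["C"]) < 1e-9             # equal-level pair
```

The same metadata thus yields a sharp image in a five-channel booth and a serviceable phantom in a three-channel one, which is the scalability argument in miniature.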
INDUSTRY STANDARDIZATION When digital sound first came to film exhibition in the early 1990s, three formats competed for the attention of filmmakers and theater owners: Dolby Digital, DTS and SDDS. Initially, distributors were divided into “camps,” so to hear any film in digital stereo, exhibitors had to install all three formats, something that was neither practical nor cost-efficient. By about 1995, theater owners were given a “pass,” and studios started to release “quad track” 35mm prints containing all three digital formats, plus a stereo optical analog track. Within a few years, most major studio releases were done this way, although many films continued to be released only with Dolby Digital, which eventually became the most popular format in both filmmaker and exhibitor acceptance. As noted earlier, digital cinema was initially a proverbial Switzerland of film sound; this changed with Atmos, whose proprietary immersive format requires that a separate 5.1 or 7.1 PCM mix be included on DCPs. Auro-encoded printmasters, on the other hand, can play in any theater, and this summer, for The Amazing Spider-Man 2, the 11.1 Auro-3D mix was encoded into the 5.1 PCM mix. When creating a 5.1 mix, the Auro Encoder adjusts the levels of the height and top layers, and these adjustments are “undone” when the mix is played back in Auro theaters. However, the fact remains that Auro-3D and Atmos, as the first two immersive sound formats in widespread use, are completely incompatible in philosophy, implementation and speaker layout. The situation is worse than it was with 35mm digital formats, and theater owners have the most to lose by investing in one system that is unable to play the competition’s track. In early 2013, the technology committees of the U.S.
exhibition trade organization NATO (National Association of Theater Owners) and the European trade group UNIC (Union Internationale des Cinémas), along with DCI, joined forces in an effort to ensure that a “common immersive sound package” be utilized, as opposed to the 35mm quad-track solution of delivering all formats to all theaters. Answering the call, SMPTE formed a special Working Group (TC-25CSS) to assist in this standardization effort. Auro Technologies and Dolby have both pledged to adapt to the agreed-upon open format. While Auro-3D is not currently object-based, its creative tools suite allows object-based mixes to be made, although these will not be in the same 5.1 or 7.1 PCM format as today’s. Also, the Barco cinema processors were designed with an upgrade path in mind and 24 outputs, which presumably would allow the surrounds to be split into more zones. Essentially, the goal will be for the metadata of any format’s mix to be read by any cinema processor’s renderer, which is matched to the configuration file of a theater’s specific speaker layout. Indeed, back at the mix stage, there have already been mixes originally made in Auro or Atmos that have had their panning data modified for the other format. The difference to the public would be how closely the theater’s system matches that of the mix stage. One potential solution that has been presented is Multi-Dimensional Audio (MDA) by DTS. The company, originally known for its double-system digital theatrical format, split in two in 2009, with DTS keeping the licensing of consumer software and codecs. (The theatrical business was spun off to a new company, Datasat, which coincidentally manufactures the AP24 processors for Barco on an OEM basis.) The intellectual property of MDA originated at SRS Laboratories, before that company was acquired by DTS.
The “MDA Cinema Proponents Group,” an informal alliance comprising DTS and companies such as QSC, USL, Barco and Auro Technologies, has gathered to present MDA to TC-25CSS. While MDA has not been used on any films, it has been tested in the industry, and version 1.0 of the code was released in early August, following up on specifications submitted to 25CSS months earlier. The SMPTE standardization process is famously long and drawn-out, and while the industry wants the format war to end, there’s no reason filmmakers or equipment manufacturers need to wait for any decision before using MDA in the real world. (After all, Auro Technologies and Dolby didn’t wait!) Object-based like Atmos, MDA is being offered to the industry as an open format, with an SDK available to developers. As an open format, MDA would be license-free, and DTS would make available the necessary software for digital audio workstation and console manufacturers. (Auro Technologies and Dolby have been providing similar support to filmmakers.) Unlike Auro and Atmos, whose basic philosophies demand specific, scalable speaker locations and aiming (with Dolby going a step further in components and EQ), MDA is, by design, speaker-agnostic. Indeed, there will presumably be much leeway in its implementation in theaters. For example, USL has come up with a cost-effective way for cinemas to upgrade: rendering the MDA mix to 13.1 channels of PCM files “offline,” then distributing those files to the media blocks of the servers in theaters. Rendering would take into account configuration files for individual cinemas. Once an open object-based format is agreed upon, the next goal for theatrical presentation could be for the immersive file to be downmixed to 5.1 or 7.1 in theaters, avoiding the need for a separate channel-based PCM mix on DCPs.
However, because this would mean that the downmixes are done without filmmaker intervention and control, it remains unclear if this is even a practical goal. Items like screen-to-surround panning would make downmix errors especially apparent.
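The speaker-agnostic approach described above, rendering object metadata against a theater’s configuration file, can be sketched as follows. The layouts, names and nearest-speaker snapping are illustrative assumptions, not MDA’s actual renderer, which would pan among multiple speakers rather than pick one.

```python
# Illustrative sketch: the same object metadata renders against whatever
# speaker layout a theater's configuration file describes. Not MDA's
# actual renderer.
import math

def target_speaker(obj_xyz, config):
    """Send the object to the nearest speaker in this theater's layout
    (a real renderer would pan among several neighbors)."""
    return min(config, key=lambda name: math.dist(obj_xyz, config[name]))

# Two theaters, two configuration files, one mix. Coordinates are
# normalized (x across the room, y screen-to-back, z height).
small = {"Ls1": (0.0, 0.5, 0.0), "Rs1": (1.0, 0.5, 0.0)}
large = {"Ls1": (0.0, 0.3, 0.0), "Ls2": (0.0, 0.7, 0.0),
         "Rs1": (1.0, 0.3, 0.0), "Rs2": (1.0, 0.7, 0.0)}

whoosh = (1.0, 0.6, 0.0)                 # right wall, past the midpoint
assert target_speaker(whoosh, small) == "Rs1"
assert target_speaker(whoosh, large) == "Rs2"
```

An automatic 5.1/7.1 downmix would be the same operation run against a five- or seven-speaker configuration file, which is exactly where the unmonitored screen-to-surround pans mentioned above could betray the filmmakers’ intent.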