Where We Are
Let’s get some basics out of the way. Until ATSC 3.0 is implemented, all television audio in the United States is delivered encoded as Dolby Digital. Inside Dolby Digital the carrier can choose among different services, including two-channel stereo and full 5.1. All of the major broadcast networks, many large local TV channels, and quite a few cable channels deliver 5.1 only.
That bears repeating. If you’re listening to CBS, NBC, ABC, or Fox, the product they distribute is a 5.1 mix. Even if you’re listening in stereo the signal leaving the network is 5.1. If you’re hearing it in mono the signal is still 5.1.
Wait a minute, I hear you say: how much content actually originates in 5.1? Well, nearly all episodic drama originates in 5.1, and as a side note the mixes of these programs have never been better than they are today. Everything else, with few exceptions, originates as two-channel stereo. There are multiple reasons for this. The first is the post-production bottleneck—picture editors simply don’t want to deal with a six-channel source no matter how easy designers of non-linear editing software make it. They are mostly used to stereo and mono sources (although they will still pass split-track recordings through as stereo with depressing regularity) and the time constraints in post can be crushing. For any show that visits an editing suite before airing, if there isn’t a wholly separate audio process, it can’t originate as 5.1. The only other way to deliver discrete 5.1 is to do it live, and there are so many traps in doing so that it is rare. Sports mixer Fred Aldous is a master at it and makes it look easy.
So why is everything coming to your house 5.1? It’s easier for the network. At network Master Control all incoming stereo material hits an upmixer to become 5.1 before being encoded as Dolby Digital. Anything originating discrete 5.1 simply bypasses the upmixer and goes straight to the encoder. This way the network doesn’t have to change anything in the encode process, and change is bad in failure-averse environments. Now there are upmixers and there are upmixers—the better ones, the ones sitting in national network equipment racks, do one particular thing superbly well: a stereo source going in will sound nearly identical to the metadata-derived stereo downmix coming out, despite the fact that it was turned into 5.1 and back to stereo along the way.
My Data Is Meta
Let’s talk metadata for a moment. When the ATSC standard was being formulated, the idea was that mixers and producers would evaluate every show they made and specify the appropriate metadata parameters. As an example, to get a Dialnorm number the program would be run through a Dolby LM-100 and the resulting number reported to the carrier, who would then set the DialNorm parameter to match. Or, if you were mixing a show where you purposely leaked what you had in the front L/R channels into the surrounds, you specified Surround Phase Shift as disabled to avoid the 90° shift Dolby baked into the standard.
No one ever did this.
Instead, the networks, needing to actually broadcast shows, set the metadata to fixed values (mostly the Dolby defaults) and required program producers to conform to those parameters. Hence the reason we all aim at a -24dB DialNorm value now, and you leak signal from the front L/R to the surrounds at your peril (since the Surround Phase Shift is permanently set to enabled).
Warning: Opinion of Author Will Appear Soon
And now I step into the area of opinion, and in this I am as the voice of one crying in the wilderness compared to all of the other variety and music mixers I know. I am diametrically opposed to the Recording Academy position on music in 5.1, but as there is little chance I will ever again be considered for employment on a Grammy broadcast I’m not terribly concerned about saying so. The Academy actually published a white paper some years ago covering music mixes, especially live television music mixes. Their position was that use of the center speaker should be severely limited if not done away with entirely. At least part of that was driven by the fear that if a soloist was out there naked, so to speak, in the center channel, that audio could be stripped off by a consumer with a grudge against the artist and released to the world revealing that the artist had slightly imperfect pitch, or strange breathing patterns, or some other flaw. To be fair this has happened, but I’m pretty certain no lasting damage has been done to anyone. It is, after all, just television, and performers do much stranger things than sing off-key all the time.
The Academy position also revolves around a distrust that consumers can place and calibrate speakers in a surround system correctly. That center speaker will never be in the center, it will never be appropriate to the rest of the system, it will never be at the right level, etc. etc. But of course consumers always put stereo speakers in exactly the right placement related to the preferred listening position, don’t they?
So why disregard the center speaker? Well, the record industry was successful for many decades putting the soloist in the “phantom” center, and by now everyone is used to it. Never mind the fact that the left/right speaker placement has to be exact to generate a phantom center or that you can literally lean your head out of the sweet spot and feel the soloist smear over toward whatever side you’re favoring. It’s what we’ve done in the past, so we’re going to continue doing it. Here’s a direct quote from the document:
most playback systems — even the most rudimentary consumer systems — allow each channel to be heard in isolation. Placing a lead vocal "naked" in the center channel, without other instrumentation to help mask poorly intonated notes, "auto-tuning" glitches, or bad drop-ins, can therefore potentially expose weaknesses in a performance and consequently incur the wrath of the recording artist and record label.
For these reasons, most surround sound music mixers treat the center channel with caution, rarely if ever using it to carry any mix components exclusively. Instead, those instruments routed to the center channel (most often lead vocal, bass, snare drum, kick drum and/or instrument solos) are also generally routed to other speakers as well. Placing selected instruments in the center channel and one or both front speakers helps emphasize their sound within the front wall and also aids in localization if the listener moves around the room.
Recommendations For Surround Sound Production ©NARAS 2004
Personally, as a consumer it’s nobody’s business where I put my speakers, and the mix you make shouldn’t try to second-guess me. Just make the best mix you can and leave it at that.
The men and women who mix episodic drama for television don’t seem to worry about any of that. By and large dialogue is locked in the center speaker without leakage into the L/R, and as long as that loudspeaker is somewhere near the screen, the viewer has no problem localizing the actors no matter where he or she is sitting in the room. That doesn’t work with a phantom center.
The real reason I get all torches and pitchforks about this is that I’ve both heard and created discrete 5.1 music mixes that take my breath away precisely because the soloist is placed by themselves in the center speaker and not smeared into the L/R or the surrounds. Your brain loves this—it doesn’t have to do any complex work to reassemble and place the voice in the center of the sound field from information received from each side—the voice is right there where it’s supposed to be. A properly balanced 5.1 mix that uses the center speaker can make music come much more alive.
How Did We Get Here?
So what’s the reason 5.1 came about in the first place? Simple answer. Restricted bandwidth vs. convincing soundfield. Research by scientists in Europe revealed many years ago that the fewest channels of discrete sound you can get away with when trying to create a realistic soundfield is five: three across the front, and two on the sides slightly toward the rear. Tomlinson Holman came up with the .1 by pointing out that a bandwidth-limited low frequency effects channel could be included in a data stream with a very small data carriage penalty. And as usual audio got a very miserly slice of the bandwidth available. AC-3 (the actual codec for Dolby Digital) is encoded at 384 kb/s. Your standard HDTV channel is about 18 Mb/s.
The place where things get really screwed up is when “music” people decide that the 5.1 discrete mix they are creating for a live broadcast must leave the center speaker silent. This starts a cascade of consequences. First, if the dialogue levels are to match the music levels the dialogue has to be steered out of the center as well. Think about it—you’re listening to the presenter or sports commentator in the center speaker. You’re used to it, it images from the center, it’s all good. Then the music comes on. Suddenly that speaker goes dark, and the singer or sax player or guitar solo you see that big closeup of isn’t coming from there. To compensate, the mixer turns up the soloist but it still doesn’t sound as good. Then, the artist’s manager demands he or she be turned down to make it sound more like the record. Then when the song is over the dialogue returns to the center, and all the 5.1 listeners say “what just happened?” Your only choice is to run the dialogue for the whole show as a phantom center to match the music, and the producers are going to have a cow if you do.
None of this even addresses what happens to people listening in stereo. The default downmix parameter has the center channel feeding the L/R evenly at -3dB. Assuming the mixer has been studiously monitoring the downmix during the broadcast to make certain his center-channel material is sitting correctly, when he switches to music mode and the center channel goes dark anything that occupies the center of the mix now has to be pushed up 3dB to maintain the same loudness.
Sad but true, at this point the mixer is actually better off delivering stereo to the network and letting the upmixer there handle things. The upmixer will derive a center from whatever algorithm it uses and the transition from music to speaking will be natural—the center speaker will stay on in all the 5.1 living rooms (and most of today’s upmixers treat music very well) and the downmix will be almost exactly what the mixer hears in the truck.
And strange things happen when everyone isn’t on the same page. When Paul McCartney did the Super Bowl halftime in 2005 it sounded glorious in 5.1—Paul in the center, band in the L/R, crowd and fireworks in the surrounds. Right up until about midway through Live and Let Die. In the middle of an instrumental break Macca yells out “oh yeah” and does a little falsetto scream. It was planned, and the director cut back to him at exactly the right instant. Unfortunately, the mixer had Sir Paul’s live mic routed to the L/R instead of the center. After hearing the verse from the center speaker, suddenly we hear the yell from the sides. Which answered the question—was he singing live? Which was no. Which no one would have ever known if the mixer had just assigned the live mic to the center with the recorded vocal. I can only assume he was monitoring the downmix and never knew the mic was mis-assigned.
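For anyone who wants to see the stereo downmix arithmetic from this section in numbers, here is a minimal Python sketch. The -3dB center coefficient comes from the text above. Everything else is an assumption made for illustration: the function names are hypothetical, surrounds and LFE are left out, and the -6dB pan law used for the phantom center is just one possibility, since the text doesn't say which law a given console applies.

```python
import math

CENTER_DOWNMIX_DB = -3.0  # center-to-L/R downmix level stated in the text


def db_to_gain(db: float) -> float:
    """Convert a level in dB to a linear amplitude gain."""
    return 10 ** (db / 20)


def gain_to_db(gain: float) -> float:
    """Convert a linear amplitude gain to dB."""
    return 20 * math.log10(gain)


def downmix_front(left: float, right: float, center: float) -> tuple[float, float]:
    """Fold the three front channels to stereo (surrounds and LFE omitted)."""
    c = center * db_to_gain(CENTER_DOWNMIX_DB)
    return left + c, right + c


def phantom_center(signal: float, pan_law_db: float) -> tuple[float, float]:
    """Send a mono signal equally to L and R at an assumed pan-law attenuation."""
    g = signal * db_to_gain(pan_law_db)
    return g, g


# Case 1: soloist alone in the center channel at unity gain.
discrete = downmix_front(0.0, 0.0, 1.0)

# Case 2: center silent, same soloist as a phantom center.
# The -6 dB pan law is a hypothetical choice for this example.
l, r = phantom_center(1.0, pan_law_db=-6.0)
phantom = downmix_front(l, r, 0.0)

difference_db = gain_to_db(phantom[0] / discrete[0])
print(f"discrete center in downmix: {gain_to_db(discrete[0]):.1f} dB per side")
print(f"phantom center in downmix:  {gain_to_db(phantom[0]):.1f} dB per side")
print(f"difference: {difference_db:.1f} dB")
```

Under these assumptions the phantom-center version sits 3dB lower in the downmix than the discrete-center version, which is one way the 3dB make-up described above can arise. Pass `pan_law_db=-3.0` instead and the two cases match, so the size of the jump depends on how the console pans, not only on the downmix metadata.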
This Actually Works
Since you asked (even if you didn’t) here’s how I’ve structured the 5.1 music mixes that actually worked on-air. First, you have to somehow get the carrier to agree to disable the Surround Phase Shift parameter on the Dolby Digital encoder (I actually got DirecTV to do this on several occasions). Second, create an “Instruments/Reverb” mix, pan it hard L/R but pull it back 25% from the front. Next, create an “Audience/Effects” mix, pan it hard L/R and pull it back 75% from the front. Now create a mono “Soloist” mix and assign that to the center. Now switch to monitoring the downmix and create your balances. As long as everything is assigned correctly to the mixes created above, mixing in stereo will feel absolutely normal to you and when you occasionally dip into the 5.1 to check it you’ll be astounded by the depth, clarity, and cohesion.
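The routing above can be sketched in a few lines of Python. This is an illustration only, not how any particular console does it: the function names are hypothetical, the sample values are made up, and the constant-power front/rear split is an assumption, since how "25% back" maps to actual gains varies from desk to desk.

```python
import math

CHANNELS = ("L", "R", "C", "Ls", "Rs")  # LFE omitted in this sketch


def front_rear_gains(pull_back: float) -> tuple[float, float]:
    """Split a signal between a front speaker and the surround behind it.

    pull_back: 0.0 = fully front, 1.0 = fully rear.
    Uses an assumed constant-power (sine/cosine) law.
    """
    angle = pull_back * math.pi / 2
    return math.cos(angle), math.sin(angle)


def route_stereo_stem(left: float, right: float, pull_back: float) -> dict[str, float]:
    """Pan a stereo stem hard L/R, pulled back from the front by pull_back."""
    front, rear = front_rear_gains(pull_back)
    return {"L": left * front, "R": right * front, "C": 0.0,
            "Ls": left * rear, "Rs": right * rear}


def route_mono_to_center(sample: float) -> dict[str, float]:
    """Assign a mono stem to the center channel only."""
    return {"L": 0.0, "R": 0.0, "C": sample, "Ls": 0.0, "Rs": 0.0}


def sum_buses(*buses: dict[str, float]) -> dict[str, float]:
    """Sum several 5.0 buses channel by channel."""
    return {ch: sum(bus[ch] for bus in buses) for ch in CHANNELS}


# One sample from each stem (values invented for the example).
instruments = route_stereo_stem(0.5, 0.4, pull_back=0.25)  # "Instruments/Reverb"
audience = route_stereo_stem(0.2, 0.3, pull_back=0.75)     # "Audience/Effects"
soloist = route_mono_to_center(0.8)                        # "Soloist"

mix = sum_buses(instruments, audience, soloist)
for ch in CHANNELS:
    print(f"{ch:>2}: {mix[ch]:.3f}")
```

The point of the sketch is the shape of the routing: the instrument stem lives mostly in the front pair, the audience stem mostly in the surrounds, and the center channel carries the soloist and nothing else.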