
Why does conference call audio suck?

January 12th, 2006 Posted in General Media / Stuff

I had a wonderful conversation at my day job today (FS is not my full-time job, yet. More on that later this month). One of the things we deal with at the office is routine conference calls. We have multiple offices around the world, so people frequently have to call in remotely, and that can cause a lot of problems. Usually it's the people who call in who have trouble hearing the conversation, which led Mike (one of my co-workers) to pose the question: “With all of today’s technology, why can’t we build a mic that mimics the human ear so that hearing people on a conference call is not a problem?”

As Mike pointed out, it’s only the people calling in who have the problem. If you are in the meeting room you have no trouble hearing anyone speak, so if our ears can pick it out and our brains can decipher it, why can’t a mic do the same? After roundtabling this for a while, discussing mic technology, active filtering by the brain, the construction of the human ear, and general overarching philosophy, we settled on the idea that the delivery platform, not the microphone, is probably the major limitation. This is by no means a definitive or scientific explanation, but we are pretty happy with it:

The problem with the phone is that we take a sound source that is normally heard by a stereo pair (i.e. our ears), convert it to mono, compress the hell out of it for leveling, and then downsample it for transmission as an analog signal over copper wire. What this boils down to is that phones pretty much butcher the sound source.
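To make that first step concrete, here is a rough Python/NumPy sketch of the stereo-to-mono collapse. None of this comes from an actual phone system; the channel weights and buffer length are made-up example values.

```python
import numpy as np

def stereo_to_mono(stereo):
    """Collapse a (num_samples, 2) stereo buffer to mono by averaging channels.

    Whatever left/right differences were in the recording -- the cues our
    ears use to place a sound in the room -- get averaged away here.
    """
    return stereo.mean(axis=1)

# Made-up example: a voice that is a bit louder in the left ear than the right.
samples = np.random.default_rng(0).standard_normal(8000)
stereo = np.column_stack([1.0 * samples, 0.6 * samples])
mono = stereo_to_mono(stereo)
print(stereo.shape, "->", mono.shape)  # (8000, 2) -> (8000,)
```

Everything downstream of the mic only ever sees that single column of numbers.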

By doing all these horrible things to the audio we severely limit our brain’s ability to process it. First, we take away the stereo input. When sitting in a room having a conversation, we use our sense of distance and position to single out the sounds we want to focus on and filter out the rest. Once everything is collapsed to mono, a conversation where multiple people are talking no longer gives us that stereo placement to work with, and the voices become much harder to tease apart.

Next, the audio is compressed before being sent out. This keeps all the sounds at a relatively similar level over time: louder sounds are made quieter and quieter sounds are made louder so the signal stays nice and even. There is a lot more to compression than this, but let’s use this simple generalization for now. The benefit of compression is that by giving up some of the dynamics of the sound, you even everything out, and a single voice is usually easier to understand this way. The problem is that multiple sound sources get mixed together and then evened out, so quieter sounds are now louder, louder sounds are quieter, and separating them aurally becomes a lot harder.
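Here is a similarly rough sketch of the leveling idea: a naive compressor that squashes anything over a threshold by a fixed ratio and then applies make-up gain. Real phone systems use companding and automatic gain control rather than exactly this, and the threshold and ratio values below are arbitrary.

```python
import numpy as np

def compress(signal, threshold=0.3, ratio=4.0):
    """Naive dynamic range compression on a mono signal in [-1, 1].

    Samples whose magnitude exceeds `threshold` only keep 1/ratio of the
    excess, so loud moments are pulled down toward the quiet ones.
    """
    magnitude = np.abs(signal)
    over = magnitude > threshold
    squashed = magnitude.copy()
    squashed[over] = threshold + (magnitude[over] - threshold) / ratio
    return np.sign(signal) * squashed

# Arbitrary example: a loud voice and a quiet voice already mixed to mono.
rng = np.random.default_rng(1)
mix = np.clip(0.9 * rng.standard_normal(8000) + 0.1 * rng.standard_normal(8000), -1, 1)
leveled = compress(mix)
# Make-up gain: bring the peak back up, which is what effectively makes the
# quiet sounds louder once everything has been squashed together.
leveled *= np.max(np.abs(mix)) / np.max(np.abs(leveled))
print(f"peak before: {np.max(np.abs(mix)):.2f}, after make-up gain: {np.max(np.abs(leveled)):.2f}")
```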

Finally, we take the mono, compressed source and downsample it for transmission. Downsampling is the process of removing frequency data to make the audio signal smaller and easier to transfer; a traditional phone line only carries roughly 300 Hz to 3.4 kHz, so everything above that is simply thrown away. That lost detail once again makes it harder to pull out the relevant information. By the time the audio reaches us it has been so hacked to death that even with the best equipment it can be nearly impossible to decipher what is going on in a busy meeting.
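And one last sketch for the downsampling step: band-limit the signal with a crude moving-average filter, then keep only every Nth sample. A real phone path uses a proper anti-aliasing filter and codec; the 44.1 kHz and 8 kHz figures are just illustrative.

```python
import numpy as np

def downsample(signal, src_rate=44100, dst_rate=8000):
    """Crudely downsample a mono signal by low-pass filtering then decimating.

    The moving average stands in for a real anti-aliasing filter; the point
    is that everything above roughly dst_rate / 2 is gone for good, along
    with whatever detail it carried.
    """
    factor = src_rate // dst_rate            # integer factor, ~5 here
    kernel = np.ones(factor) / factor        # naive low-pass (moving average)
    smoothed = np.convolve(signal, kernel, mode="same")
    return smoothed[::factor]                # keep every `factor`-th sample

rng = np.random.default_rng(2)
one_second = rng.standard_normal(44100)      # stand-in for one second of speech
phone_grade = downsample(one_second)
print(len(one_second), "->", len(phone_grade))  # 44100 -> 8820 samples
```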

Anyway, it was a fun conversation, and I had a great time with Mike, NJ and Rebecca trying to figure out the main cause of such horrible conference call experiences.
