I’m always exploring concepts surrounding User Interface development, and a video I recently came across got me thinking about sound and how it relates to interactive experiences. Audible cues are the bastard child of UI design, to be used sparingly or not at all. There is a rather low tolerance for sound cues in desktop applications, and almost none for anything delivered in a web browser. Yet we all know that when done right, sounds can deeply affect our experience. I’ll return to that in a second, but first the clip:
This is a video demonstrating the McGurk effect. The McGurk effect is a demonstration of a deep interconnection between our auditory and visual processing systems. Watch the video and listen to the sounds being uttered. Then close your eyes and play the video again.
I think this effect is quite profound and to me serves as a reminder of a resource we may be overlooking when it comes to designing effective software interfaces. There is only one type of software that I regularly interact with that uses sound heavily for interface cues, and that is video games. The most obvious usage is to convey positioning in space. In a first-person shooter, you rarely get visual cues of a monster creeping up behind you, but you get plenty of auditory ones. In real-time strategy games there is often a radar showing activity across the game map, but it is usually the sound of battle in the distance that alerts you to something going on first.
In a game I’ve been playing recently, Company of Heroes, text alerts are displayed on the screen when a unit comes under attack, but they are easy to ignore. What isn’t easy to ignore is a tank commander screaming "holy shit, they’ve got Panzers!" which immediately gives me a sense of three things: who is under attack (voices for different types of units are very distinguishable), roughly where they are (whether the audio is off to my left or right and how loud it is) and even what kind of unit is attacking (a German Panzer tank). In older RTS games, units would often acknowledge a command by saying something like "understood" or "moving out", but in CoH they will say things like "quick, get on that turret" which lets me know that not only did they get the move order but that I indeed properly selected a turret to be captured.
Time and state can also be richly enhanced by auditory cues. Consider the classic Super Mario Brothers. There is always a clock ticking, but no one pays attention until the music speeds up, indicating you are almost out of time (and actually making you feel like you need to move faster). Or consider when Mario grabs an invincibility star. Even though there is a visual cue (a blinking Mario), it is the change to "invincibility music" that we are more aware of, including the cue for when the invincibility is about to run out. If you don't think so, try playing without the sound on.
Okay, so how does this relate to non-gaming software? In my mind right now there is only one common usage of audio cues for indicating what is going on in an application, and that is alerts. We often use little chimes to indicate things like an error occurring or an incorrect key being pressed. A more useful example is the ding used to notify you of a response in an instant messaging client, though many people turn those off if they get too annoying. Gmail chat has a great adaptation: no audio cues are used if the chat is my currently active window, since I can see what is going on, but they are used if I have switched away to some other window. Beyond alerts, though, I am hard pressed to come up with really compelling examples of sound enhancing the functionality and, more importantly, the usability of an application.
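The Gmail behavior boils down to a simple decision rule: chime only when the user can't already see the conversation. A minimal sketch of that rule, where the `playChime` hook and the `AlertContext` shape are my own illustrative assumptions rather than anything from Gmail's actual code:

```typescript
// Decide whether an incoming-message chime should play.
// In a browser, windowFocused would come from document.hasFocus()
// or the Page Visibility API; here it is just a plain flag.

type AlertContext = {
  windowFocused: boolean; // is the chat the currently active window?
  userMuted: boolean;     // has the user turned the dings off?
};

function shouldChime(ctx: AlertContext): boolean {
  // No sound if the user muted alerts, or if the conversation is
  // already visible -- the screen itself is the cue in that case.
  return !ctx.userMuted && !ctx.windowFocused;
}

function onIncomingMessage(ctx: AlertContext, playChime: () => void): void {
  if (shouldChime(ctx)) {
    playChime(); // e.g. new Audio("ding.mp3").play() in a browser
  }
}
```

The point of separating `shouldChime` from the playback hook is that the "when" of an audio cue is pure logic you can reason about, independent of how the sound is actually produced.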
One area to explore is conveying the general "ambiance" of an application. When I was working at 9mmedia we were approached by a client who was trying to create a more engaging interface for a network monitoring application. This app would have people sitting watching it all day for emergencies, and they wanted some way to make it more interesting so people wouldn't fall asleep at the wheel. The end result was an app that looked like a radar station in a submarine, complete with a sweeping band showing the status of servers as blips. To enhance the effect, we used sound during the login process: upon entering correct credentials, the login box would seal away behind a door and two metal plates would unlock and open to reveal the control panel, all with lots of heavy clanking and whirring noises. The end result was an effect that made you feel more like the captain of a nuclear submarine than a low-level sysadmin.
Is that example applicable to all software? Certainly not. But at the same time, if I asked you to think of how you could make a real-time network monitoring app more compelling, is that the kind of thing you would come up with? More often than not the answer will be no, and that might be a missed opportunity. Here's another example I am just coming up with on the fly: imagine you are searching for a good bar in your neighborhood. You might search on Google Maps and then click around to read some reviews. What if the review information was instead converted to an auditory cue, such that as I panned around the map (perhaps even while walking down the street using Street View) I heard "bar noise" that was louder at more popular locations, just like in the real world. I could easily pan around searching for auditory "hot spots" and perhaps avoid the time-consuming process of clicking around looking for reviews.
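The "auditory hot spots" idea reduces to two mappings: review popularity becomes loudness, and map position relative to the listener becomes stereo pan. A hedged sketch of those mappings; the function names, the log-scale choice, and the `Venue` shape are assumptions of mine, not any real maps API:

```typescript
// Map review data and position to audio parameters for an ambient loop.

type Venue = {
  name: string;
  reviewCount: number; // crude proxy for popularity
  x: number;           // horizontal map position, arbitrary units
};

// Louder for more-reviewed venues, on a log scale so one mega-popular
// bar doesn't drown out everything else. Clamped to [0, 1], the range
// a typical audio gain control expects.
function popularityGain(reviewCount: number, maxReviews: number): number {
  if (reviewCount <= 0 || maxReviews <= 0) return 0;
  const g = Math.log1p(reviewCount) / Math.log1p(maxReviews);
  return Math.min(1, Math.max(0, g));
}

// Stereo pan in [-1, 1]: negative means the venue is to the listener's
// left, positive to the right, clamped at the edges of the audible field.
function stereoPan(venueX: number, listenerX: number, halfWidth: number): number {
  const p = (venueX - listenerX) / halfWidth;
  return Math.min(1, Math.max(-1, p));
}

// In a browser these values would feed Web Audio nodes, roughly:
//   gainNode.gain.value = popularityGain(venue.reviewCount, maxReviews);
//   pannerNode.pan.value = stereoPan(venue.x, listener.x, halfWidth);
```

As you pan the map, re-running these two functions per venue and feeding the results to per-venue gain and panner nodes would produce exactly the "louder as you approach a popular spot" effect described above.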
I’m trying to think about ways in which even more basic elements of software UI could be significantly enhanced via sound. For example, are there auditory cues which would actually enhance basic functionality like cut/paste, zoom in/out, scrolling, window management, drag-and-drop, etc., or are they all doomed to be gimmicky?