The UX of Voice: The Invisible Interface

Voice interaction represents the biggest UX challenge since the birth of the smartphone, so we break down the implications and opportunities for this paradigm shift in UX design.

It’s a brand new year, and by most reliable indicators – the latest demos at CES 2017, the buzz on all the tech blogs and even the pre-roll ads interrupting my binge-watching of Crazy Ex-Girlfriend – it looks like 2017 will be the year that voice interaction reaches mainstream adoption.

Voice interaction – the ability to speak to your devices, and have them understand and act upon whatever you’re asking them – was everywhere at CES 2017. Device manufacturers of all shapes and sizes heavily integrated voice capabilities into their offerings, with Amazon’s Alexa stealing the show as the AI platform of choice.

Meet your new interface – for everything

The rapid proliferation of voice interaction capabilities in our individual digital ecosystems raises critical questions for any designer whose work plays a role in the customer experience. It’s becoming clear that voice interaction will soon become an expected offering, as either an alternative to, or even a full replacement for, traditional visual interfaces.

Voice is poised to impact UX design, just as mobile touchscreens turned web design on its head – except this shift is going to arrive way faster, and far from being limited to screen-based interactions, the transformation is going to permeate every aspect of our users’ lives. As consumers start to talk to and be understood by their products, user-centered companies must learn to apply the same intentional design principles to these interactions as they do with visual interfaces, if they hope to satisfy users’ high expectations for this new wave of tech to “just work”.

In this post, we’re going to explain some of the profound implications of the rise of voice interaction for UX design. Just as the internet began as a playground of raw new technical capability and only gradually embraced the guiding principles of intuitive, user-friendly product design, today’s voice-enabled tools and devices are in their infancy, with limitless potential ready to be unlocked through innovative, user-centered design.

What’s driving adoption of voice interaction?

Before we dive into the specific implications of voice for our industry, it’s important to understand some of the forces that are propelling the rapid adoption of this new interaction medium.

MOORE’S LAW

Accurate natural language processing has, until very recently, existed only in the realm of science fiction, in part because it takes a lot of computing power to break down and interpret human speech in real time. 2016 saw numerous significant breakthroughs in language processing, and we’ve reached a tipping point where there’s enough computational power available to make speech recognition and interaction a viable alternative to visual interfaces.

“…improvements in natural language processing have set the stage for a revolution in how we interact with tech: more and more, we’re bypassing screens altogether through the medium of voice… Shawn DuBravac of CTA said that 2017 would represent an inflection point in voice recognition as computers reach parity with humans, accurately transcribing speech about 94% of the time. ‘We’re ushering in an entirely new era of faceless computing,’ DuBravac said.” ~CES 2017: Key trends, J. Walter Thompson Intelligence

In an age where almost a third of the global population is carrying a microphone connected to a supercomputer in their pocket, it’s not hard to guess at the huge swath of people who are primed and ready to adopt voice interaction as their input method of choice.

A VIABLE, CROSS-DEVICE VOICE PLATFORM

Getting the machines to understand us correctly is just one milestone in the quest for frictionless voice interaction; another is making it available to users across multiple use cases and contexts.

Just as the availability of internet access was one of the major growth factors driving more people online, so the adoption of voice interaction will be limited by the variety of scenarios in which we can simply speak to our devices and be understood. Alexa demonstrated its viability as such a unifying platform at CES 2017, based on the sheer number of software and hardware developers who’ve chosen to hop onboard thus far, as well as a massive 9x jump in sales of Amazon’s Echo devices. It may not be the ultimate incarnation of the medium, but it’s currently a strong favorite to become the first voice-driven platform to truly find a mainstream audience.

THIS ISN’T A NEW DIRECTION, JUST THE NEXT STEP IN UX DESIGN

As designers, we have to understand that humans have always used intermediaries to interact with technology – from levers & pedals, to punch cards, to code, to GUIs, to touchscreens, and now to voice and beyond. Each advance in the way we use our tools was motivated by the need to reduce friction: to get more done, faster, more easily and by more people.

Voice represents the new pinnacle of intuitive interfaces that democratize the use of technology – at least until direct brain-to-brain communication becomes a reality (ahem, “Digital Telepathy”, anyone?).

So with this basic understanding of the driving forces behind voice interaction, what does this trend actually mean for designers of the customer experience?

The implications of voice for designers

Vocabulary: words matter more than ever

The movement among UX designers to ditch placeholder text and lorem ipsum in visual interface designs only recently started to gain traction. With the rise of voice interaction, now more than ever, our choice of words will influence how people perceive the customer experiences we design for them, because there are no accompanying visual cues to serve as a guide.

Designers for the voice context must realize they’re relying 100% on what the user perceives the chosen words and phrases to mean – a notoriously squishy concept!

Clearly, there’s an immediate need for some kind of standardized set of command phrases and keywords, so that users are able to intuitively navigate between different AI systems. It’s a safe bet that few will want to memorize proprietary sets of commands for each of their AI assistants.

As designers, we also must adapt and innovate to cope with some of the inherent limitations of this new medium. There are no images we can use to articulate processes more clearly. We can’t use animation to communicate complex concepts more easily. Telling the user to “Click Here” no longer has any meaning when applied to the invisible interface of voice, so we’ll need to develop a whole new lexicon of commonly-understood and intuitive cues for the user to act. Think about that for a sec: the most fundamental design element of the web, the clickable link, no longer has any place in the future standard of interface design.

UNDERSTANDING USERS’ INTENT

Consistent interpretation of commands between visual and voice interfaces will become a key concern for UX designers navigating this transition phase, particularly for web applications. Without the clear signal of a button click with which to interpret a user’s desired action, it will fall to designers to anticipate the user’s intent at each point in the conversation, and shape the appropriate response.

A hypothetical example: saying the phrase “Delete this” could be a valid command for voice-enabled versions of both a Microsoft Word document and your Facebook profile settings – but the consequences and intent behind uttering the same words in each scenario are drastically different!
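
To make that concrete, here’s a minimal sketch in Python of how a voice layer might treat the same utterance differently depending on the active context. The context names and severity levels are entirely hypothetical – this is an illustration, not any real assistant’s API:

```python
# A minimal sketch of context-dependent command handling.
# The context names and severity labels are hypothetical.

# How severe is "Delete this" in each context?
DESTRUCTIVE_CONTEXTS = {
    "word_document": "low",      # undo is one utterance away
    "facebook_profile": "high",  # consequences are hard to reverse
}

def handle_utterance(utterance: str, context: str) -> str:
    """Route the same spoken command based on the active context."""
    if utterance.lower().strip() != "delete this":
        return "Sorry, I didn't understand that."

    severity = DESTRUCTIVE_CONTEXTS.get(context, "unknown")
    if severity == "high":
        # High-stakes contexts deserve an explicit confirmation step.
        return "This will permanently delete your profile. Are you sure?"
    if severity == "low":
        return "Deleted. Say 'undo' to bring it back."
    return "I'm not sure what you want me to delete. Can you be more specific?"

print(handle_utterance("Delete this", "word_document"))
print(handle_utterance("Delete this", "facebook_profile"))
```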

This will not always be such an easy distinction to make. Consider how visual and voice interfaces handle a common digital interaction – opting in for an email newsletter. In a traditional visual interface, the typical email subscription process goes something like this: the user spots a signup form, types in their email address, and clicks “Subscribe”.

Simple, quick, and unambiguous, right?

Now, how might the same process be initiated by voice?

“Subscribe me to this blog.”

“Add my email address to their mailing list.”

“Give me updates from this site.”

“Opt me in for this blog’s email newsletter.”

There are innumerable ways to articulate the same basic intent via voice – which means UX designers must make sure they’re asking the right questions to elicit the appropriate verbal responses from users.
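
One plausible way to handle this – sketched below with hypothetical intent names, and deliberately simplified to exact matching where real voice platforms use statistical language models – is to collapse many sample utterances into a single canonical intent:

```python
# A simplified sketch of many-utterances-to-one-intent matching.
# Real platforms use trained NLU models; exact matching is used
# here purely for illustration.

SUBSCRIBE_UTTERANCES = {
    "subscribe me to this blog",
    "add my email address to their mailing list",
    "give me updates from this site",
    "opt me in for this blog's email newsletter",
}

def resolve_intent(utterance: str) -> str:
    """Collapse many phrasings into one canonical intent name."""
    normalized = utterance.lower().strip().rstrip(".!?")
    if normalized in SUBSCRIBE_UTTERANCES:
        return "SubscribeIntent"
    return "FallbackIntent"

for phrase in ("Subscribe me to this blog.", "Play some jazz"):
    print(phrase, "->", resolve_intent(phrase))
```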

Maintaining engagement with variability

Once the novelty of voice interaction wears off for the mainstream user, product designers will be challenged to maintain user engagement. As we saw in the email subscription example above, there are many ways to articulate even clear-cut binary choices when it comes to voice – but it’s this variability that offers opportunities for intuitive design to foster user engagement.

The nucleus accumbens is the part of the human brain that lights up when we crave something, and in particular it’s highly stimulated by unpredictability. This means when we can’t predict what’s going to happen, we tend to pay much closer attention – which partly explains the addictiveness of gambling, and of Netflix’s The OA (seriously, try it, it’s an amazing show).

This neurological trait is already employed by many designers at the forefront of visual UX, and will likely continue to be leveraged as we start to shape conversations with our technology. Designing variability into these interactions opens the door to anthropomorphization, with users ascribing mood and even personality to the voices in their machines.

This wide variation in potential responses also places much more emphasis on the importance of crafting meaningful error messaging that steers the conversation with the user back on track, without being incessantly annoying. Users will quickly lose interest in conversing with a voice that robotically repeats “I’m sorry, I didn’t quite catch that”, like a broken phone tree menu.
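
Here’s a minimal sketch of both ideas: randomized response variants, and reprompts that escalate instead of repeating themselves. The wording and the escalation policy are hypothetical design choices, not any platform’s actual behavior:

```python
# A sketch of response variability plus escalating reprompts.
import random

CONFIRMATIONS = [
    "You got it.",
    "Done!",
    "All set.",
    "Consider it done.",
]

REPROMPTS = [
    "Sorry, I didn't quite catch that.",
    "Hmm, can you say that another way?",
    "I'm still not following - try asking for 'help'.",
]

def confirm() -> str:
    """Vary the confirmation so the voice doesn't feel robotic."""
    return random.choice(CONFIRMATIONS)

def reprompt(failure_count: int) -> str:
    """Escalate the reprompt instead of repeating the same line."""
    index = min(failure_count, len(REPROMPTS) - 1)
    return REPROMPTS[index]

print(confirm())
for failures in range(4):
    print(reprompt(failures))
```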

Brands & personalities: An extension of voice?

Outside of what they’re actually saying, voices convey a wealth of meta-information to the listener – so it’s easy to imagine brands leveraging the medium of voice interaction as an extension of their personalities. Gender, age, inflection, tone, accent, cadence and pace are all elements that can be used by UX designers seeking to craft a particular customer experience with their brand.

Virgin America may opt to converse with you in a saucy, flirty and suggestive British voice that’s in line with their brand, whereas the New York Times might opt for a more mature, assertive voice for their announcements. The kids may finally get to talk directly to Mickey as you book your Disney World vacation! Apple may be searching for the perfectly engaging, yet soothing voice for your next operating system (spoiler alert: it’s Scarlett Johansson’s voice in Her).
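
To make this tangible, here’s a tiny sketch of what a brand “voice persona” might look like as a configuration object. The class, attribute names and values are entirely hypothetical – real text-to-speech systems expose these levers differently – but the parameters mirror the list above:

```python
# A sketch of a brand "voice persona" as a configuration object.
# All names and values here are hypothetical illustrations.
from dataclasses import dataclass

@dataclass
class VoicePersona:
    gender: str
    age_range: str
    accent: str
    tone: str
    pace: str  # e.g. "brisk", "measured"

saucy_airline = VoicePersona(
    gender="female", age_range="25-35",
    accent="British", tone="playful", pace="brisk",
)

newspaper = VoicePersona(
    gender="male", age_range="45-60",
    accent="American", tone="assertive", pace="measured",
)

print(saucy_airline)
print(newspaper)
```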

On the flipside, some brands may opt to let the user customize the voices they interact with – which leads to a looming philosophical debate: who actually controls a brand? Is it the company behind it? Or the customer’s perception of it?

It’s not hard to envision a new role for “VX designers” – a cross between a casting director and a sound engineer – tweaking synthetic voices in search of that je ne sais quoi that they think will best engage their users.

Celebrities will likely find a brand new income stream from licensing not just the sound of their voices, but their entire personalities as AI assistants. Sound ridiculous? It does, but you can already pay about $10 to make your TomTom GPS nav unit speak like Snoop Dogg. (Oooo-wee!)

PREFERENCES, PRIORITIZATION & SMART SUMMARIZATION

One of the advantages visual interfaces retain over voice is the ability to present multiple options to users clearly and in a hierarchical manner – search results and pricing pages are perfect examples of this. But how exactly could you present a list of options to the user without an accompanying visual aid?

In this age of expected instant gratification, it’s hard to imagine the average user patiently listening to their AI assistant as it narrates a laundry list of all sushi restaurants within walking distance, one by one. This would be a classic case of the new medium being limited by the conventions of the past for the sake of familiarity – like someone printing out their emails before reading them: it kinda defeats the purpose, and absolutely doesn’t scale to accommodate today’s needs.

A more viable approach could be to prioritize and summarize the information based on known user preferences, prior to delivering an answer – in other words, doing what a normal person would naturally do in a conversation.

“Hey, Jason, where’s a good place to go for sushi?”

“There are several sushi restaurants in the area – would you like to walk, or drive?”

“It’s a nice day, I’m down to walk”

“Ok, Emperor Sushi is a 2-minute walk from here, but if you want something cheaper, Ninja Sushi Deli is a 5-minute drive.”

“Good to know – let’s do Emperor Sushi today.”

In the case of our hypothetical quest for sushi, a more user-oriented voice interaction asks relevant follow-up questions (“How far do you want to walk?”, “How much do you want to spend?”) to narrow down the list to the very best options before recommending them.
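
Under the hood, this kind of exchange is often modeled as slot filling: the assistant keeps asking follow-up questions until every slot it needs is filled, then summarizes a single best answer rather than narrating the whole list. Here’s a minimal sketch – the restaurant data, slot names and question flow are all made up for illustration:

```python
# A sketch of slot filling for the sushi recommendation above.
# Restaurant data, slot names and questions are hypothetical.

RESTAURANTS = [
    {"name": "Emperor Sushi", "walk_minutes": 2, "price": 3},     # $$$
    {"name": "Ninja Sushi Deli", "walk_minutes": 25, "price": 1}, # $
]

# Each missing slot maps to the follow-up question that fills it.
REQUIRED_SLOTS = {
    "travel_mode": "Would you like to walk, or drive?",
    "budget": "Are you after something cheap, or a treat?",
}

def recommend(slots: dict) -> str:
    """Summarize a single best option instead of narrating the list."""
    if slots["budget"] == "cheap":
        pick = min(RESTAURANTS, key=lambda r: r["price"])
    else:
        pick = min(RESTAURANTS, key=lambda r: r["walk_minutes"])
    return f"Ok, {pick['name']} is a {pick['walk_minutes']}-minute walk away."

# Simulate the conversation; a real system would get these answers
# from speech recognition rather than a hard-coded dict.
canned_answers = {"travel_mode": "walk", "budget": "treat"}
slots = {}
for slot, question in REQUIRED_SLOTS.items():
    print("Assistant:", question)
    slots[slot] = canned_answers[slot]
    print("User:", slots[slot])

print("Assistant:", recommend(slots))
```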

There are wide applications for these sorts of branching, dialog tree-driven interactions – hospitals, info kiosks and hotel concierges could all ditch the clunky touchscreens, and migrate to entirely conversation-driven interactions with voice-enabled devices in, say, your hotel room – with each response crafted by designers according to the latest findings and best practices in hospitality research.

It’ll be up to designers to identify this logical throughline for all kinds of requests, and craft the conversation with the user around it, so the machine is able to collect the data it needs to provide the best possible answer.

ACCESSIBILITY & PRIVACY

The shift to voice interaction must account for multiple accessibility considerations – for instance, people who are deaf or mute, or who are sick and have temporarily lost their voices. Indeed, the amplified potential of voice was recognized in these communities long before it gained traction in the mainstream:

“Able-bodied individuals gain convenience from voice-control technology, while the disability community gains the greatest reward of all: independence.” ~ Talk to the Machine: Voice Control Comes Into Its Own

If UX designers see large benefits in refining their voice interfaces for the able-bodied, consider the huge impact those refinements will have on quality of life for people with advanced impairment of their motor function – it could literally mean the difference between life and death!

Without intentionally and thoughtfully designed interactions, people with disabilities will miss out on the ease and intuitiveness of voice. This could involve designers building accommodations into their experiences to run in a hybridized interface configuration that provides both audio and visual cues to these categories of users. Don’t expect chatbots to fade away.
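
As a sketch of what such a hybridized configuration might look like in code – with entirely hypothetical type and field names – every response could carry both a spoken and a visual form, so the experience degrades gracefully for users who can’t hear the audio or can’t speak a reply:

```python
# A sketch of a hybrid response: every reply carries both a spoken
# and a visual form. Type and field names are hypothetical.
from dataclasses import dataclass

@dataclass
class HybridResponse:
    speech: str        # sent to text-to-speech
    display_text: str  # mirrored on any available screen
    haptic: bool       # optional vibration cue for alerts

def build_response(message: str, urgent: bool = False) -> HybridResponse:
    """Emit the same content across audio, visual and haptic channels."""
    return HybridResponse(
        speech=message,
        display_text=message,
        haptic=urgent,
    )

reply = build_response("Your table is ready.", urgent=True)
print(reply)
```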

Privacy concerns also abound in this new medium, and we’re walking a fine line between easing friction and opening up entirely new fronts of vulnerability. Most voice-driven devices currently store and automatically remember the necessary user credentials for the sake of reduced friction, but this will likely come back to bite us – as parents are starting to find out when hundreds of dollars’ worth of cookies unexpectedly arrive at the front door, and their six-year-old starts looking guilty…

If our voices are our passports in this new medium, what’s to stop someone forging one by recording you speaking your password out loud, or editing your voice to synthesize commands that you never gave? These are imminent privacy concerns that UX designers must address to instill confidence in their users, and propel voice interaction further into the mainstream.
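
One plausible mitigation – sketched below with hypothetical intent names and a made-up voiceprint threshold – is to treat the voice as identification rather than authorization, and demand a second factor before any sensitive action:

```python
# A sketch of voice-as-identification, not authorization.
# Intent names, the 0.9 threshold and the PIN flow are all
# hypothetical design choices.

SENSITIVE_INTENTS = {"PlaceOrderIntent", "ChangePasswordIntent"}

def authorize(intent: str, voice_match_score: float, pin_ok: bool) -> bool:
    """A recorded or synthesized voice alone should never be enough."""
    if voice_match_score < 0.9:
        return False   # don't even recognize the speaker
    if intent in SENSITIVE_INTENTS:
        return pin_ok  # demand a second factor, e.g. an app tap
    return True        # low-stakes requests can pass on voice alone

# A replayed recording might match the voiceprint, but without the
# second factor the order is refused.
print(authorize("PlaceOrderIntent", voice_match_score=0.95, pin_ok=False))  # False
print(authorize("PlaceOrderIntent", voice_match_score=0.95, pin_ok=True))   # True
```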

In summary: Pay attention and design with your eyes closed

Voice interaction is the next great leap forward in UX design, and we’ll see it proliferate rapidly in 2017, across software and hardware products.

Many of the old paradigms we’ve come to rely on in visual UX design don’t even apply to this new medium, so designers must step up to embrace and refine this raw new technology – carefully and intentionally honing the vocabulary used in these interactions, and working to better understand our users’ intent at every step.

Once the novelty of voice interaction wears off, it’ll be incumbent upon UX designers to maintain their users’ engagement by leveraging personality and innovating in how AI assistants deliver answers to the questions they’re asked. It’ll begin with crafting the responses for clarity and understanding, and progress toward creating entire branded personas.

I’m under no illusions; what I’ve described here is a huge design challenge. In fact, it’s the biggest design challenge we’ve faced since Steve Jobs’ legendary “One more thing” back in 2007, a product demo that, in hindsight, heralded a transformative change for web and software design.

Voice interaction may not have garnered the same fanfare just yet, but I believe the same moment is upon us in the field of UX design, as voice interaction proliferates, augments and in many situations completely replaces visual UX design as the new standard user interface. For decades, the limitations of our technology have forced us to design our interfaces within a 2-dimensional space – an unnatural expectation for 3-dimensional humans! Designing for voice could be the catalyst that helps us return to the original goal of UX: treating people like people again.