25/01/2017 11:01 GMT | Updated 26/01/2018 05:12 GMT

'Alexa, What Are The Four Biggest Challenges To Voice Control Technology Hitting The Mainstream?'

The rise of 'voice-first' interfaces featured prominently in Gartner's list of 10 predictions about the strategic use of technology in 2017 -- the analysis firm anticipates that 10 million households will be using room-based screenless devices by the end of the year.

It's not hard to understand why when you see the enthusiastic reception that has greeted the launch of voice-controlled 'home hub' products by Amazon and Google.

Indeed, while there may not have been a single breakthrough product at the recently concluded CES in Las Vegas, the show-stealer was clearly Amazon's voice assistant Alexa. In an illustration of the shift Gartner forecasts in how people will interact with computers, this makes voice control the technology trend to watch in 2017 for brands, advertisers, marketers, and consumers alike.

However, there are some substantial user experience design challenges that brands will have to overcome if this predicted boom in consumer adoption of voice control devices is to become reality.

Context

The first challenge is to make the experience relevant to the user's context. Amazon Echo and Google Home are typically set up in communal areas in the home, and this has implications for the design of voice-based interactions for many tasks. For example, users may not feel comfortable vocalising sensitive personal information, such as passwords or bank account details. Brands providing personal or secure services, such as banks, need to ensure that customers are comfortable when verifying their identity using voice interfaces.

Context is also an important constraint when designing for voice control outside the home. In-car voice interactions, for instance, should impose only a low cognitive load on the driver. The driver's attention must be focused on the road for their safety and that of passengers and other road users.

Task complexity

A second design consideration is the complexity of the tasks being enabled. Voice control is only genuinely useful if it makes achieving a result quicker or easier than established methods. Touch screens are second nature to consumers, and they are useful too, displaying information in a dense but easy-to-digest format that the user can review as needed. Voice-based interaction requires users to retain much more information without reference to visual cues. To account for this, information should be delivered in manageable 'chunks'. Designers could also use voice to facilitate interactions while using other devices - or even, in the longer term, projected holograms or augmented reality overlays - to simultaneously provide detailed information. There have even been rumours that Amazon itself may launch a second-generation Echo home hub with a touchscreen as well as voice control.
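The 'chunking' idea above can be sketched in plain Python (no real voice SDK is involved; the function names, the three-item chunk size, and the 'say next' prompt are all illustrative assumptions):

```python
def chunk_for_speech(items, chunk_size=3):
    """Split a list of results into small groups for spoken delivery,
    so the listener never has to hold a long list in working memory."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

def speak_chunks(items, chunk_size=3):
    """Yield one utterance per chunk, inviting the user to ask for more
    between chunks instead of reading everything in one breath."""
    chunks = chunk_for_speech(items, chunk_size)
    for n, chunk in enumerate(chunks, start=1):
        utterance = ", ".join(chunk)
        if n < len(chunks):
            utterance += ". Say 'next' to hear more"
        yield utterance
```

A real assistant would hand each utterance to its text-to-speech engine and wait for the user's 'next' before continuing; the point of the sketch is simply that pacing, not the user's memory, carries the load.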

While information must be delivered in a manageable way, brands must take care not simply to replicate the step-by-step experience of a telephone operator or Interactive Voice Response (IVR) system. After all, the chief benefit of voice interaction is the ability to complete a complex process with a single command. Users have something to learn here too: a single command describing the entire process will give quicker, more satisfying results than step-by-step instructions.
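A toy illustration of the difference, again in plain Python rather than any real assistant SDK: an utterance that carries every detail completes the task in one turn, while anything less falls back to an IVR-style prompt. The pattern, slot names, and pizza-ordering task are all invented for this sketch:

```python
import re

# Hypothetical single-command pattern: size, topping, and time are all
# captured from one utterance instead of three separate dialogue turns.
ORDER_PATTERN = re.compile(
    r"order a (?P<size>small|medium|large) (?P<topping>\w+) pizza"
    r" for delivery at (?P<time>\d{1,2}(?::\d{2})? ?[ap]m)"
)

def handle_utterance(utterance):
    """Complete the whole task from a single command if possible;
    otherwise fall back to asking for the first missing detail."""
    match = ORDER_PATTERN.search(utterance.lower())
    if match:
        slots = match.groupdict()
        return f"Ordering a {slots['size']} {slots['topping']} pizza for {slots['time']}."
    return "What size pizza would you like?"
```

Production systems use trained language-understanding models rather than regular expressions, but the design point is the same: the one-shot path should exist and should be the fast, satisfying one.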

Learning curve

In research we have carried out for clients, this learning curve has proved steeper than might be expected. Consumers are conditioned by established interactions with apps to compartmentalise tasks and processes. Delivering the kind of direct, comprehensive instruction that yields the best result doesn't always come naturally. Think back to the early days of search engines, when users relied on trial and error to discover, or had to be explicitly taught, certain commands to make searches more efficient.

Both of these challenges can be overcome by focusing user experience design efforts on core tasks. Doing the basics well will help users scale that learning curve, increasing their confidence in voice control services because the process is clear and the results are satisfying. Building this confidence is vital to keeping users engaged and to preventing voice from becoming simply a novelty feature, used to impress people at dinner parties but not in everyday activity.

Setting expectations

Equally important is setting clear expectations about the capabilities of voice control services. Responses should be naturalistic -- blandly repetitive replies to similar queries or commands are a sure-fire way to snap users out of the illusion that they are talking to something real and intelligent -- but users also need to know where the boundaries lie. A voice service that can engage in wide-ranging natural conversation may seem like the Holy Grail, but if you encourage users to test the limits of what can be understood or accomplished, you risk dissatisfaction and disillusionment when they eventually bump up against the edges of what it can do. Perhaps counterintuitively, simpler services that are clear about the tight focus of their capabilities can inspire much greater customer satisfaction because they 'just work'.
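One way to make those boundaries explicit is a tightly scoped set of supported intents whose fallback response states exactly what the service can do, rather than bluffing. A minimal plain-Python sketch, with intent names and replies invented for illustration:

```python
# Invented intents for a narrowly scoped banking skill.
SUPPORTED = {
    "check_balance": "Your current balance is on its way to your phone.",
    "pay_bill": "Okay, let's pay a bill.",
}

def respond(intent):
    """Answer supported intents; for anything else, state the service's
    boundaries explicitly so users learn its edges quickly."""
    if intent in SUPPORTED:
        return SUPPORTED[intent]
    return ("Sorry, I can't help with that yet. "
            "I can check your balance or pay a bill.")
```

The fallback is doing the expectation-setting work: every out-of-scope request becomes a short lesson in what the service is actually for.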

Voice control technology is undeniably gathering momentum: Alexa had 1,000 available 'skills' in June 2016, a figure that had risen to 7,000 by the start of CES in January 2017. Whether that momentum translates into widespread use of voice control devices by ordinary consumers - not just the tech-savvy early adopter crowd - and whether voice establishes itself as the new paradigm for interacting with computers will depend, to a large extent, on whether device manufacturers and brands can find ways to overcome or bypass these four challenges.