04/02/2013 11:01 GMT | Updated 03/04/2013 06:12 BST

Under the Radar: Voice Control and the Science of the Unobtrusive

If you began reading this article and noticed the piece of technology you were using to do so, then the chances are that it's a new toy; something you've recently acquired. If you've had it for a while and using it still isn't second nature, then there's almost certainly a major design flaw involved somewhere. We use our tablets, laptops and smartphones every day; we are familiar with how they work, and their interfaces - the means by which we control their functions - are totally unobtrusive. For most of us, typing and interacting with a screen is as natural as speaking.

The holy grail of user experience is to design something that is inherently unobtrusive and needs no learning period, but very few new pieces of technology achieve that. Most interfaces take time to become familiar; if that period is too lengthy, we lose patience and the technology is discarded. Whether we're prepared to invest that time depends on the potential reward. It takes a long time to learn to touch-type, for instance - but the benefits in a digitally orientated age are huge.

Drilling down another layer, into the realm of the operating system and the software application, throws up a whole other raft of problems, some of which are not so easily overcome. For example, stroking a touchscreen to scroll down or across is more than just intuitive - with no prior knowledge, that motion is among the first that most of us would try in order to achieve the desired effect. However, that universality breaks down with more complex operations. When you turned on your first ever smartphone, you probably knew how to scroll down pages immediately, but personalising your ring tones took a little more time.

Again, a process of familiarisation is at work - if the operation requires too complex a method (if it cannot be remembered after one or two tries) then the operating system will frustrate. That's unlikely to happen with a phone, but try something more complex, like Photoshop. That application is hard to learn through trial and error, and most people require a measure of tuition before they can even begin to use it effectively. Take something even more complex, like writing HTML5 code, and trial and error isn't even an option. These are examples of software where the user interface is initially obtrusive - you notice the interface more than the application, in some cases so much that the application is useless to you.

The reason for this is that people visualise and verbalise the same things in very different ways. When looking for the 'select' icon in Photoshop, not everyone would try to find the same thing. What's more, not everyone would say the same thing, were they to say it. A human being is able to construe the same meaning from many different words and phrases, and translate a request into the desired operation (hence, the tuition). In the last few years, artificial intelligence and voice recognition have improved to the point where computers can be almost as effective as human beings in this capacity. This means that a voice-enabled interface can be the most unobtrusive of all, offering something similar to an actual conversation, even when the user is using their natural forms of language. Siri is the most mainstream example of this, with 'show me the news', 'what's going on in the world?' and 'what are the headlines today?' all performing the same task, along with many other variations on that theme.

These are tasks that one could easily learn to perform manually, but it doesn't take a great stretch of the imagination to think that one day you could tell Photoshop what you wanted done with a given picture, and it would respond to your commands and corrections. The availability of high-powered, cloud-connected portable devices has made the delivery of conversational applications possible. Siri's unobtrusive interface is merely convenient, but unobtrusive interfaces on complex applications potentially offer real economic benefits. The learning curve on some applications could be climbed considerably faster, turning hours of training time into production time, and allowing more people than before to access powerful technology of many kinds.

The benefits of the ultimate unobtrusive interface are not even confined to software. Anyone who has ever bought a high-end car is familiar with the struggle to interpret a myriad of buttons, each with its own (often mysterious) icon. Being able to simply state your needs verbally would save an awful lot of leafing through owners' manuals, and would spare drivers the distraction of frantically searching for a certain button while navigating a country road. BMW and Audi are both introducing technology to assist drivers with exactly this problem.

Voice can even be used to slay the long-time bogeyman of modern life - the call centre queue. Advances in natural language understanding have transformed voice-enabled applications from simple voice-to-text transcription tools into applications that can understand the intent of the user's request and even answer complex questions in response. Being able to call a contact centre (or open an app on your smartphone or web browser), state your requirement in natural language and, your identity having been confirmed automatically through voice biometrics, have your task carried out by automated technology is in fact now a reality.

Ultimately, the best technology makes our lives easier without us noticing we are using it. As humans, speaking is one of the most natural means of interaction we have, and the means now exist to let us control technology with our voices. Many of us can't go a single day without having a meaningful conversation with some form of technology that is powered by a language solution. The more we are able to do that, the more the interface - so often the barrier between us and the benefits of powerful technology - will fade into the background.