If a smoke alarm goes off in your home but no one's there to hear it, did the alarm actually help prevent a fire? This may sound like a philosophical brain teaser, but it's not. It's a simple question that every homeowner should ask themselves.
The appeal of the smart home is that it can take care of its occupants and of itself, perceiving the needs of both and taking appropriate automated action.
Smart homes use cameras to "see" and sensors to "feel". Smart operating systems allow the home to "think" and make decisions. But being truly smart requires being able to listen - not just to voice commands, but also to other meaningful sounds in the home.
A truly smart home would be able to recognise the sound of an alarm and alert the homeowner via their mobile device if they happened to be out. A truly smart home should also be able to recognise the sound of a window breaking, and in addition to alerting the absent homeowner, play loud music and turn on the lights to scare off any intruders.
With our intuitive understanding of the sounds around us and the stories they tell, it's easy to imagine computers are just as capable. So it comes as a surprise to many people that to a computer, recognising generic sounds - say a dog bark, a window breaking or a baby crying - is actually a lot more of a technical challenge than recognising speech or music.
In fact, the technology for sound recognition is very different to that required either for speech or music recognition. When analysing speech, the computer - and by computer I really mean software on a processing chip - has a relatively simple job. There are after all only so many sounds (or "phonemes") that the human voice can generate, and language models dictate the likely order of phonemes.
Once those phonemes and the language model have been mapped, identifying the phonemes in speech and converting them to text the computer can act on is relatively straightforward. Speech recognition has been commercially available for over twenty years and we're continually seeing improvements to its accuracy and speed - as the release of Amazon's Echo demonstrates.
Recognising individual pieces of music is - and apologies to Shazam here - even simpler. Essentially, the computer takes an audio clip of the music and pattern-matches it against a database of known songs. It's just like fingerprint analysis - and because digital music doesn't change, it's very accurate indeed.
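To make the fingerprint idea concrete, here is a deliberately tiny sketch in Python. It is a toy under stated assumptions, not a description of how Shazam or any commercial service works: real systems fingerprint features of the audio spectrum rather than raw sample values, and the names `fingerprints`, `build_index` and `identify`, along with the made-up "songs", are purely illustrative.

```python
from collections import Counter


def fingerprints(samples, window=4):
    """Return every run of `window` consecutive values as a toy fingerprint."""
    return {tuple(samples[i:i + window]) for i in range(len(samples) - window + 1)}


def build_index(catalogue, window=4):
    """Map each fingerprint to the set of titles that contain it."""
    index = {}
    for title, samples in catalogue.items():
        for fp in fingerprints(samples, window):
            index.setdefault(fp, set()).add(title)
    return index


def identify(clip, index, window=4):
    """Vote for the title sharing the most fingerprints with the clip."""
    votes = Counter()
    for fp in fingerprints(clip, window):
        for title in index.get(fp, ()):
            votes[title] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical catalogue: each "song" is just a short list of numbers.
catalogue = {
    "song_a": [1, 5, 3, 7, 2, 8, 4, 6, 9, 0],
    "song_b": [9, 9, 1, 2, 2, 3, 5, 8, 1, 3],
}
index = build_index(catalogue)
```

A short clip taken from the middle of a song, such as `identify([3, 7, 2, 8, 4], index)`, is matched by exact lookup. That only works because the recording never changes, which is precisely what generic sounds don't offer.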
But generic sounds come in an almost infinite variety of combinations. And unlike with Apple's Siri or Amazon's Echo, there's no handy keyword to let the computer know the generic sound is about to begin. After all, a burglar doesn't shout "glass break!" before taking a hammer to your patio doors.
In reality, the computer must listen to a continual soup of any number of potential sounds and be able not only to recognise what a particular sound is, but also what it is not. That requires creating a "world model" that defines what are normal sounds in the smart home and benchmarking any analysis against it.
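One common way to benchmark a sound against a background model is a likelihood ratio: ask how well the sound fits the target model, how well it fits the model of everything else, and compare the two. The sketch below assumes that approach; the article doesn't say how any particular product does it. The single feature value, the Gaussian models and the numbers in `GLASS_BREAK` and `WORLD` are all hypothetical.

```python
import math


def gaussian_log_likelihood(x, mean, std):
    """Log-probability density of x under a one-dimensional Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def detect(feature, target, world, threshold=0.0):
    """Flag the sound only if the target model explains it better than the world model."""
    score = gaussian_log_likelihood(feature, *target) - gaussian_log_likelihood(feature, *world)
    return score > threshold


# Hypothetical (mean, std) pairs for a single made-up acoustic feature.
GLASS_BREAK = (8.0, 1.0)  # narrow model of the sound we care about
WORLD = (2.0, 3.0)        # broad model of normal household sound
```

With these made-up numbers, `detect(8.0, GLASS_BREAK, WORLD)` flags the sound while `detect(2.0, GLASS_BREAK, WORLD)` does not. Raising `threshold` trades missed detections for fewer false alarms.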
Training the software to recognise sounds requires advanced machine learning - learning that has to be based on the analysis of actual sounds. And yes, that means gathering sounds through rigorous testing: breaking thousands of windows of every size and type, sounding thousands of alarms, recording thousands of actual babies crying and actual dogs barking.
Each of these sounds is individually and exhaustively profiled, using over five hundred different features and counting. The field of sound recognition is so new that many of these features have as yet no technical name - they are completely new aspects of the nature of sound that are being discovered through research and development in this emerging field.
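For a feel of what a "feature" is, here are two long-established ones that do have names: RMS energy (roughly, loudness) and zero-crossing rate (roughly, how noisy or high-pitched a sound is). This is only an illustration using textbook measures; the article doesn't list which features are actually used, and the `profile` function is a hypothetical helper.

```python
import math


def rms_energy(frame):
    """Root-mean-square amplitude of a frame of audio samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs where the signal changes sign."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)


def profile(frame):
    """Bundle the features into one record describing the frame."""
    return {
        "rms_energy": rms_energy(frame),
        "zero_crossing_rate": zero_crossing_rate(frame),
    }
```

A real profile would hold hundreds of such values per slice of audio, which is what the machine learning is then trained on.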
Device manufacturers are now starting to realise the value of intelligent sound recognition within the home, embedding it in common smart home devices like cameras, light bulbs, smoke and carbon monoxide alarms - even home sound systems. Current sounds that can be recognised in the home are limited in number but they are each highly significant, telling a story that if understood by the smart home can improve the way we live.
As yet, unlike in the fields of speech and music recognition, there's no global taxonomy of sounds in existence. Creating such a taxonomy - one that encompasses all generic sounds - will no doubt produce a paradigm shift in the way humans and machines interact, giving computers new understanding of the unstructured physical world, creating billions of dollars in opportunity and opening up new ways of smarter living.
At Audio Analytic, we're the first company to have the vision to create this taxonomy of significant sounds. We're continually researching, identifying and mapping new sounds that tell stories within the smart home, stories that computers must understand and act on if they are going to help us live smarter lives.