In the last couple of years, magic started happening in AI. Techniques started working, or started working much better, and new techniques have appeared, especially around machine learning ('ML'), and when those were applied to some long-standing and important use cases we started getting dramatically better results. For example, the error rates for image recognition, speech recognition and natural language processing have collapsed to close to human rates, at least on some measurements.
So you can say to your phone: 'show me pictures of my dog at the beach' and a speech recognition system turns the audio into text, natural language processing takes the text, works out that this is a photo query and hands it off to your photo app, and your photo app, which has used ML systems to tag your photos with ‘dog’ and 'beach’, runs a database query and shows you the tagged images. Magic.
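To make the plumbing concrete, here is a minimal sketch of that pipeline in Python. Every function and the tag index below are hypothetical stand-ins for the trained models and photo databases the real systems use; only the shape of the hand-offs is the point.

```python
# A minimal sketch of the 'dog at the beach' pipeline. All names here are
# invented stand-ins for the ML systems each platform actually runs.

def speech_to_text(audio: bytes) -> str:
    """Stand-in for a trained speech recognition model."""
    return "show me pictures of my dog at the beach"

def parse_intent(text: str) -> dict:
    """Stand-in for NLP: decide this is a photo query and pull out the tags."""
    tags = [w for w in ("dog", "beach") if w in text]
    return {"app": "photos", "tags": tags}

# The photos were tagged ahead of time by an ML image classifier; here that
# classifier's output is just a hard-coded index.
PHOTO_TAGS = {
    "IMG_001.jpg": {"dog", "beach"},
    "IMG_002.jpg": {"cat", "sofa"},
}

def run_query(intent: dict) -> list[str]:
    """The photo app's part: an ordinary database-style query over tags."""
    wanted = set(intent["tags"])
    return [photo for photo, tags in PHOTO_TAGS.items() if wanted <= tags]

if __name__ == "__main__":
    print(run_query(parse_intent(speech_to_text(b"..."))))  # ['IMG_001.jpg']
```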
There are really two things going on here - you’re using voice to fill in a dialogue box for a query, and that dialogue box can run queries that might not have been possible before. Both of these are enabled by machine learning, but they’re built quite separately, and indeed the most interesting part is not the voice but the query. In fact, the important structural change behind being able to ask for ‘Pictures with dogs at the beach’ is not that the computer can find it but that the computer has worked out, itself, how to find it. You give it a million pictures labelled ‘this has a dog in it’ and a million labelled ‘this doesn’t have a dog’ and it works out how to work out what a dog looks like. Now, try that with ‘customers in this data set who were about to churn’, or ‘this network had a security breach’, or ‘stories that people read and shared a lot’. Then try it without labels ('unsupervised' rather than 'supervised' learning).
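The labelled-dogs point is the textbook definition of supervised learning, and the closing aside is unsupervised learning. Here is a toy contrast using scikit-learn on synthetic two-column data; the 'churn' framing of the labels is invented for illustration:

```python
# Supervised vs unsupervised learning on the same synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(3, 1, (500, 2))])
y = np.array([0] * 500 + [1] * 500)  # labels, e.g. 'stayed' vs 'churned'

# Supervised: given a million labelled examples (here, a thousand),
# the model works out the decision boundary itself.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[3.0, 3.2]]))  # likely [1]: looks like a churner

# Unsupervised: no labels at all; the model has to find structure alone.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_[:5])  # cluster assignments, with no meaning attached
```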
Today you would spend hours or weeks in data analysis tools looking for the right criteria to find these, and you’d need people doing that work - sorting and resorting that Excel table and eyeballing for the weird result, metaphorically speaking, but with a million rows and a thousand columns. Machine learning offers the promise that a lot of very large and very boring analyses of data can be automated - not just running the search, but working out what the search should be to find the result you want.
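One hedged sketch of 'working out what the search should be': rather than hand-writing criteria for the weird rows, an anomaly detector such as scikit-learn's IsolationForest learns what 'normal' looks like and flags everything else. The table here is synthetic and the column meanings are invented:

```python
# Letting the model decide which rows are 'the weird result'.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
normal_rows = rng.normal(0, 1, (10_000, 5))  # the boring bulk of the table
odd_rows = rng.normal(6, 1, (10, 5))         # a handful of strange entries
table = np.vstack([normal_rows, odd_rows])

detector = IsolationForest(contamination=0.001, random_state=1).fit(table)
flags = detector.predict(table)              # -1 marks suspected outliers
print(np.where(flags == -1)[0])              # mostly the last 10 row indices
```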
That is, the eye-catching demos of speech interfaces or image recognition are just the most visible demos of the underlying techniques, but those have much broader applications - you can also apply them to a keyboard, a music recommendation system, a network security model or a self-driving car. Maybe.
This is clearly a fundamental change for Google. Narrowly, image and speech recognition mean that it will be able to understand questions better and index audio, images and video better. But more importantly, it will answer questions better, and answer questions that it could never really answer before at all. Hence, as we saw at Google I/O, the company is being recentred on these techniques. And of course, all of these techniques will be used in different ways to varying degrees for different use cases, just as AlphaGo uses a range of different techniques. The thing that gets the attention is 'Google Assistant' - a front-end using voice and analysis of your behaviour to try both to capture questions better and to address some questions before they're asked. But that's just the tip of the spear - the real change is in the quality of understanding of the corpus of data that Google has gathered, and in the kinds of queries that Google will be able to answer across all sorts of different products. That change is really only just beginning.
The same applies in different ways to Microsoft, which (having missed mobile entirely) is creating cloud-based tools to allow developers to build their own applications on these techniques, and for Facebook (what is the newsfeed if not a machine learning application?), and indeed for IBM. Anyone who handles lots of data for money, or helps other people do it, will change, and there will be a whole bunch of new companies created around this.
On the other hand, while we have magic we do not have HAL 9000 - we do not have a system that is close to human intelligence (so-called 'general AI'). Nor do we really have a good theory as to what that would mean - whether human intelligence is the sum of techniques and ideas we already have, but more of them, or whether there is something else entirely. Rather, we have a bunch of tools that need to be built and linked together. I can ask Google or Siri to show me pictures of my dog on a beach because Google and Apple have linked together tools to do that, but I can't ask them to book me a restaurant unless they've added an API integration with OpenTable. This is the fundamental challenge for Siri, Google Assistant or any chat bot (as I discussed here) - what can you ask?
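A sketch of why 'what can you ask?' is the hard question: each capability is a separate integration, and anything outside the table of handlers simply fails. The handler names and the OpenTable hook below are hypothetical:

```python
# An assistant is only as broad as its table of integrations.

def show_photos(query: str) -> str:
    return f"Here are your photos matching '{query}'."

def book_restaurant(query: str) -> str:
    # Only works because someone wired up a booking API (e.g. OpenTable).
    return f"Booked via the restaurant API: {query}"

INTENT_HANDLERS = {
    "photos": show_photos,
    "restaurant": book_restaurant,  # delete this line and the skill vanishes
}

def assistant(intent: str, query: str) -> str:
    handler = INTENT_HANDLERS.get(intent)
    if handler is None:
        return "Sorry, I can't help with that."  # the edge of the toolset
    return handler(query)

print(assistant("photos", "dog at the beach"))
print(assistant("taxi", "to the airport"))  # no integration, no answer
```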
This takes us to a whole class of jokes often made about what does and does not count as AI in the first place:
- "Is that AI or just a bunch of IF statements?"
- "Every time we figure out a piece of it [AI], it stops being magical; we say, 'Oh, that's just a computation
- "AI is whatever isn't been done yet"
I think a foundational point here is Eric Raymond's rule that a computer should 'never ask the user for any information that it can autodetect, copy, or deduce' - especially, here, deduce. One way to see the whole development of computing over the past 50 years is as removing questions that a computer needed to ask, and adding new questions that it could ask. Lots of those things didn't necessarily look like questions as they were presented to the user, but they were questions all the same, and computers don't ask them anymore:
- Where do you want to save this file?
- Do you want to defragment your hard disk?
- What interrupt should your sound card use?
- Do you want to quit this application?
- Which photos do you want to delete to save space?
- Which of these 10 search criteria do you want to fill in to run a web search?
- What's the PIN for your phone?
- What kind of memory do you want to run this program in?
- What's the right way to spell that word?
- What number is this page?
- Which of your friends' updates do you want to see?
This takes me to Apple.
Apple has been making computers that ask you fewer questions since 1984, and people have been complaining about that for just as long - one user's question is another user's free choice (something you can see clearly in the contrasts between iOS and Android today). Steve Jobs once said that the interface for iDVD should just have one button: 'BURN'. Apple launched Data Detectors in 1997 - a framework that tried to look at text and extract structured data in a helpful way - appointments, phone numbers or addresses. Today you'd use AI techniques to get there, so was that AI? Or a 'bunch of IF statements'? Is there a canonical list of algorithms that count as AI? Does it matter? To a user who can tap on a number to dial it instead of copying and pasting, is that a meaningful question?
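For flavour, here is a deliberately dumb Data-Detectors-style sketch - hand-written regular expressions (the patterns are simplified inventions) that pull phone numbers and dates out of free text. Whether this counts as 'AI' is exactly the question above:

```python
# 'AI or just a bunch of IF statements?' Structured data from free text.
import re

PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")      # crude phone-number shape
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")  # crude numeric date shape

def detect(text: str) -> dict:
    """Return anything in the text that looks like a phone number or date."""
    return {
        "phones": PHONE.findall(text),
        "dates": DATE.findall(text),
    }

print(detect("Call me on +1 (415) 555-0100 about the 12/06/2016 meeting."))
# {'phones': ['+1 (415) 555-0100'], 'dates': ['12/06/2016']}
```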
by Benedict Evans