Mobile means that, for the first time, pretty much everyone on earth will have a camera, taking vastly more images than were ever taken on film ('How many pictures?'). This feels like a profound change on a par with, say, the transistor radio making music ubiquitous.
Then, the image sensor in a phone is more than just a camera that takes pictures - it’s also part of new ways of thinking about mobile UIs and services ('Imaging, Snapchat and mobile'), and part of a general shift in what a computer can do ('From mobile first to mobile native').
Meanwhile, image sensors are part of a flood of cheap commodity components coming out of the smartphone supply chain that enable all kinds of other connected devices - everything from the Amazon Echo and Google Home to an August door lock or Snapchat Spectacles (and of course a botnet of hacked IoT devices). When combined with cloud services and, increasingly, machine learning, these are no longer just cameras or microphones but new endpoints or distribution for services - they’re unbundled pieces of apps ('Echo, interfaces and friction'). This process is only just beginning - it now seems that some machine learning use cases can be embedded into very small and cheap devices. You might train an ‘is there a person in this image?’ neural network in the cloud with a vast image set - but to run it, you can put it on a cheap DSP with a cheap camera, wrap it in plastic and sell it for $10 or $20. These devices will let you use machine learning everywhere, but also let machine learning watch or listen everywhere.
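As a concrete (and entirely illustrative) sketch of that split - train in the cloud, run on a cheap chip - here is roughly what the ‘is there a person in this image?’ example could look like in TensorFlow: a tiny classifier trained on a hypothetical folder of labelled images, then exported as an 8-bit TensorFlow Lite model small enough for a commodity DSP or microcontroller. The dataset path, input size and architecture are placeholder assumptions, not a real product pipeline.

```python
# Minimal sketch: train a tiny "person / no person" classifier in the cloud,
# then export an int8-quantized TensorFlow Lite model for a cheap device.
import tensorflow as tf

IMG = (96, 96)  # tiny input resolution keeps inference cheap on a small chip

# Hypothetical directory of images sorted into person/ and no_person/ folders
train_ds = tf.keras.utils.image_dataset_from_directory(
    "person_dataset/", image_size=IMG, batch_size=64, label_mode="binary")

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=IMG + (3,)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # person / no person
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=10)  # the expensive part, done once in the cloud

# Quantize weights and activations to 8-bit integers for a low-power chip
def representative_data():
    for images, _ in train_ds.take(100):
        yield [tf.cast(images, tf.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_model = converter.convert()
open("person_detector_int8.tflite", "wb").write(tflite_model)  # small enough to embed
```

The point is the asymmetry: the training run needs a data centre and a vast image set, but the exported file is tiny and a $10 device can evaluate it on every frame.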
So, smartphones and the smartphone supply chain are enabling a flood of UX and device innovation, with machine learning lighting it all up.
However, I think it’s also worth thinking much more broadly about what computer vision in particular might now mean - thinking about what it might mean that images and video will become almost as transparent to computers as text has always been. You could always search text for ‘dog’ but could never search pictures for a dog - now you’ll be able to do both, and, further, start to get some understanding of what might actually be happening.
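To make ‘search pictures for a dog’ concrete, here is a small sketch using a pretrained image-text model (OpenAI’s CLIP, loaded through the Hugging Face transformers library) to score a folder of photos against a text query. The folder path and the top-five cut-off are arbitrary assumptions; any directory of JPEGs would do.

```python
# Sketch: rank a folder of photos by how well they match a text query.
from pathlib import Path

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

query = "a photo of a dog"
photos = list(Path("my_photos/").glob("*.jpg"))  # hypothetical photo folder
images = [Image.open(p).convert("RGB") for p in photos]

with torch.no_grad():
    inputs = processor(text=[query], images=images, return_tensors="pt", padding=True)
    outputs = model(**inputs)
    # Similarity between the text query and every image, one score per photo
    scores = outputs.logits_per_image.squeeze(1)

# Show the five photos that best match the query
for score, path in sorted(zip(scores.tolist(), photos), reverse=True)[:5]:
    print(f"{score:6.2f}  {path.name}")
```

The same scoring works for any phrase - ‘a red sofa’, ‘a person on a bicycle’ - which is what it means for images to start behaving like text.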
We should expect that every image ever taken can be searched or analyzed, and some kind of insight extracted, at massive scale. Every glossy magazine archive is now a structured data set, and so is every video feed. With that incentive (and that smartphone supply chain), far more images and video will be captured.
So, some questions for the future:
- Every autonomous car will, necessarily, capture HD 360-degree video whenever it’s moving. Who owns that data, what else can you do with it beyond driving, and how do our ideas of privacy adjust?
- A retailer can deploy cheap commodity wireless HD cameras throughout the store, or a mall operator the mall, and finally know exactly what track every single person entering took through the building, and what they looked at, and then connect that to the tills for purchase data. How much does that change (surviving) retail?
- What happens to the fashion industry when half a dozen static $100 cameras can tell you everything that anyone in Shoreditch wore this year - when you can trace a trend through social and street photography from its start to the mass market, and then look for the next emerging patterns?
- What happens to ecommerce recommendations when a system might be able to infer things about your taste from your Instagram or Facebook photos, without needing tags or purchase history - when it can see your purchase history in your selfies?
Now, suppose you buy the last ten years’ issues of Elle Decoration on eBay and drop them into just the right neural networks, and then give that system a photo of your living room and ask which lamps it recommends? All those captioned photos, and the copy around them, are training data. And yet, if you don’t show the user an actual photo from that archive, just a recommendation based on it, you probably don’t need to pay the original print publisher itself anything at all. (Machine learning will be fertile ground for IP lawyers.) We don’t have this yet, but we know, pretty much, how we might do it. We have a roadmap for recognizing preferences, automatically, at scale.
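Sketching how that might work, under assumptions the paragraph doesn’t spell out - say the archive has already been scanned and cropped into individual product shots, and a pretrained image encoder is treated as a rough proxy for ‘taste’ - one simple version is just embedding and nearest-neighbour matching:

```python
# Sketch: match a living-room photo against a hypothetical archive of
# magazine product crops, using image embeddings and cosine similarity.
from pathlib import Path

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(paths):
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)  # unit-normalise

archive = list(Path("elle_decoration_crops/").glob("*.jpg"))  # hypothetical crops
archive_vecs = embed(archive)
room_vec = embed([Path("my_living_room.jpg")])  # hypothetical user photo

# Recommend the archive items whose look is closest to the user's room
similarity = (room_vec @ archive_vecs.T).squeeze(0)
for score, path in sorted(zip(similarity.tolist(), archive), reverse=True)[:5]:
    print(f"{score:.3f}  {path.name}")
```

A real system would also use the captions and surrounding copy, but even this crude version shows why the archive is suddenly worth something as data.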
The key thing here is that the nice attention-grabbing demos of computer vision that recognize a dog or a tree, or a pedestrian, are just the first, obvious use cases for a fundamental new capability - to read images. And not just to read them the way humans can, but to read a billion and see the patterns. Among many other things, that has implications for a lot of retail, including parts not really affected by Amazon, and indeed for the $500bn spent every year on advertising.
by Benedict Evans