Explaining deep learning: what it is, and what it’s not
Deep learning has the potential to reinvent every industry — including visual effects. But there is huge confusion over exactly what deep learning means. As a starting point, it’s important to define the terms ‘artificial intelligence’ and ‘machine learning’.
Artificial Intelligence (AI), often misused as a very broad umbrella term, attempts to automate the cognitive functions of human beings: those functions we deem ‘intelligent’. That’s why some of AI’s most famous early milestones were machines winning at chess and Jeopardy.
Machine Learning (ML) is less ‘intelligent humanoid’ and more ‘well-trained pet’. Here, a computer is fed reams of data in order to make relatively informed decisions. A good example would be training a computer to recognize what a dog looks like, by feeding it thousands of pictures of dogs. It would learn that dogs tend to have fur, and tails, and four legs.
Based on all the data it’s been given, the computer defines ‘rules’ to apply to future inputs. Shown a picture of a dog it has never encountered, it should stand a pretty good chance of identifying it, drawing on the thousands of images it has already processed.
But it’s highly dependent on human input. If the only ‘dog’ stimuli you fed it were pictures of black labradors, it wouldn’t recognize a poodle as another dog. ML is driven by data rather than by explicitly programmed rules.
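To make that concrete, here is a minimal sketch of the idea in Python using scikit-learn. Everything in it is invented for illustration: the hand-picked traits, the tiny training set, and the labels stand in for the thousands of real images a genuine system would need.

```python
# A minimal, illustrative sketch of classical machine learning:
# the computer derives 'rules' from labeled examples, then applies
# them to inputs it has never seen. All data here is made up.
from sklearn.tree import DecisionTreeClassifier

# Each animal is described by hand-picked traits: [has_fur, has_tail, leg_count]
X_train = [
    [1, 1, 4],  # black labrador
    [1, 1, 4],  # another black labrador
    [1, 1, 4],  # golden retriever
    [0, 1, 2],  # pigeon
    [0, 1, 0],  # goldfish
]
y_train = ["dog", "dog", "dog", "not dog", "not dog"]

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# A furry, tailed, four-legged animal the model has never seen before:
print(model.predict([[1, 1, 4]]))  # -> ['dog']
```

Note how dependent the result is on the examples and traits a human chose to supply: feed it only black labradors, and the ‘rules’ it derives will be just as narrow.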
The neural network
Here’s where deep learning comes in. It follows the same data-driven concept, and still needs human-provided information in order to learn and create its rules, but that data is fed into a ‘box’ called a neural network: a system loosely modeled on the human brain.
Neural networks are made up of lots of nodes with weighted connections between them. In humans, the theory goes, we learn by strengthening or weakening the connections between neurons so that they do or do not fire, depending on the input or stimulus; artificial networks learn by adjusting the numeric weights on their connections in much the same way.
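In code, a single node can be sketched as a weighted sum of its inputs passed through an activation function. The numbers below are arbitrary, chosen only to show the mechanics:

```python
import numpy as np

# One artificial 'node': it sums its weighted inputs, adds a bias,
# and 'fires' (outputs a value near 1) when that sum is high enough.
def neuron(inputs, weights, bias):
    weighted_sum = np.dot(inputs, weights) + bias
    return 1.0 / (1.0 + np.exp(-weighted_sum))  # sigmoid activation

inputs = np.array([0.9, 0.1, 0.8])    # strengths of incoming stimuli
weights = np.array([0.7, -0.2, 0.5])  # values learned during training
bias = -0.5

print(neuron(inputs, weights, bias))  # ~0.62: the node fires moderately
```

Learning, in this picture, is nothing more than nudging those weights up or down until the network’s outputs match the training data.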
Sticking with animal examples, think about how we humans process new information. We probably only need to see one pigeon to recognize most other pigeons — even if they’ve got wonky beaks or different colored feathers — because we have so many learned reference points from everything we’ve experienced throughout our lives.
Neural networks have the ability to make those same connections, although they typically need to be fed far more data than the classical ML techniques described above.
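Here is a toy illustration of that weight-nudging in PyTorch (assuming it is installed). The task, learning the XOR function from four examples, and every hyperparameter are arbitrary choices made purely to keep the sketch small:

```python
import torch
import torch.nn as nn

# Four input/output examples of XOR: the 'training data'.
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

# A tiny network: 2 inputs -> 4 hidden nodes -> 1 output.
model = nn.Sequential(
    nn.Linear(2, 4), nn.Sigmoid(),
    nn.Linear(4, 1), nn.Sigmoid(),
)
loss_fn = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

for _ in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)  # how wrong is the network right now?
    loss.backward()              # trace each weight's share of the error
    optimizer.step()             # nudge the weights to reduce it

# Random initialization means results can vary slightly between runs.
print(model(X).detach().round())  # usually -> [[0.], [1.], [1.], [0.]]
```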
Deep learning and VFX
Already, deep learning has achieved everything from near-human image classification to improved autonomous driving. In 2016, Google DeepMind’s AlphaGo program went one better than IBM’s chess-playing Deep Blue by beating world Go champion Lee Sedol; DeepMind later claimed that its successor, AlphaGo Zero, learned in 40 days principles that had taken humans thousands of years to work out.
And deep learning is set to have a profound impact on the future of the visual effects industry. Imagine leaving a machine to pre-emptively render a scene based on the rules it’s learned from reviewing the last 1,000 box-office hits, or to determine appropriate lighting true to the real-life environments it’s been fed.
Once that’s a reality, the people and businesses that own the training data for neural networks will become hugely influential. Without such data, neural networks won’t be able to learn what’s required of them, so access to it will doubtless be a contentious point in the years to come.