I recently came across a blog post by Francois Chollet, the creator of Keras, in which he explores the limitations of deep learning methods. It is an extremely informative piece, and I would recommend reading it before continuing further.
I, personally, am guilty of overestimating the capabilities of deep learning for machine learning tasks. Theoretically, a recurrent neural network can be considered Turing complete: every Turing machine can be modeled by a recurrent neural network, which means any algorithm can, in principle, be modeled by one. Does this mean we no longer need to learn algorithmic programming and can simply throw enough training data at a neural net and let it learn the algorithm? Here lies the catch. Neural networks are universal approximators. They never carve out the precise decision boundaries of an actual algorithm; they approximate the underlying function and produce outputs of varying precision. The phrase "varying precision" is of utmost importance, since it can also describe a machine that is arbitrarily bad.
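To make this concrete, here is a toy sketch of my own (not from Francois's post): a tiny tf.keras network trained to "learn" addition. It gets close to the right answer, but it only ever approximates the exact algorithm.

# A toy illustration: a small network trained to "learn" addition.
# It approximates a + b closely, but never implements the exact algorithm --
# the output is always of "varying precision".
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(10_000, 2))  # pairs (a, b)
y = X.sum(axis=1)                         # exact target: a + b

model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=20, batch_size=64, verbose=0)

print(model.predict(np.array([[3.0, 4.0]])))  # close to 7.0, almost never exactly 7.0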
To understand why deep learning is overhyped and may not live up to our expectations, we need to dive deeper into how a neural net does what it does. As Francois rightly pointed out, we tend to think of a neural net as a replica of the human brain, both in form and functionality. This can be attributed to the shared nomenclature between the two contexts. For example, both the brain and neural nets consist of neurons that are central to the decision-making process, where different neurons fire in response to different data points (stimuli, in the case of the brain). But this is where the similarity ends. Biological neurons are far more complex and can handle a diverse set of stimuli. A neural net's neurons, on the other hand, are simple differentiable functions that merely apply a non-linearity to the input data. Although neural nets are loosely modeled on the brain and are ideally expected to mimic its functionality, the status quo suggests we are far too primitive to achieve such performance.
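For illustration, a single artificial neuron boils down to a few lines of NumPy: a weighted sum followed by a non-linearity (the weights and inputs below are made up).

# An artificial "neuron" is just a differentiable function: a weighted sum of
# its inputs followed by a non-linearity (ReLU here). Values are illustrative.
import numpy as np

def neuron(x, w, b):
    return np.maximum(0.0, np.dot(w, x) + b)  # ReLU(w.x + b)

x = np.array([0.5, -1.2, 3.0])  # the "stimulus": an input data point
w = np.array([0.8, 0.1, -0.4])  # learned weights
b = 0.2                         # learned bias
print(neuron(x, w, b))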
Deep learning methods are constrained to geometric space. They are all about mapping one set of data in one vector space to another set of data in another vector space, one point at a time. There are two prerequisites for any deep learning model to perform well: the availability of a large (and by large, I mean very large), densely sampled, accurately labeled dataset, and the existence of a learnable mapping between inputs and outputs. The need for a large training set is well understood by the public (thanks to widespread media coverage). The need for a learnable mapping is more obscure and sometimes eludes even the most seasoned deep learning engineer. This can be demonstrated by the below-par performance of a neural net on a task as simple as sorting a list of numbers: even after throwing millions of examples at it, it still does not perform well. Contrast this with the ability of the human brain to grasp the sorting procedure from just a few examples, as sketched below.
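Here is a rough sketch of that sorting experiment (not Francois's exact setup, just a plain feedforward net on randomly generated lists), to show what "learning to sort" looks like in practice.

# A rough sketch of the sorting experiment: a plain feedforward net asked to
# output the sorted version of a 5-number list. Even with 100,000 examples,
# the output is only roughly ordered, never an exact sort.
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100_000, 5))  # random unsorted lists
y = np.sort(X, axis=1)                    # target: the same lists, sorted

model = keras.Sequential([
    keras.Input(shape=(5,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(5),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=256, verbose=0)

sample = rng.uniform(0, 1, size=(1, 5))
print(np.sort(sample, axis=1))  # what an actual sorting algorithm returns
print(model.predict(sample))    # approximately ordered values, not an exact sort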
This weakness of neural nets has been brought to the forefront recently by the attention given to adversarial examples, where a neural net can be fooled completely by adding a carefully crafted perturbation to the input image. Although the original and perturbed images look identical to any human being, the neural net classifies them differently. This highlights that humans and neural nets comprehend an image in completely contrasting ways, and it should drive home the fact that we should not expect a neural net to perform like a human brain (at least not in the near future).
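One classic recipe for crafting such a perturbation is the fast gradient sign method (FGSM). Below is a minimal sketch of it; `model`, `image`, and `label` are assumptions on my part (a trained Keras image classifier, a batched input, and its one-hot true class), not anything from the original post.

# A minimal FGSM sketch: nudge every pixel slightly in the direction that
# increases the classifier's loss.
import tensorflow as tf

loss_fn = tf.keras.losses.CategoricalCrossentropy()

def adversarial_example(model, image, label, epsilon=0.01):
    image = tf.convert_to_tensor(image)
    with tf.GradientTape() as tape:
        tape.watch(image)
        prediction = model(image)
        loss = loss_fn(label, prediction)
    perturbation = epsilon * tf.sign(tape.gradient(loss, image))
    # The result looks identical to a human but can flip the model's prediction.
    return image + perturbation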
Francois also talks about local and extreme generalization. Deep learning methods are pretty good at local generalization: they excel at mapping a narrow set of data points from one vector space to another. When faced with a new data point far from those already encountered, a neural net behaves unpredictably. Human cognition, on the other hand, is adept at extreme generalization, seamlessly applying knowledge from one area to another.
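A quick way to see local generalization for yourself (a toy experiment of my own, not from the original post): fit a small net to x² on [-1, 1], then ask it about a point well outside that range.

# A net fitted to x^2 on [-1, 1] interpolates well but extrapolates poorly
# once we leave the range it was trained on.
import numpy as np
from tensorflow import keras

x_train = np.linspace(-1, 1, 2_000).reshape(-1, 1)
y_train = x_train ** 2

model = keras.Sequential([
    keras.Input(shape=(1,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x_train, y_train, epochs=50, verbose=0)

print(model.predict(np.array([[0.5], [3.0]])))
# roughly 0.25 for x = 0.5 (inside the training range),
# but nowhere near 9.0 for x = 3.0 (far outside it)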
The status quo is that we are trying to model a cruiser motorcycle out of Lego blocks with a motor attached. Although the intention and end goal is to have a cruiser, we cannot expect a Lego model to replace one. Hence, we should be cautious when estimating what any deep learning method can do.