A young farmer sees a cow walking slowly in the pen, ears drooping. It does not move when the farmer approaches. Something seems to be wrong, but she cannot tell whether the cow is ill or merely hungry. Expertise is urgently needed – but there is no human expert around. However, while specialists may be in short supply, smart sensors are getting better by the day – as are the algorithms that make sense of the data they collect.
Recent developments in the field of AI and machine learning were presented at the seminar The Societal Impact of AI at the Institute for Futures Studies. A particularly important development concerned the capabilities of deep learning algorithms, a type of machine learning that takes raw input data, such as all the pixels in a digital image, and passes it through multiple processing layers of artificial neural networks, each of which contributes in a distinct way to solving a problem. Intricate structures in the data can be identified by means of the backpropagation algorithm, which adjusts the parameters used to compute the representation in each layer from that of the previous layer.1 In the case of image classification (is this a cat?) one layer may, for example, address the shape of the edges of the animal (round or thin?), and another the texture (furry or smooth?).
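To make the idea of layered representations and backpropagation concrete, here is a minimal sketch in plain Python. It is not the method of any system discussed at the seminar: the toy task (XOR), the network size, the learning rate and all function names are assumptions chosen for illustration. The hidden layer computes an intermediate representation of the input, and the error at the output is passed backwards to decide how the weights of each layer should change.

```python
import math
import random

# Hypothetical toy task (not from the seminar): XOR, which needs a hidden layer.
XOR = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Compute the hidden representation and the output for one input."""
    # The trailing 1.0 acts as a constant bias input.
    hidden = [sigmoid(sum(w * v for w, v in zip(ws, x + [1.0]))) for ws in w_hidden]
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden + [1.0])))
    return hidden, out


def loss(data, w_hidden, w_out):
    """Mean squared error over the whole data set."""
    return sum((forward(x, w_hidden, w_out)[1] - y) ** 2 for x, y in data) / len(data)


def train(data, n_hidden=3, epochs=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    initial_loss = loss(data, w_hidden, w_out)
    for _ in range(epochs):
        grad_hidden = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        grad_out = [0.0] * (n_hidden + 1)
        for x, y in data:
            hidden, out = forward(x, w_hidden, w_out)
            # Error signal at the output (gradient of 0.5 * squared error).
            delta_out = (out - y) * out * (1.0 - out)
            for j, h in enumerate(hidden + [1.0]):
                grad_out[j] += delta_out * h
            # Backpropagation: send the output error back through w_out
            # to find how each hidden unit's weights should change.
            for j, h in enumerate(hidden):
                delta_hidden = delta_out * w_out[j] * h * (1.0 - h)
                for i, v in enumerate(x + [1.0]):
                    grad_hidden[j][i] += delta_hidden * v
        # Full-batch gradient descent step on both layers.
        for j in range(n_hidden + 1):
            w_out[j] -= lr * grad_out[j] / len(data)
        for j in range(n_hidden):
            for i in range(n_in + 1):
                w_hidden[j][i] -= lr * grad_hidden[j][i] / len(data)
    return w_hidden, w_out, initial_loss, loss(data, w_hidden, w_out)


if __name__ == "__main__":
    _, _, before, after = train(XOR)
    print(f"loss before training: {before:.3f}, after: {after:.3f}")
```

Real image classifiers use many more layers and specialised libraries, but the principle is the same: every layer's parameters are adjusted according to the error signal that reaches it from the layer above.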
By and large, there are three types of machine learning:
1) Supervised learning, in which the algorithm is trained on data that has been labeled, typically by humans who, for example, have determined whether there is a cat in each of a thousand digital photos.
2) Unsupervised learning, typically used to identify clusters in data; for example, fish may be grouped together in one cluster and mammals with fur in another.
3) Reinforcement learning, in which goal-oriented algorithms learn sequentially by means of trial-and-error actions to maximize some type of pre-defined reward. A key element of reinforcement learning is the trade-off between exploration of new strategies and exploitation of current knowledge.2
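The trade-off between exploration and exploitation can be illustrated with a small sketch of an epsilon-greedy strategy on a so-called multi-armed bandit. This is a textbook construction, not something presented at the seminar; the payoff values, the function name run_bandit and the parameter epsilon are assumptions made for the example. With probability epsilon the learner tries a random option (exploration), and otherwise it picks the option that currently looks best (exploitation).

```python
import random


def run_bandit(true_means, epsilon, steps=5000, seed=0):
    """Play a bandit with noisy rewards using an epsilon-greedy strategy.

    true_means are hypothetical average payoffs, unknown to the learner.
    Returns the average reward, the learned estimates and the pull counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try any option
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)  # noisy feedback
        counts[arm] += 1
        # Running average of the rewards observed for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return total_reward / steps, estimates, counts


if __name__ == "__main__":
    for eps in (1.0, 0.1):
        avg, _, counts = run_bandit([0.1, 0.5, 0.9], epsilon=eps)
        print(f"epsilon={eps}: average reward {avg:.2f}, pulls per arm {counts}")
```

Setting epsilon to 1.0 means pure exploration, so the learner never benefits from what it has learned; a small epsilon lets it spend most of its time on the best option while still checking the alternatives now and then.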
Deep learning combined with reinforcement learning has recently accomplished very advanced tasks. For example, starting its training from random play, the AlphaZero algorithm used deep convolutional neural networks trained by reinforcement learning from self-play to beat a world champion program in each of the games of chess, shogi (Japanese chess) and Go.3
Conditions for deep reinforcement learning
The conditions for well-functioning deep neural networks in the context of reinforcement learning were specified by a co-developer of the AlphaZero algorithm, Timothy Lillicrap, Staff Research Scientist at Google DeepMind. The criteria were:4
- A1: A well-defined reward, i.e., a narrow task with a clear objective, for example to win at a game of chess.
- A2: Lots of relevant data, such as millions of chess games played and stored digitally.
- A3: Well-specified episodes, where an episode could be playing a game of chess until there is a winner.
- A4: The possibility to try different actions so that the algorithm could explore different strategies, for example to win a game, and develop rules to solve problems that may occur along the way.
Hence, even though the tasks that the deep reinforcement learning algorithms can accomplish are extraordinary, the conditions for when this is possible are quite narrow – motivating the term narrow AI.
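The four conditions can be seen at work in even the smallest reinforcement learning setup. The sketch below uses tabular Q-learning, a far simpler technique than the deep networks behind AlphaZero, on a hypothetical five-cell corridor in which the learner must walk to the rightmost cell. The environment, the parameter values and all names are assumptions made for illustration; the comments mark where A1 to A4 appear.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)  # step left or step right


def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # A1: one clear, well-defined reward
    return next_state, reward, done


def choose_action(q_row, epsilon, rng):
    if rng.random() < epsilon:  # A4: freedom to try different actions
        return rng.randrange(len(ACTIONS))
    best = max(q_row)
    # Break ties at random so the untrained learner does not favour one side.
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=50, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):  # A2: lots of data, here generated by repeated play
        state = 0              # A3: every episode starts in cell 0 ...
        for _ in range(max_steps):
            a = choose_action(q[state], epsilon, rng)
            next_state, reward, done = step(state, ACTIONS[a])
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:           # ... and ends when the goal is reached
                break
    return q


def greedy_policy(q):
    """Best learned action for every non-goal cell."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda a: row[a])] for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

If any one of the four ingredients is removed, for instance by giving no reward at the goal or by never letting an episode end, the table of values stays uninformative, which is the narrowness that the conditions describe.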
Conditions for human expertise
In this context, one may note that the conditions for human expertise are also quite narrow, as specified by the psychologist and economist Daniel Kahneman, awarded the Nobel Memorial Prize in Economic Sciences in 2002. In his best-selling book, Thinking, Fast and Slow, Kahneman draws attention to research regarding the conditions for when professional intuition is reliable:5
- H1: Human expertise can only develop in a world that is sufficiently regular to be predictable, i.e., which has rules to be picked up.
- H2: There must be opportunities to learn the rules through feedback from reality regarding whether the guesses were correct or not.
- H3: Feedback must be rapid and unequivocal.
- H4: There must be many such opportunities, since learning the rules takes a long time.
Are they analogous?
It appears that the conditions under which you can trust the intuitive guesses of human experts align quite well with the conditions that enable deep reinforcement learning algorithms to function well:
First, H1: a regular environment links to A3: well-specified episodes, since the latter implies the presence of regularity; the environment cannot be chaotic for humans or AI-algorithms to learn well and develop expertise.
Second, H2: opportunities to learn through feedback links to A4: the possibility to explore different actions to find a good strategy. Having many opportunities to learn undoubtedly concerns having the possibility to explore different actions and receiving feedback.
This also concerns the third pair, H3: rapid feedback and A1: a well-defined reward. Both conditions address the importance of an environment that gives unambiguous feedback. If a reward is not rapid, it may not be possible to attribute it to a specific strategy. For a successful attempt, rapid feedback amounts to a well-defined reward.
Fourth, H4: many opportunities to learn, corresponds well with A2: lots of relevant data. It seems that for both human experts and deep reinforcement learning algorithms, learning cannot happen instantaneously, but only after many attempts with subsequent feedback.
Kahneman thus highlights that the relationship between objectively identifiable cues and outcomes must be clear, i.e., that decisions are made in high-validity task environments.5 According to Kahneman and Klein, situations in which experts operate in high-validity environments might include an experienced fireground commander’s assessment of a building’s stability, or a nurse’s intuitions about whether a preterm baby has developed an infection; by contrast, a stock market trader’s forecasting would generally take place in a low-validity environment.6
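The difference between high-validity and low-validity environments can be sketched with a small simulation. Everything in it is an assumption made for illustration: the environment shows a single binary cue, the parameter validity is the probability that the cue points to the true outcome, and the learner simply guesses the outcome it has seen most often with that cue, receiving immediate feedback after each guess.

```python
import random


def simulate(validity, n_trials=2000, seed=0):
    """Return the share of correct guesses made while learning from feedback.

    validity is the (hypothetical) probability that the cue agrees with the
    outcome; 0.5 means the cue carries no information at all.
    """
    rng = random.Random(seed)
    counts = [[0, 0], [0, 0]]  # counts[cue][outcome] seen so far
    correct = 0
    for _ in range(n_trials):
        outcome = rng.randrange(2)
        cue = outcome if rng.random() < validity else 1 - outcome
        # Guess the outcome most often observed together with this cue.
        guess = 0 if counts[cue][0] >= counts[cue][1] else 1
        if guess == outcome:
            correct += 1
        counts[cue][outcome] += 1  # rapid, unequivocal feedback
    return correct / n_trials


if __name__ == "__main__":
    for validity in (0.95, 0.7, 0.5):
        print(f"validity {validity}: accuracy {simulate(validity):.2f}")
```

However many trials the learner is given, its accuracy cannot rise much above the validity of the cue. In the low-validity case there is nothing to learn, which is why long experience alone does not produce reliable intuition there.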
The coherence between the conditions for human and deep reinforcement learning expertise indicates that the latter may have the capabilities to mimic (or improve on) human expert guesses in high-validity environments. Developing expertise generally takes a long time for humans, but not for computers, which only need many runs to improve. This suggests that the guesses that human experts make based on intuition might be made with equal accuracy by deep reinforcement learning algorithms – provided that the data the experts drew on to develop that expertise were accessible to the algorithms.
An important difference between algorithms and human experts concerns the type of information that can be used to identify the cues needed to make good decisions. Typically, algorithms have access to digital sound and image data, and perhaps some medical data, such as temperature and blood pressure. However, other cues that humans can pick up, such as smell and sensation, are often not accessible to algorithms. Thus, human expert guesses based on such data are likely to be much better than those of a deep reinforcement learning algorithm.
Also, note that this discussion concerns decision-making and not the implementation of decisions, such as entering a building on fire. (The field of robotics faces very different challenges than algorithms when it comes to mimicking humans, for example regarding dexterity.) In situations where decision-making occurs almost simultaneously with the implementation of those decisions, algorithms would have to collect the data instantaneously to make good decisions, and then feed the recommendations to a human. Possibly, deep reinforcement learning algorithms might guide decision-making in precarious situations that call for human experts who may not be available, for example by making suggestions to a newly hired firefighter. (It is imaginable that a firefighter could wear a GoPro camera sending image and sound information to an algorithm that analyzes the data in real time and makes recommendations through an earpiece.)
A crucial difference between humans and AI concerns the training. Humans on their way to becoming experts may try different strategies by consulting textbooks or more experienced colleagues; however, it is hard to imagine a situation in which it would be acceptable for an algorithm to find suitable strategies through trial and error in reality. Training would have to be conducted virtually and then transferred to the real world, which is rarely straightforward.
Another important challenge for deep reinforcement learning regards the reward function – a healthy cow may not only be eating plenty, but also walking steadily and milking well – so what exactly is the algorithm’s goal? This could potentially be addressed with a deep inverse reinforcement learning algorithm, in which the reward function is learned from observations. Data from monitoring a large group of cows over time, for example recordings of the way they walk and tilt their heads as related to their eating, drinking and milking habits, may be used to train an AI-algorithm.7,8 Such an algorithm may then be able to pick up cues for illness for each individual cow.9 So, it seems that an algorithm may develop expertise that in some ways resembles an experienced farmer’s. This was addressed further in a recent article in the New Yorker.10
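What it could mean to pick up cues for each individual cow can be hinted at with a much simpler stand-in technique than inverse reinforcement learning: comparing today's readings with the same animal's own recent baseline. The sketch below is only that, a simple statistical check; the signal names, the numbers and the threshold are all made up for illustration and do not come from any real monitoring system.

```python
from statistics import mean, stdev


def flag_deviations(history, today, threshold=3.0):
    """Return the signals where today's value is far from this cow's own baseline.

    history maps a signal name to past daily readings for one cow;
    today maps the same names to the current reading.
    """
    flagged = []
    for signal, past_values in history.items():
        baseline = mean(past_values)
        spread = stdev(past_values)
        if spread == 0:
            continue  # no variation recorded, so no meaningful score
        z_score = (today[signal] - baseline) / spread
        if abs(z_score) > threshold:
            flagged.append(signal)
    return flagged


# Hypothetical readings for a single cow over one week (illustrative only).
HISTORY = {
    "steps_per_hour": [310, 295, 305, 300, 290, 315, 298],
    "feed_kg": [22.1, 21.8, 22.4, 22.0, 21.9, 22.3, 22.2],
    "milk_litres": [28.0, 27.5, 28.3, 27.9, 28.1, 27.8, 28.2],
}
TODAY = {"steps_per_hour": 180, "feed_kg": 21.9, "milk_litres": 27.6}

if __name__ == "__main__":
    print(flag_deviations(HISTORY, TODAY))
```

In this made-up example the cow eats and milks as usual but moves far less than normal, so only the movement signal is flagged. A learned reward function would go further by working out which combinations of signals matter, rather than relying on a fixed threshold per signal.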
If Lillicrap’s and Kahneman’s propositions hold, AI-based assessments might become more reliable than human expert guesses – assuming that the underlying cues have been digitized and that there are ample opportunities for training.
It appears that an AI might help that anxious farmer as she diagnoses the cow with the drooping ears.
- LeCun, Y., Bengio, Y., Hinton, G. Deep learning. Nature 521, 436–444 (2015) https://doi.org/10.1038/nature14539
- Kaelbling, L.P., Littman, M.L., and Moore, A.W. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research 4, 237-285 (1996) https://doi.org/10.1613/jair.301
- Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., Hassabis, D. A general reinforcement learning algorithm that masters chess, shogi and Go through self-play. Science 362, 1140-1144 (2018) https://doi.org/10.1126/science.aar6404
- Lillicrap, T., Seminar: Games and Beyond, The Power of Reinforcement Learning. Institute for Futures Studies, Stockholm, Sweden. Feb. 8, 2019
- Kahneman, D. Thinking, Fast and Slow. Farrar, Straus and Giroux (2011). ISBN: 978-0374275631
- Kahneman, D. and Klein, G. Conditions for intuitive expertise: A failure to disagree. American Psychologist 64, 515–526 (2009) https://doi.org/10.1037/a0016755
- Irpan, A. (2018) Deep Reinforcement Learning Doesn't Work Yet. Feb 14, 2018. Available: https://www.alexirpan.com/2018/02/14/rl-hard.html# [Accessed Sep. 23 2019]
- Finn, C., Levine, S., Abbeel, P. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization. Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP 48 (2016). Available: https://arxiv.org/pdf/1603.00448.pdf [Accessed Sep. 23 2019]
- Ng, A. & Russell, S. Algorithms for Inverse Reinforcement Learning in Proc. 17th International Conference on Machine Learning (2000). Available: http://ai.stanford.edu/~ang/papers/icml00-irl.pdf [Accessed Sep. 23 2019]
- Owen, D. Should We Be Worried About Computerized Facial Recognition? The New Yorker. Dec. 10, 2018. Available: https://www.newyorker.com/magazine/2018/12/17/should-we-be-worried-about-computerized-facial-recognition [Accessed Sep. 23 2019]