The Creativity Code: How AI is learning to write, paint and think
Marcus du Sautoy
As a species, we have an extraordinary ability to create works of art that elevate, expand and transform what it means to be human. The novels of Henry James can communicate the inner world of one human being to another. The music of Wagner or Schubert takes us on an emotional rollercoaster ride as we give ourselves up to their sublime sounds.
These are the expressions of what Marcus du Sautoy calls ‘the creativity code’. Yet some believe that the new developments in AI and machine learning are so sophisticated that they can learn what it means to be human – that they can crack the code.
• Technology has always allowed us to extend our understanding of being human. But will the new tools of AI allow us to create in different ways?
• Could recent developments in AI and machine learning also mean that it is no longer just human beings who can create art?
• And creativity, like consciousness, is one of those words that is hard to pin down: what is it that we are challenging these machines to do?
In The Creativity Code, Marcus du Sautoy examines what these new developments might mean, for both the creative arts and his own subject, mathematics. From the Turing test to AlphaGo, are there limits to what algorithms can achieve, or might they be able to perfectly mimic human creativity? And what’s more, could they help Marcus to see more deeply into the complex mathematical problems with which he so often wrestles?
Copyright
4th Estate
An imprint of HarperCollinsPublishers
1 London Bridge Street
London SE1 9GF
www.4thEstate.co.uk
First published in Great Britain by 4th Estate in 2019
Copyright © Marcus du Sautoy 2019
Marcus du Sautoy asserts the moral right to be identified as the author of this work.
All reasonable efforts have been made by the author and the publisher to trace the copyright holders of the images and material quoted in this book. In the event that the author or publisher are contacted by any of the untraceable copyright holders after the publication of this book, the author and the publisher will endeavour to rectify the position accordingly.
Diagrams redrawn by Martin Brown
Cover image © Hands of God and Adam, detail from The Creation of Adam, from the Sistine Ceiling, 1511 (fresco) (pre restoration), Buonarroti, Michelangelo (1475-1564) / Vatican Museums and Galleries, Vatican City / Bridgeman Images
Author photo © Oxford University Images/Joby Sessions
A catalogue record for this book is available from the British Library.
All rights reserved under International and Pan-American Copyright Conventions. By payment of the required fees, you have been granted the non-exclusive, non-transferable right to access and read the text of this e-book on-screen. No part of this text may be reproduced, transmitted, down-loaded, decompiled, reverse engineered, or stored in or introduced into any information storage and retrieval system, in any form or by any means, whether electronic or mechanical, now known or hereinafter invented, without the express written permission of HarperCollins.
Source ISBN: 9780008288150
Ebook Edition © March 2019 ISBN: 9780008288167
Version: 2019-01-28
Dedication
To Shani, for all her love and support, creativity and intelligence
CONTENTS
Cover
Title Page
Copyright
Dedication
1 The Lovelace Test
2 Creating Creativity
3 Ready, Steady, Go
4 Algorithms, the Secret to Modern Life
5 From Top Down to Bottom Up
6 Algorithmic Evolution
7 Painting by Numbers
8 Learning from the Masters
9 The Art of Mathematics
10 The Mathematician’s Telescope
11 Music: The Process of Sounding Mathematics
12 The Songwriting Formula
13 DeepMathematics
14 Language Games
15 Let AI Tell You a Story
16 Why We Create: A Meeting of Minds
Footnote
Illustrations
Further Reading
Index
Acknowledgements
By the same author
About the Publisher
1
THE LOVELACE TEST
Works of art make rules; rules do not make works of art.
Claude Debussy
The machine was a thing of beauty. Towers of gears with numbers on their teeth pinned to rods driven by a handle that you turned. The seventeen-year-old Ada Byron was transfixed as she cranked the handle of Charles Babbage’s machine to watch it crunch numbers, calculate squares and cubes and even square roots. Byron had always had a fascination with machines, fanned by the tutors her mother had been happy to provide.
Studying Babbage’s plans some years later for the Analytical Engine, it dawned on Ada, now married to the Earl of Lovelace, that this was more than just a number cruncher. She began to record what it might be capable of. ‘The Analytical Engine does not occupy common ground with mere “calculating machines.” It holds a position wholly its own, and the considerations it suggests are more interesting in their nature.’
Ada Lovelace’s notes are now recognised as the first inroads into the creation of code. That kernel of an idea has blossomed into the artificial intelligence revolution that is sweeping the world today, fuelled by the work of pioneers like Alan Turing, Marvin Minsky and Donald Michie. Yet Lovelace was cautious as to how much any machine could achieve: ‘It is desirable to guard against the possibility of exaggerated ideas that might arise as to the powers of the Analytical Engine. The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we order it to perform.’ Ultimately, she believed, it was limited: you couldn’t get more out than you’d put in.
This idea was a mantra of computer science for many years. It is our shield against the fear that we will set in motion something we can’t control. Some have suggested that to program a machine to be artificially intelligent, you would first have to understand human intelligence.
What is going on inside our heads remains a mystery, but in the last few years a new way of thinking about code has emerged: a shift from a top-down attitude to programming to a bottom-up effort to get the computer to chart its own path. It turns out you don’t have to solve intelligence first. You can allow algorithms to roam the digital landscape and learn just as a child does. Today’s code created by machine learning is making surprisingly insightful moves, spotting previously undiscovered features in medical images, and investing in shrewd trades on the stock market. This generation of coders believes it can finally prove Ada Lovelace wrong: that you can get more out than you programmed in.
Yet there is still one realm of human endeavour that we believe the machines will never be able to touch, and that is creativity. We have this extraordinary ability to imagine and innovate and to create works of art that elevate, expand and transform what it means to be human. These are the outpourings of what I call the human code.
This is code that we believe depends on being human because it is a reflection of what it means to be human. Mozart’s Requiem allows us to contemplate our own mortality. Witnessing a performance of Othello gives us the chance to navigate our emotional landscape of love and jealousy. A Rembrandt portrait seems to capture so much more than just what the sitter looks like. How can a machine ever hope to replace or even to compete with Mozart, Shakespeare or Rembrandt?
I should declare at the outset that my field of reference is dominated by the artistic output of the West. This is the art I know, this is the music I have been brought up on, the literature that dominates my reading. It would be fascinating to know if art from other cultures might be more amenable to being captured by the output of a machine, but my suspicion is that there is a universal challenge here that transcends cultural boundaries. And so although I make some apology for my Western-focused viewpoint, I think it will provide a suitable benchmark for the creativity of our digital rivals.
Of course, human creativity extends beyond the arts: the molecular gastronomy of the Michelin-star chef Heston Blumenthal; the football trickery of the Dutch striker Johan Cruyff; the curvaceous buildings of Zaha Hadid; the invention of the Rubik’s cube by the Hungarian Ernö Rubik. Even the creation of code to make a game like Minecraft should be regarded as one of the great acts of human creativity.
More unexpectedly, creativity is an important part of my own world of mathematics. One of the things that drives me to spend hours at my desk conjuring up equations and penning proofs is the allure of creating something new. My greatest moment of creativity, one that I go back to again and again, is the time I conceived of a new symmetrical object. No one knew this object was possible. But after years of hard work and a momentary flash of white-hot inspiration I wrote on my yellow notepad the blueprint for this novel shape. That sheer buzz of excitement is the allure of creativity.
But what do we really mean by this shape-shifting term? Those who have tried to pin it down usually circle around three ideas: creativity is the drive to come up with something that is new and surprising and that has value.
It turns out it’s easy to make something new. I can get my computer to churn out endless proposals for new symmetrical objects. It’s the surprise and value that are more difficult to produce. In the case of my symmetrical creation, I was legitimately surprised by what I’d cooked up, and so were other mathematicians. No one was expecting the strange new connection I’d discovered between this symmetrical object and the unrelated subject of number theory. The fact that this object suggested a new way of understanding an area of mathematics that is full of unsolved problems is what gave it value.
We all get sucked into patterns of thought. We think we see how the story will evolve and then suddenly we are taken in a new direction. This element of surprise makes us take notice. It is probably why we get a rush when we encounter an act of creativity, either our own or someone else’s.
But what gives something value? Is it simply a question of price? Does it have to be recognised by others? I might value a poem or a painting I’ve created but my conception of its value is unlikely to be shared more widely. A surprising novel with lots of plot twists could be of relatively little value. But a new and surprising approach to storytelling or architecture or music that begins to be adopted by others and that changes the way we see or experience things will generally be recognised as having value. This is what Kant refers to as ‘exemplary originality’, an original act that becomes an inspiration for others. This form of creativity has long been thought to be uniquely human.
And yet all of these expressions of creativity are at some level the products of neuronal and chemical activity. This is the human code that millions of years of evolution has honed inside our brains. As you begin to unpick the creative outpourings of the human species you start to see that there are rules at the heart of the creative process. Could our creativity be more algorithmic and rule-based than we might want to acknowledge?
The challenge of this book is to push the new AI to its limits to see whether it can match or even surpass the marvels of our human code. Can a machine paint, compose music or write a novel? It may not be able to compete with Mozart, Shakespeare or Picasso, but could it be as creative as our children when they write a story or paint a scene? By interacting with the art that moves us and understanding what distinguishes it from the mundane and bland, could a machine learn to be creative? Not only that, could it extend our own creativity and help us see opportunities we are missing?
Creativity is a slippery word that can be understood in many different ways in different circumstances. I will mostly focus on the challenge of creativity in the arts, but that does not mean this is the only sort of creativity possible. My daughters are being creative when they build their castles in Lego. My son is heralded as a creative midfielder when he leads his football team to victory. We can solve everyday problems creatively, and run organisations creatively. And, as I shall illustrate, mathematics is a much more creative subject than many recognise, a creativity that actually shares much in common with the creative arts.
The creative impulse is a key part of what distinguishes humans from other animals and yet we often let it stagnate inside us, falling into the trap of becoming slaves to our formulaic lives, to routine. Being creative requires a jolt to take us out of the smooth paths we carve out each day. That is where a machine might help: perhaps it could give us that jolt, throw up a new suggestion, stop us from simply repeating the same algorithm each day. The machines might ultimately help us, as humans, to behave less like machines.
You may ask why a mathematician is offering to take you on this journey. The simple answer is that AI, machine learning, algorithms and code are all mathematical at heart. If you want to understand how and why the algorithms that control modern life are doing what they do, you need to understand the mathematical rules that underpin them. If you don’t, you will be pushed and pulled around by the machines.
AI is challenging us to the core as it reveals how many of the tasks humans engage in can be done equally well, if not better, by machines. But rather than focus on a future of driverless cars and computerised medicine, this book sets out to explore whether these algorithms can compete meaningfully with the power of the human code. Can computers be creative? What does it mean to be creative? How much of our emotional response to art is a product of our brains responding to pattern and structure? These are some of the things we will explore.
But this isn’t just an interesting intellectual challenge. Just as the artistic output of humans allows us to get some insight into the complex human code that runs our brains, we will see how the art generated by computers provides a surprisingly powerful way to understand how the code is working. One of the challenges of code emerging in this bottom-up fashion is that the coders often don’t really understand how the final code works. Why is it making that decision? The art it creates may provide a powerful lens through which to gain access to the subconscious decisions of the new code. And it may also reveal limitations and dangers that are inherent in creating code that we don’t fully understand.
There is another, more personal, reason for wanting to go on this journey. I am going through an existential crisis. I have found myself wondering, with the onslaught of new developments in AI, if the job of mathematician will still be available to humans in decades to come. Mathematics is a subject of numbers and logic. Isn’t that what a computer does best?
Part of my defence against the computers knocking on the door of the department, wanting their place at the table, is that as much as mathematics is about numbers and logic, it is a highly creative subject, involving beauty and aesthetics. I want to argue in this book that the mathematics we share in our seminars and journals isn’t just the result of humans cranking a mechanical handle. Intuition and artistic sensitivity are important qualities for making a good mathematician. Surely these are traits that can never be programmed into a machine. Or can they?
This is why, as a mathematician, I am attentive to how successful the new AI is being in gaining entry to the world’s galleries, concert halls and publishing houses. The great German mathematician Karl Weierstrass once wrote: ‘a mathematician that is not something of a poet will never be a true mathematician.’ As Ada Lovelace perfectly encapsulates, you need a bit of Byron as much as Babbage. Although she thought machines were limited, Lovelace began to realise the potential of these machines of cogs and gears to express a more artistic side of their character:
It might act upon other things besides number … supposing, for instance, that the fundamental relations of pitched sounds in the science of harmony and of musical composition were susceptible of such expression and adaptations, the engine might compose elaborate and scientific pieces of music of any degree of complexity or extent.
Yet she believed that any act of creativity would lie with the coder, not the machine. Is it possible to shift the weight of responsibility more towards the code? The current generation of coders believes it is.
At the dawn of AI, Alan Turing famously proposed a test to measure intelligence in a computer. I would now like to propose a new test: the Lovelace Test. To pass the Lovelace Test, an algorithm must originate a creative work of art such that the process is repeatable (i.e. it isn’t the result of a hardware error) and yet the programmer is unable to explain how the algorithm produced its output. This is what we are challenging the machines to do: to come up with something new, surprising and of value. For a machine to be deemed truly creative requires one extra step: its contribution should be more than an expression of the coder’s creativity or that of the person who built the data set. That is the challenge Ada Lovelace believed was insurmountable.
2
CREATING CREATIVITY
The chief enemy of creativity is good sense.
Pablo Picasso
The value placed on creativity in modern times has led to a range of writers and thinkers trying to articulate what it is, how to stimulate it, and why it is important. It was while sitting on a committee at the Royal Society assessing what impact machine learning was likely to have on society in the coming decades that I first encountered the theories of the cognitive scientist Margaret Boden. Her ideas on creativity struck me as the most relevant when it came to addressing or evaluating creativity in machines.
Boden is an original thinker who over the decades has managed to fuse many different disciplines: philosopher, psychologist, physician, AI expert and cognitive scientist. In her eighties now, with white hair flying like sparks and an ever-active brain, she is enjoying engaging enthusiastically with the prospect of what these ‘tin cans’, as she likes to call computers, might be capable of. To this end, she has identified three different types of human creativity.
Exploratory creativity involves taking what is already there and exploring its outer edges, extending the limits of what is possible while remaining bound by the rules. Bach’s music is the culmination of a journey Baroque composers embarked on to explore tonality by weaving together different voices. His preludes and fugues push the boundaries of what is possible before breaking the genre open and entering the Classical era of Mozart and Beethoven. Renoir and Pissarro reconceived how we could visualise nature and the world around us, but it was Claude Monet who really pushed the boundaries, painting his water lilies over and over until his flecks of colour dissolved into a new form of abstraction.
Mathematics revels in this type of creativity. The classification of finite simple groups is a tour de force of exploratory creativity. Starting from the simple definition of a group of symmetries – a structure defined by four simple axioms – mathematicians spent 150 years producing a list of every conceivable element of symmetry, culminating in the discovery of the Monster Symmetry Group, which has more symmetries than there are atoms in the Earth and yet fits into no pattern of other groups. This form of mathematical creativity involves pushing the limits while adhering to the rules of the game. It is like the explorer who thrusts into the unknown but is still bound by the limits of our planet.
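The four axioms mentioned above are usually stated as closure, associativity, identity and inverses. As a rough illustration only, the sketch below checks them by brute force for a small set of symmetries, representing each symmetry of an equilateral triangle as a permutation of its three corners. The function names `compose` and `is_group` are hypothetical, chosen for this example, and the approach is only practical for tiny groups.

```python
from itertools import permutations


def compose(p, q):
    """Apply q first, then p. A permutation is a tuple mapping index -> image."""
    return tuple(p[q[i]] for i in range(len(q)))


def is_group(elements):
    """Brute-force check of the four group axioms under composition."""
    elems = set(elements)
    if not elems:
        return False
    n = len(next(iter(elems)))
    identity = tuple(range(n))

    # 1. Closure: combining two symmetries gives another symmetry in the set.
    closure = all(compose(a, b) in elems for a in elems for b in elems)
    # 2. Associativity: bracketing does not matter.
    associativity = all(
        compose(compose(a, b), c) == compose(a, compose(b, c))
        for a in elems for b in elems for c in elems
    )
    # 3. Identity: the "do nothing" symmetry is in the set.
    has_identity = identity in elems
    # 4. Inverses: every symmetry can be undone by another one in the set.
    inverses = all(
        any(compose(a, b) == identity for b in elems) for a in elems
    )
    return closure and associativity and has_identity and inverses


# All six symmetries of an equilateral triangle (rotations and reflections).
triangle_symmetries = list(permutations(range(3)))
# The three rotations on their own also satisfy the axioms.
rotations = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]
# Identity plus a single rotation is not closed, so it fails.
not_a_group = [(0, 1, 2), (1, 2, 0)]

print(is_group(triangle_symmetries), is_group(rotations), is_group(not_a_group))
```

Exploration in this sense means starting from such simple rules and asking which structures can satisfy them; the classification itself is far beyond anything a brute-force check like this could reach.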
Boden believes that exploration accounts for 97 per cent of human creativity. This is the sort of creativity that computers excel at: pushing a pattern or set of rules to the extremes is perfect for a computational mechanism that can perform many more calculations than the human brain. But is it enough? When we think of truly original creative acts, we generally imagine something more utterly unexpected.
The second sort of creativity involves combination. Think of how an artist might take two completely different constructs and seek to combine them. Often the rules governing one world will suggest an interesting new framework for the other. Combination is a very powerful tool in the realm of mathematical creativity. The eventual solution of the Poincaré Conjecture, which describes the possible shapes of our universe, was arrived at by applying very different tools to understand flow over surfaces. It was the creative genius of Grigori Perelman which realised that the way a liquid flows over a surface could unexpectedly help to classify the possible surfaces that might exist.
My own research takes tools from number theory to understand primes and applies them to classify possible symmetries. The symmetries of geometric objects at first sight don’t look anything like numbers. But applying the language that has helped us to navigate the mysteries of the primes and replacing primes by symmetrical objects has revealed surprising new insights into the theory of symmetry.
The arts have also benefited greatly from this form of cross-fertilisation. Philip Glass took ideas he learned from working with Ravi Shankar and used them to create the additive process that is at the heart of his minimalist music. Zaha Hadid combined her knowledge of architecture with her love of the pure forms of the Russian painter Kasimir Malevich to create a unique style of curvaceous buildings. In cooking, too, creative master chefs have fused cuisines from opposite ends of the globe.
There are interesting hints that this sort of creativity might also be perfect for the world of AI. Take an algorithm that plays the blues and combine it with the music of Boulez and you will end up with a strange hybrid composition that might just create a new sound world. Of course, it could also be a dismal cacophony. The coder needs to find two genres that can be fused algorithmically in an interesting way.
It is Boden’s third form of creativity that is the more mysterious and elusive, and that is transformational creativity. This describes those rare moments that are complete game changers. Every art form has these gear shifts. Think of Picasso and Cubism, Schoenberg and atonality, Joyce and modernism. They are like phase changes, when water suddenly goes from a liquid to a gas. This was the image Goethe hit on when he sought to describe wrestling for two years with how to write The Sorrows of Young Werther, only for a chance event to act as a sudden catalyst: ‘At that instant, the plan of Werther was found; the whole shot together from all directions, and became a solid mass, as the water in a vase, which is just at the freezing point, is changed by the slightest concussion into ice.’
Quite often these transformational moments hinge on changing the rules of the game, or dropping an assumption that previous generations had been working under. The square of a number is always positive. All molecules come in long lines not chains. Music must be written inside a harmonic scale structure. Faces have eyes on either side of the nose. At first glance it would seem hard to program such a decisive break, and yet there is a meta-rule for this type of creativity. You start by dropping constraints and see what emerges. The art, the creative act, is to choose what to drop or what fresh constraint to introduce such that you end up with a new thing of value.
If I were asked to identify a transformational moment in mathematics, the creation of the square root of minus one in the mid-sixteenth century would be a good candidate. This was a number that many mathematicians believed did not exist. It was referred to as an imaginary number (a derogatory term Descartes came up with to indicate that of course there was no such thing). And yet its creation did not contradict previous mathematics. It turned out it had been our mistake to exclude it. How can a computer come up with the concept of the square root of minus one when the data it is fed will tell it that there is no number whose square can be negative? A truly creative act sometimes requires us to step outside the system and create a new reality. Can a complex algorithm do that?
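To make the point about consistency concrete, here is a minimal sketch, assuming we model the new numbers as pairs (a, b) read as a + b·i. The helper names `real_square_roots` and `multiply` are hypothetical. Within the real numbers the search for a square root of minus one comes back empty; adding the single rule i × i = −1 produces one, while multiplication of ordinary numbers is left unchanged.

```python
import math


def real_square_roots(x):
    """Return the real numbers whose square is x (none when x is negative)."""
    if x < 0:
        return []
    r = math.sqrt(x)
    return [r] if r == 0 else [-r, r]


def multiply(p, q):
    """Multiply pairs (a, b), read as a + b*i, using the one new rule i*i = -1."""
    a, b = p
    c, d = q
    return (a * c - b * d, a * d + b * c)


i = (0, 1)          # the new number
minus_one = (-1, 0)  # the ordinary number -1, written as a pair

# Inside the old system there is nothing to find.
print(real_square_roots(-1))
# Inside the extended system the square of i is minus one.
print(multiply(i, i) == minus_one)
# Ordinary arithmetic is untouched: 2 * 3 is still 6.
print(multiply((2, 0), (3, 0)))
```

Python's built-in complex type follows the same rule, so `1j * 1j == -1` evaluates to `True`. Note that the code does not invent the new number; a person chose to step outside the system and write down the extra rule, which is exactly the step the question above is asking about.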
The emergence of the Romantic movement in music is in many ways a catalogue of rule breaking. Instead of moving between close key signatures as Classical composers had done, new upstarts like Schubert chose to shift key in ways that deliberately broke expectations. Schumann left chords unresolved that Haydn or Mozart would have felt the need to complete. Chopin in turn composed dense moments of chromatic runs and challenged rhythmic expectations with his unusual accented passages and bending of tempos. The move from one musical movement to another: from Medieval to Baroque to Classical to Romantic to Impressionist to Expressionist and beyond is a story of breaking the rules. Each movement is dependent on the one before to appreciate its creativity. It almost goes without saying that historical context plays an important role in allowing us to define something as new. Creativity is not an absolute but a relative activity. We are creative within our culture and frame of reference.
Can a computer initiate this kind of phase change and move us into a new musical or mathematical state? That seems a challenge. Algorithms learn how to act based on the data they interact with. Won’t this mean that they will always be condemned to producing more of the same?
As Picasso once said: ‘The chief enemy of creativity is good sense.’ That sounds on the face of it very much against the spirit of the machine. And yet you can program a system to behave irrationally. You can create a meta-rule that will instruct it to change course. As we shall see, this is in fact something machine learning is quite good at.
Can creativity be taught?
Many artists like to fuel their own creation myth, appealing to external forces as responsible for their creativity. In Ancient Greece poets were said to be possessed by the muses, who breathed inspiration into the minds of men, sometimes sending them insane in the process. For Plato ‘a poet is holy, and never able to compose until he has become inspired, and is beside himself and reason is no longer in him … for no art does he utter but by power divine’. Ramanujan, the great Indian mathematician, likewise attributed his great insights to ideas he received in his dreams from his family goddess Namagiri. Is creativity a form of madness or a gift of the divine?
One of my mathematical heroes, Carl Friedrich Gauss, was one of the worst offenders when it came to covering his creative tracks. Gauss is credited with creating modern number theory with the publication in 1801 of one of the great mathematical works of all time: Disquisitiones arithmeticae. When people tried to read the book to uncover where he got his ideas, they were mystified. The work has been described as a book of seven seals. Gauss seems to pull ideas like rabbits out of a hat, without ever really giving us an inkling of how he achieved this magic. Later, when challenged, he retorted that an architect does not leave up the scaffolding after the house is complete. Gauss, like Ramanujan, attributed one revelation to ‘the Grace of God’, saying he was ‘unable to name the nature of the thread which connected what I previously knew with that which made my success possible’.
Yet the fact that an artist may be unable to articulate where their ideas came from does not mean that they followed no rules. Art is a conscious expression of the myriad of logical gates that make up our unconscious thought processes. There was of course a thread of logic that connected Gauss’s thoughts: it was just hard for him to articulate what he was up to – or perhaps he wanted to preserve the mystery, to fuel his image as a creative genius. Coleridge’s claim that the drug-induced vision of Kubla Khan came to him in its entirety belies all the preparatory material that shows the poet working on the ideas before that fateful day when he was interrupted by the person from Porlock. Of course, this makes for a good story. Even my own account of creation will focus on the flash of inspiration rather than the years of preparatory work I put in.
We have an awful habit of romanticising creative genius. The solitary artist working in isolation is frankly a myth. In most instances what looks like a step change is actually a continuous growth. Brian Eno talks about the idea of ‘scenius’, not genius, to acknowledge the community out of which creative intelligence often emerges. The American writer Joyce Carol Oates agrees: ‘Creative work, like scientific work, should be greeted as a communal effort – an attempt by an individual to give voice to many voices, an attempt to synthesize and explore and analyze.’
What does it take to stimulate creativity? Might it be possible to program it into a machine? Are there rules we can follow to become creative? Can creativity, in other words, be a learned skill? Some would say that to teach or program is to show people how to imitate what has gone before, and that imitation and rule following are both incompatible with creativity. And yet we have examples of creative individuals all around us who have studied and learned and improved their skills. If we study what they do, could we imitate them and ultimately become creative ourselves?
These are questions I find myself asking every new semester. To receive their PhDs, doctoral candidates in mathematics have to create a new mathematical construct. They have to come up with something that has never been done before. I am tasked with teaching them how to do that. Of course, to a certain extent they have been training to do this already. Solving problems involves personal creativity even if the answer is already known.
That training is an absolute prerequisite for the jump into the unknown. By rehearsing how others have come to their breakthroughs you hope to provide the environment to foster your own creativity. And yet that jump is far from guaranteed. I can’t take anyone off the street and teach them to be a creative mathematician. Maybe with ten years of training we could get there, but not every brain seems to be able to achieve mathematical creativity. Some people appear to be able to achieve creativity in one field but not another, yet it is difficult to understand what makes one brain a chess champion and another a Nobel Prize-winning novelist.
Margaret Boden recognises that creativity isn’t just about being Shakespeare or Einstein. She distinguishes between what she calls ‘psychological creativity’ and ‘historical creativity’. Many of us achieve acts of personal creativity that may be novel to us but historically old news. These are what Boden calls moments of psychological creativity. It is by repeated acts of personal creativity that ultimately one hopes to produce something that is recognised by others as new and of value. While historical creativity is rare, it emerges from encouraging psychological creativity.
My recipe for eliciting creativity in students follows the three modes of creativity Boden identified. Exploration is perhaps the most obvious path. First understand how we’ve come to the place we are now and then try to push the boundaries just a little bit further. This involves deep immersion in what we have created to date. Out of that deep understanding might emerge something never seen before. It is often important to impress on students that there isn’t very often some big bang that resounds with the act of creation. It is gradual. As Van Gogh wrote: ‘Great things are not done by impulse but by small things brought together.’
Boden’s second strategy, combinational creativity, is a powerful weapon, I find, in stimulating new ideas. I often encourage students to attend seminars and read papers in subjects that don’t appear to connect with the problem they are tackling. A line of thought from a disparate bit of the mathematical universe might resonate with the problem at hand and stimulate a new idea. Some of the most creative bits of science are happening today at the junctions between the disciplines. The more we can come out of our silos and share our ideas and problems, the more creative we are likely to be. This is where a lot of the low-hanging fruit is to be found.
At first sight transformational creativity seems hard to harness as a strategy. But again the goal is to test the status quo by dropping some of the constraints that have been put in place. Try seeing what happens if we change one of the basic rules we have accepted as part of the fabric of our subject. These are dangerous moments because you can collapse the system, but this brings me to one of the most important ingredients needed to foster creativity – and that is embracing failure.
Unless you are prepared to fail, you will not take the risks that will allow you to break out and create something new. This is why our education system and our business environment, both realms that abhor failure, are often terrible environments for fostering creativity. It is important to celebrate the failures as much as the successes in my students. Sure, the failures won’t make it into the PhD thesis, but we learn so much from failure. When I meet with my students I repeat again and again Beckett’s call to ‘Fail, fail again, fail better.’
Are these strategies that can be written into code? In the past the top-down approach to coding meant there was little prospect of creativity in the output of the code. Coders were never too surprised by what their algorithms produced. There was no room for experimentation or failure. But this all changed recently, when an algorithm built on code that learns from its failures did something that was new, shocked its creators, and had incredible value. This algorithm won a game that many believed was beyond the abilities of a machine to master. It was a game that required creativity to play.
It was news of this breakthrough that triggered my recent existential crisis as a mathematician.
3
READY, STEADY, GO
We construct and construct, but intuition is still a good thing.
Paul Klee
People often compare mathematics to playing chess. There certainly are connections, but when Deep Blue beat the best chessmaster the human race could offer in 1997, it did not lead to the closure of mathematics departments. Although chess is a good analogy for the formal quality of constructing a proof, there is another game that mathematicians have regarded as much closer to the creative and intuitive side of being a mathematician, and that is the Chinese game of Go.
I first discovered Go when I visited the mathematics department at Cambridge as an undergraduate to explore whether to do my PhD with the amazing group that had helped complete the classification of finite simple groups, a sort of Periodic Table of Symmetry. As I sat talking to John Conway and Simon Norton, two of the architects of this great project, about the future of mathematics, I kept being distracted by students at the next table furiously slamming black and white stones onto a large 19×19 grid carved into a wooden board.
Eventually I asked Conway what they were doing. ‘That’s Go. It’s the oldest game that is still being played to this day.’ In contrast to the war-like quality of chess, he explained, Go was a game of territory. Players take it in turn to place white and black pieces or stones onto the 19×19 grid. If you manage to surround a collection of your opponent’s stones with your own, you capture your opponent’s stones. The winner is the player who has captured the most stones by the end of the game. It sounded rather simple. The subtlety of the game, Conway explained, is that as you try to surround your opponent, you must avoid having your own stones captured.
‘It’s a bit like mathematics: simple rules that give rise to beautiful complexity.’ It was while watching the game evolve between two experts as they drank coffee in the common room that Conway discovered that the endgame was behaving like a new sort of number that he christened ‘surreal numbers’.
I’ve always been fascinated by games. Whenever I travel abroad I like to learn and bring back the game locals like to play. So when I got back from the wild outreaches of Cambridge to the safety of my home in Oxford I decided to buy Go from the local toy shop to see what it was that was obsessing these students. As I began to explore the game with one of my fellow students in Oxford, I realised how subtle it was. It was hard to identify a clear strategy that would help me win. And as more stones were laid down on the board, the game seemed to get more complicated, unlike chess, where as pieces are gradually removed the game starts to simplify.
The American Go Association estimates that it would take a number with 300 digits to count the number of games of Go that are legally possible. In chess the computer scientist Claude Shannon estimated that a number with 120 digits (now called the Shannon number) would suffice. These are not small numbers in either case, but they give you a sense of the wide range of possible permutations.
I had played a lot of chess as a kid. I enjoyed working through the logical consequences of a proposed move. It appealed to the mathematician that was growing inside me. The tree of possibilities in chess branches in a controlled manner, making it manageable for a computer and even a human to analyse the implications of going down different branches. In contrast Go just doesn’t seem like a game that would allow you to work out the logical implications of a future move. Navigating the tree of possibilities quickly becomes impossible. That’s not to say that a Go player doesn’t follow through the logical consequences of their next move, but this seems to be combined with a more intuitive feel for the pattern of play.
The human brain is acutely attuned to finding structure and pattern if there is one in a visual image. A Go player can look at the lie of the stones and tap into the brain’s ability to pick out these patterns and exploit them in planning the next move. Computers have traditionally struggled with vision. It is one of the big hurdles that engineers have wrestled with for decades.
The human brain’s highly developed sense of visual structure has been honed over millions of years and has been key to our survival. Any animal’s ability to survive depends in part on its ability to pick out structure in the visual mess that Nature confronts us with. A pattern in the chaos of the jungle is likely to be evidence of the presence of another animal – and you’d better take notice cos that animal might eat you (or maybe you could eat it). The human code is extremely good at reading patterns, interpreting how they might develop, and responding appropriately. It is one of our key assets, and it plays into our appreciation for the patterns in music and art.
It turns out that pattern recognition is precisely what I do as a mathematician when I venture into the unexplored reaches of the mathematical jungle. I can’t rely on a simple step-by-step logical analysis of the local environment. That won’t get me very far. It has to be combined with an intuitive feel for what might be out there. That intuition is built up by time spent exploring the known space. But it is often hard to articulate logically why you might believe that there is interesting territory out there to explore. A conjecture in mathematics is by its nature not yet proved, but the mathematician who has made the conjecture has built up a feeling that the mathematical statement they have made may have some truth to it. Observation and intuition go hand in hand as we navigate the thickets and seek to carve out a new path.
A mathematician who can make a good conjecture will often garner more respect than one who joins up the logical dots to reveal the truth of the conjecture. In the game of Go the final winning position is in some respects the conjecture and the plays are the logical moves on your way to proving that conjecture. But it is devilishly hard to spot the patterns along the way.
And so, although chess has been useful to help explain some aspects of mathematics, the game of Go has always been held up as far closer in spirit to the way mathematicians actually go about their business. That’s why mathematicians weren’t too worried when Deep Blue beat the best humans could offer at chess. The real challenge was the game of Go. For decades people had been claiming that the game of Go could never be played by a computer. Like all good absolutes, it invited creative coders to test that proposition. But even a junior player appeared able to outplay the most complex algorithms. And so mathematicians happily hid behind the cover that Go was providing them. If a computer couldn’t play Go then there was no chance it could play the even subtler and more ancient game of mathematics.
But just as the Great Wall of China was eventually breached, my defensive wall has just crumbled in spectacular fashion.
Game Boy extraordinaire
At the beginning of 2016 it was announced that a program had been created to play Go that its developers were confident could hold its own against the best humans had to offer. Go players around the world were extremely sceptical, given the failure of past efforts. So the company that developed the program offered a challenge. It set up a public contest with a huge prize and invited one of the world’s leading Go players to take up the challenge. An international champion, Lee Sedol from Korea, stepped forward. The competition would be played over five games with the winner taking home a prize of one million dollars. The name of Sedol’s challenger: AlphaGo.
AlphaGo is the brainchild of Demis Hassabis. Hassabis was born in London in 1976 to a Greek Cypriot father and a mother from Singapore. Both parents are teachers and what Hassabis describes as bohemian technophobes. His sister and brother went the creative route, one becoming a composer, the other choosing creative writing. So Hassabis isn’t quite sure where his geeky scientific side came from. But as a kid Hassabis was someone who quickly marked himself out as gifted and talented, especially when it came to playing games. His abilities at chess were such that at eleven he was the second-highest-ranked child of his age in the world.
But then at an international match in Liechtenstein that year Hassabis had an epiphany: what on earth were they all doing? The hall was full of so many great minds exploring the logical intricacies of this great game. And yet Hassabis suddenly recognised the total futility of such a project. In a radio interview on the BBC he admitted thinking at the time: ‘We were wasting our minds. What if we used that brain power for something more useful like solving cancer?’
His parents were pretty shocked when after the tournament (which he narrowly lost after battling for ten hours with the adult Dutch world champion) he announced that he was giving up chess competitions. Everyone had thought this was going to be his life. But those years playing chess weren’t wasted. A few years earlier he’d used the £200 prize money he’d won for beating a US opponent, Alex Chang, to buy his first computer: a ZX Spectrum. That computer sparked his obsession with getting machines to do the thinking for him.
Hassabis soon graduated on to a Commodore Amiga, which could be programmed to play the games he enjoyed. Chess was still too complicated, but he managed to program the Commodore to play Othello, a game that looks rather similar to Go with black and white stones that get flipped when they are trapped between stones of the opposite colour. It’s not a game that merits grandmasters, so he tried his program out on his younger brother. It beat him every time.
This was classic ‘if …, then …’ programming: he needed to code in by hand the response to each of his opponent’s moves. It was: ‘If your opponent plays that move, then reply with this move.’ The creativity all came from Hassabis and his ability to see what the right responses were to win the game. It still felt a bit like magic though. Code up the right spell and then, rather like the Sorcerer’s Apprentice, the Commodore would go through the work of winning the game.
Hassabis raced through school, culminating with an offer from Cambridge to study computer science at the age of sixteen. He’d set his heart on Cambridge after seeing Jeff Goldblum in the film The Race for the Double Helix. ‘I thought, is this what goes on at Cambridge? You go there and you invent DNA in the pub? Wow.’
Cambridge wouldn’t let him start his degree at the age of sixteen, so he had to defer for a year. To fill his time he won a place working for a game developer after having come second in a competition run by Amiga Power magazine. While he was there, he created his own game, Theme Park, where players had to build and run their own theme park. The game was hugely successful, selling several million copies and winning a Golden Joystick award. With enough funds to finance his time at university, Hassabis set off for Cambridge.
His course introduced him to the greats of the AI revolution: Alan Turing and his test for intelligence, Arthur Samuel and his program to play draughts, John McCarthy, who coined the term artificial intelligence, Frank Rosenblatt and his first experiments with neural networks. These were the shoulders on which Hassabis aspired to stand. It was while sitting in his lectures at Cambridge that he heard his professor repeating the mantra that a computer could never play Go because of the game’s creative and intuitive characteristics. This was like a red rag to the young Hassabis. He left Cambridge determined to prove his professor wrong.
His idea was that rather than trying to write a program himself that could play Go, he would write a meta-program that would be responsible for writing the program that would play Go. It sounded a crazy idea, but the point was that the meta-program would be created so that as the Go-playing program played more and more games it would learn from its mistakes.
Hassabis had learned about a similar idea implemented by the artificial-intelligence researcher Donald Michie in the 1960s. Michie had written an algorithm called ‘MENACE’ that learned from scratch the best strategy to play noughts and crosses. (MENACE stood for Machine Educable Noughts And Crosses Engine.) To demonstrate the algorithm, Michie had rigged up 304 matchboxes representing all the possible layouts of noughts and crosses encountered while playing. Each matchbox was filled with different-coloured balls to represent possible moves. Balls were removed or added to the boxes to punish losses or reward wins. As the algorithm played more and more games, the reassignment of the balls eventually led to an almost perfect strategy for playing. It was this idea of learning from your mistakes that Hassabis wanted to use to train an algorithm to play Go.
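The matchbox scheme can be sketched in a few lines of Python. This is only a rough illustration of the idea of adding and removing beads, not a reconstruction of Michie's machine: the bead counts (three to start, three added for a win, one for a draw, one removed for a loss), the random opponent and the function names are all assumptions made for the example. It also keeps one box per board position without merging rotations and reflections, so it uses more boxes than Michie's 304.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def play_game(boxes, rng):
    """Play one game: the learner is 'X', a random player is 'O'.

    boxes maps a board string to {move: bead_count}.
    Returns 'X', 'O' or 'draw'.
    """
    board = [' '] * 9
    history = []  # (state, move) pairs chosen by the learner
    player = 'X'
    while True:
        free = [i for i in range(9) if board[i] == ' ']
        if player == 'X':
            state = ''.join(board)
            # A new box starts with 3 beads per legal move (an assumed number).
            box = boxes.setdefault(state, {m: 3 for m in free})
            moves = list(box)
            # Drawing a bead at random: more beads means a likelier move.
            move = rng.choices(moves, weights=[box[m] for m in moves])[0]
            history.append((state, move))
        else:
            move = rng.choice(free)
        board[move] = player
        result = winner(board)
        if result is None and ' ' not in board:
            result = 'draw'
        if result is not None:
            break
        player = 'O' if player == 'X' else 'X'
    # Reward or punish every box that was used in this game.
    change = {'X': 3, 'draw': 1, 'O': -1}[result]
    for state, move in history:
        # Keep at least one bead so that no box is ever emptied.
        boxes[state][move] = max(1, boxes[state][move] + change)
    return result

def train(games=5000, seed=0):
    """Play many games and return the boxes and the list of results."""
    rng = random.Random(seed)
    boxes = {}
    results = [play_game(boxes, rng) for _ in range(games)]
    return boxes, results
```

Calling `train()` and comparing the share of wins in the early and late games gives a feel for how the reassignment of beads shifts the play over time.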
Hassabis had a good model to base his strategy on. A newborn baby does not have a brain that is pre-programmed to cope with making its way through life. It is programmed instead to learn as it interacts with its environment.
If Hassabis was going to tap into the way the brain learned to solve problems, then knowing how the brain works was clearly going to help in his dream of creating a program to play Go. So he decided to do a PhD in neuroscience at University College London. It was during coffee breaks from lab work that Hassabis started discussing with a neuroscientist, Shane Legg, his plans to create a company to try out his ideas. It shows the low status of AI even a decade ago that they never admitted to their professors their dream to dedicate their lives to AI. But they felt they were on to something big, so in September 2010 the two scientists decided to create a company with Mustafa Suleyman, a friend of Hassabis from childhood. DeepMind was incorporated.
The company needed money but initially Hassabis just couldn’t raise any capital. Pitching on a platform that they were going to play games and solve intelligence did not sound serious to most investors. A few, however, did see the vision. Among those who put money in right at the outset were Elon Musk and Peter Thiel. Thiel had never invested outside Silicon Valley and tried to persuade Hassabis to relocate to the West Coast. A born-and-bred Londoner, Hassabis held his ground, insisting that there was more untapped talent in London that could be exploited. Hassabis remembers a crazy conversation he had with Thiel’s lawyer. ‘Does London have law on IP?’ she asked innocently. ‘I think they thought we were coming from Timbuctoo!’ The founders had to give up a huge amount of stock to the investors, but they had their money to start trying to crack AI.
The challenge of creating a machine that could learn to play Go still felt like a distant dream. They set their sights at first on a seemingly less cerebral goal: playing 1980s Atari games. Atari is probably responsible for a lot of students flunking courses in the late 1970s and early 1980s. I certainly remember wasting a huge amount of time playing the likes of Pong, Space Invaders and Asteroids on a friend’s Atari 2600 console. The console was one of the first whose hardware could play multiple games that were loaded via a cartridge. It allowed a whole range of different games to be developed over time. Previous consoles could only play games that had been physically programmed into the units.
One of my favourite Atari games was called Breakout. A wall of coloured bricks was at the top of the screen and you controlled a paddle at the bottom that could be moved left or right using a joystick. A ball would bounce off the paddle and head towards the bricks. Each time it hit a brick, the brick would disappear. The aim was to clear the bricks. The yellow bricks at the bottom of the wall scored one point. The red bricks on top got you seven points. As you cleared blocks, the paddle would shrink and the ball would speed up to make the game play harder.
We were particularly pleased one afternoon when we found a clever way to hack the game. If you dug a tunnel up through the bricks on the edge of the screen, once the ball made it through to the top it bounced back and forward off the top of the screen and the upper high-scoring bricks, gradually clearing the wall. You could sit back and watch until the ball eventually came back down through the wall. You just had to be ready with the paddle to bat the ball back up again. It was a very satisfying strategy!
Hassabis and the team he was assembling also spent a lot of time playing computer games in their youth. Their parents may be happy to know that the time and effort they put into those games did not go to waste. It turned out that Breakout was a perfect test case to see if the team at DeepMind could program a computer to learn how to play games. It would have been a relatively straightforward job to write a program for each individual game. Hassabis and his team were going to set themselves a much greater challenge.
They wanted to write a program that would receive as an input the state of the pixels on the screen and the current score and set it to play with the goal of maximising the score. The program was not told the rules of the game: it had to experiment randomly with different ways of moving the paddle in Breakout or firing the laser cannon at the descending aliens of Space Invaders. Each time it made a move it could assess whether the move had helped increase the score or had had no effect.
The code implements an idea dating from the 1990s called reinforcement learning, which aims to update the probability of actions based on the effect on a reward function or score. For example, in Breakout the only decision is whether to move the paddle at the bottom of the screen left or right. Initially the choice will be 50:50. But if moving the paddle randomly results in it hitting the ball, then a short time later the score goes up. The code then recalibrates the probability of whether to go left or right based on this new information. It will increase the chance of moving the paddle in the direction in which the ball is heading. The new feature was to combine this learning with neural networks that would assess the state of the pixels to decide what features correlated with the increase in score.
At the outset, because the computer was just trying random moves, it was terrible, hardly scoring anything. But each time it made a random move that bumped up the score, it would remember the move and reinforce the use of such a move in future. Gradually the random moves disappeared and a more informed set of moves began to emerge, moves that the program had learned through experiment seemed to boost its score.
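The way a 50:50 choice drifts towards the rewarded move can be shown with a toy sketch in Python. It is a deliberately simplified stand-in, not DeepMind's program: there are no pixels and no neural network, the only thing the learner observes is whether the ball is to the left or right of the paddle, and the reward rule, the weight table and the function names are assumptions made for the illustration.

```python
import random

ACTIONS = ('left', 'right')

def action_probabilities(weights):
    """Turn a table of weights into the probability of choosing each action."""
    total = sum(weights.values())
    return {action: w / total for action, w in weights.items()}

def train(steps=2000, seed=0):
    """Learn by trial and error which way to move the paddle."""
    rng = random.Random(seed)
    # One set of weights per situation; equal weights give the 50:50 start.
    table = {side: {a: 1.0 for a in ACTIONS}
             for side in ('ball_left', 'ball_right')}
    for _ in range(steps):
        situation = rng.choice(('ball_left', 'ball_right'))
        weights = table[situation]
        # Pick an action at random, in proportion to its current weight.
        action = rng.choices(ACTIONS, weights=[weights[a] for a in ACTIONS])[0]
        # Toy reward: 1 if the paddle moved towards the ball, otherwise 0.
        reward = 1.0 if situation == 'ball_' + action else 0.0
        # Reinforce the move that was followed by a reward.
        weights[action] += reward
    return table
```

After training, `action_probabilities(table['ball_left'])` shows the probability of moving left when the ball is on the left; it starts at one half and climbs as rewarded moves accumulate.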
It’s worth watching the supplementary video the DeepMind team included in the paper they eventually wrote. It shows the program learning to play Breakout. At first you see it randomly moving the paddle back and forward to see what will happen. Then, when the ball finally hits the paddle and bounces back and hits a brick and the score goes up, the program starts to rewrite itself. If the pixels of the ball and the pixels of the paddle connect that seems to be a good thing. After 400 game plays it’s doing really well, getting the paddle to continually bat the ball back and forward.
The shock for me came when I saw what it discovered after 600 games. It found our hack! I’m not sure how many games it took us as kids to find this trick, but judging by the amount of time I wasted with my friend it could well have been more. But there it is. The program manipulated the paddle to tunnel its way up the sides, such that the ball would be stuck in the gap between the top of the wall and the top of the screen. At this point the score goes up very fast without the computer’s having to do very much. I remember my friend and I high-fiving when we’d discovered this trick. The machine felt nothing.
By 2014, four years after the creation of DeepMind, the program had learned how to outperform humans on twenty-nine of the forty-nine Atari games it had been exposed to. The paper the team submitted to Nature detailing their achievement was published in early 2015. To be published in Nature is one of the highlights of a scientist’s career. But their paper achieved the even greater accolade of being featured as the cover story of the whole issue. The journal recognised that this was a huge moment for artificial intelligence.
It has to be reiterated what an amazing feat of programming this was. From just the raw data of the state of the pixels and the changing score, the program had changed itself from randomly moving the paddle of Breakout back and forth to learning that tunnelling the sides of the wall would win you the top score. But Atari games are hardly on a par with the ancient game of Go. Hassabis and his team at DeepMind decided they were ready to create a new program that could take it on.
It was at this moment that Hassabis decided to sell the company to Google. ‘We weren’t planning to, but three years in, focused on fundraising, I had only ten per cent of my time for research,’ he explained in an interview in Wired at the time. ‘I realised that there’s maybe not enough time in one lifetime to both build a Google-sized company and solve AI. Would I be happier looking back on building a multi-billion business or helping solve intelligence? It was an easy choice.’ The sale put Google’s firepower at his fingertips and provided the space for him to create code to realise his goal of solving Go … and then intelligence.
First blood
Previous computer programs built to play Go had not come close to playing competitively against even a pretty good amateur, so most pundits were highly sceptical of DeepMind’s dream to create code that could get anywhere near an international champion of the game. Most people still agreed with the view expressed in The New York Times by the astrophysicist Piet Hut after Deep Blue’s success at chess in 1997: ‘It may be a hundred years before a computer beats humans at Go – maybe even longer. If a reasonably intelligent person learned to play Go, in a few months he could beat all existing computer programs. You don’t have to be a Kasparov.’
Just two decades into that hundred years, the DeepMind team believed they might have cracked the code. Their strategy of getting algorithms to learn and adapt appeared to be working, but they were unsure quite how powerful the emerging algorithm really was. So in October 2015 they decided to test-run their program in a secret competition against the current European champion, the Chinese-born Fan Hui.
AlphaGo destroyed Fan Hui five games to nil. But the gulf between European players of the game and those in the Far East is huge. The top European players, when put in a global league, rank in the 600s. So, although it was still an impressive achievement, it was like building a driverless car that could beat a human driving a Ford Fiesta round Silverstone then trying to challenge Lewis Hamilton in a Grand Prix.
Certainly when the press in the Far East heard about Fan Hui’s defeat they were merciless in dismissing AlphaGo’s win as meaningless. Indeed, when Fan Hui’s wife contacted him in London after the news got out, she begged her husband not to go online. Needless to say he couldn’t resist. It was not a pleasant experience to read how dismissive the commentators in his home country were of his credentials to challenge AlphaGo.
Fan Hui credits his matches with AlphaGo with teaching him new insights into how to play the game. In the following months his ranking went from 633 to the 300s. But it wasn’t only Fan Hui who was learning. Every game AlphaGo plays affects its code and changes it to improve its play next time around.
It was at this point that the DeepMind team felt confident enough to offer their challenge to Lee Sedol, South Korea’s eighteen-time world champion and a formidable player of the game.
The match was to be played over five games scheduled between 9 and 15 March 2016 at the Four Seasons hotel in Seoul, and would be broadcast live across the internet. The winner would receive a prize of a million dollars. Although the venue was public, the precise location within the hotel was kept secret and was isolated from noise – not that AlphaGo was going to be disturbed by the chitchat of the press and the whispers of curious bystanders. It would assume a perfect Zen-like state of concentration wherever it was placed.
Sedol wasn’t fazed by the news that he was up against a machine that had beaten Fan Hui. Following Fan Hui’s loss he had declared: ‘Based on its level seen … I think I will win the game by a near landslide.’
Although he was aware of the fact that the machine he would be playing was learning and evolving, this did not concern him. But as the match approached, you could hear doubts beginning to creep into his view of whether AI would ultimately be too powerful for humans to defeat, even in the game of Go. In February he stated: ‘I have heard that DeepMind’s AI is surprisingly strong and getting stronger, but I am confident that I can win … at least this time.’
Most people still felt that despite great inroads into programming, an AI Go champion was still a distant goal. Rémi Coulom, the creator of Crazy Stone, the only program to get close to playing Go at any high standard, was still predicting another decade before computers would beat the best humans at the game.
As the date for the match approached, the team at DeepMind felt they needed someone to really stretch AlphaGo and to test it for any weaknesses. So they invited Fan Hui back to play the machine going into the last few weeks. Despite having suffered a 5–0 defeat and being humiliated by the press back in China, he was keen to help out. Perhaps a bit of him felt that if he could help make AlphaGo good enough to beat Sedol, it would make his defeat less humiliating.
As Fan Hui played he could see that AlphaGo was extremely strong in some areas but he managed to reveal a weakness that the team was not aware of. There were certain configurations in which it seemed to completely fail to assess who had control of the game, often becoming totally delusional that it was winning when the opposite was true. If Sedol tapped into this weakness, AlphaGo wouldn’t just lose, it would appear extremely stupid.
The DeepMind team worked around the clock trying to fix this blind spot. Eventually they just had to lock down the code as it was. It was time to ship the laptop they were using to Seoul.
The stage was set for a fascinating duel as the players, or at least one player, sat down on 9 March to play the first of the five games.
‘Beautiful. Beautiful. Beautiful’
It was with a sense of existential anxiety that I fired up the YouTube channel broadcasting the matches that Sedol would play against AlphaGo and joined 280 million other viewers to see humanity take on the machines. Having for years compared creating mathematics to playing the game of Go, I had a lot on the line.
Lee Sedol picked up a black stone and placed it on the board and then waited for the response. Aja Huang, a member of the DeepMind team, would play the physical moves for AlphaGo. This, after all, was not a test of robotics but of artificial intelligence. Huang stared at AlphaGo’s screen, waiting for its response to Sedol’s first stone. But nothing came.
We all stared at our screens wondering if the program had crashed! The DeepMind team was also beginning to wonder what was up. The opening moves are generally something of a formality. No human would think so long over move 2. After all, there was nothing really to go on yet. What was happening? And then a white stone appeared on the computer screen. It had made its move. The DeepMind team breathed a huge sigh of relief. We were off! Over the next couple of hours the stones began to build up across the board.
One of the problems I had as I watched the game was assessing who was winning at any given point in the game. It turns out that this isn’t just because I’m not a very experienced Go player. It is a characteristic of the game. Indeed, this is one of the main reasons why programming a computer to play Go is so hard. There isn’t an easy way to turn the current state of the game into a robust scoring system of who leads by how much.
Chess, by contrast, is much easier to score as you play. Each piece has a different numerical value which gives you a simple first approximation of who is winning. Chess is destructive. One by one pieces are removed so the state of the board simplifies as the game proceeds. But Go increases in complexity as you play. It is constructive. The commentators kept up a steady stream of observations but struggled to say if anyone was in the lead right up until the final moments of the game.
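The piece-counting approach to chess takes only a few lines of Python. The values used here (pawn 1, knight 3, bishop 3, rook 5, queen 9) are the conventional textbook ones rather than anything given in this chapter, and the function name and the use of the piece-placement field of FEN notation (capital letters for White, lower case for Black) are choices made for the sketch.

```python
# Conventional piece values; the king is given 0 because it is never captured.
PIECE_VALUES = {'p': 1, 'n': 3, 'b': 3, 'r': 5, 'q': 9, 'k': 0}

def material_balance(placement):
    """Return White's material minus Black's for a FEN piece-placement string."""
    balance = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # digits and '/' mark empty squares and rank breaks
        balance += value if ch.isupper() else -value
    return balance

START = 'rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR'
```

A positive result means White is ahead on material and a negative one means Black is. No comparably simple count exists for a Go position, which is the difficulty described above.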
What they were able to pick up quite quickly was Sedol’s opening strategy. If AlphaGo had learned to play on games that had been played in the past, then Sedol was working on the principle that it would put him at an advantage if he disrupted the expectations it had built up by playing moves that were not in the conventional repertoire. The trouble was that this required Sedol to play an unconventional game – one that was not his own.
It was a good idea but it didn’t work. Any conventional machine programmed on a database of accepted openings wouldn’t have known how to respond and would most likely have made a move that would have serious consequences in the grand arc of the game. But AlphaGo was not a conventional machine. It could assess the new moves and determine a good response based on what it had learned over the course of its many games. As David Silver, the lead programmer on AlphaGo, explained in the lead-up to the match: ‘AlphaGo learned to discover new strategies for itself, by playing millions of games between its neural networks, against themselves, and gradually improving.’ If anything, Sedol had put himself at a disadvantage by playing a game that was not his own.
As I watched I couldn’t help feeling for Sedol. You could see his confidence draining out of him as it gradually dawned on him that he was losing. He kept looking over at Huang, the DeepMind representative who was playing AlphaGo’s moves, but there was nothing he could glean from Huang’s face. By move 186 Sedol had to recognise that there was no way to overturn the advantage AlphaGo had built up on the board. He placed a stone on the side of the board to indicate his resignation.
By the end of day one it was: AlphaGo 1 Humans 0. Sedol admitted at the press conference that day: ‘I was very surprised because I didn’t think I would lose.’
But it was game 2 that was going to truly shock not just Sedol but every human player of the game of Go. The first game was one that experts could follow and appreciate why AlphaGo was playing the moves it was. They were moves a human champion would play. But as I watched game 2 on my laptop at home, something rather strange happened. Sedol played move 36 and then retired to the roof of the hotel for a cigarette break. While he was away, AlphaGo on move 37 instructed Huang, its human representative, to place a black stone on the line five steps in from the edge of the board. Everyone was shocked.
The conventional wisdom is that during the early part of the game you play stones on the outer four lines. The third line builds up short-term territory strength on the edge of the board while playing on the fourth line contributes to your strength later in the game as you move into the centre of the board. Players have always found that there is a fine balance between playing on the third and fourth lines. Playing on the fifth line has always been regarded as suboptimal, giving your opponent the chance to build up territory that has both short- and long-term influence.
AlphaGo had broken this orthodoxy built up over centuries of competing. Some commentators declared it a clear mistake. Others were more cautious. Everyone was intrigued to see what Sedol would make of the move when he returned from his cigarette break. As he sat down, you could see him physically flinch as he took in the new stone on the board. He was certainly as shocked as all of the rest of us by the move. He sat there thinking for over twelve minutes. Like chess, the game was being played under time constraints. Using twelve minutes of your time was very costly. It is a mark of how surprising this move was that it took Sedol so long to respond. He could not understand what AlphaGo was doing. Why had the program abandoned the region of stones they were competing over?
Was this a mistake by AlphaGo? Or did it see something deep inside the game that humans were missing? Fan Hui, who had been given the role of one of the referees, looked down on the board. His initial reaction matched everyone else’s: shock. And then he began to realise: ‘It’s not a human move. I’ve never seen a human play this move,’ he said. ‘So beautiful. Beautiful. Beautiful. Beautiful.’
Beautiful and deadly it turned out to be. Not a mistake but an extraordinarily insightful move. Some fifty moves later, as the black and white stones fought over territory from the lower left-hand corner of the board, they found themselves creeping towards the black stone of move 37. It was joining up with this stone that gave AlphaGo the edge, allowing it to clock up its second win. AlphaGo 2 Humans 0.
Sedol’s mood in the press conference that followed was notably different. ‘Yesterday I was surprised. But today I am speechless … I am in shock. I can admit that … the third game is not going to be easy for me.’ The match was being played over five games, so the third game was the one Sedol needed to win to stop AlphaGo claiming the match.
The human fight-back
Sedol had a day off to recover. The third game would be played on Saturday, 12 March. He needed the rest, unlike the machine. The first game had been over three hours of intense concentration. The second lasted over four hours. You could see the emotional toll that losing two games in a row was having on him.
Rather than resting, though, Sedol stayed up till 6 a.m. the next morning analysing the games he’d lost so far with a group of fellow professional Go players. Did AlphaGo have a weakness they could exploit? The machine wasn’t the only one who could learn and evolve. Sedol felt he might learn something from his losses.
Sedol played a very strong opening to game 3, forcing AlphaGo to manage a weak group of stones within his sphere of influence on the board. Commentators began to get excited. Some said Sedol had found AlphaGo’s weakness. But then, as one commentator posted: ‘Things began to get scary. As I watched the game unfold and the realisation of what was happening dawned on me, I felt physically unwell.’
Sedol pushed AlphaGo to its limits but in so doing he revealed the hidden powers that the program seemed to possess. As the game proceeded, it started to make what commentators called lazy moves. It had analysed its position and was so confident in its win that it chose safe moves. It didn’t care if it won by half a point. All that mattered was that it won. To play such lazy moves was almost an affront to Sedol, but AlphaGo was not programmed with any vindictive qualities. Its sole goal was to win the game. Sedol pushed this way and that, determined not to give in too quickly. Perhaps one of these lazy moves was a mistake that he could exploit.
By move 176 Sedol eventually caved in and resigned. AlphaGo 3 Humans 0. AlphaGo had won the match. Backstage, the DeepMind team was going through a strange range of emotions. They’d won the match, but seeing the devastating effect it was having on Sedol made it hard for them to rejoice. The million-dollar prize was theirs. They’d already decided to donate the prize, if they won, to a range of charities dedicated to promoting Go and science subjects as well as to Unicef. Yet their human code was causing them to empathise with Sedol’s pain.
AlphaGo did not demonstrate any emotional response to its win. No little surge of electrical current. No code spat out with a resounding ‘YES!’ It is this lack of response that gives humanity hope and is also scary at the same time. Hope because it is this emotional response that is the drive to be creative and venture into the unknown: it was humans, after all, who’d programmed AlphaGo with the goal of winning. Scary because the machine won’t care if the goal turns out to be not quite what its programmers had intended.
Sedol was devastated. He came out in the press conference and apologised:
I don’t know how to start or what to say today, but I think I would have to express my apologies first. I should have shown a better result, a better outcome, and better content in terms of the game played, and I do apologize for not being able to satisfy a lot of people’s expectations. I kind of felt powerless.
But he urged people to keep watching the final two games. His goal now was to try to at least get one back for humanity.
Having lost the match, Sedol started game 4 playing far more freely. It was as if the heavy burden of expectation had been lifted, allowing him to enjoy his game. In sharp contrast to the careful, almost cautious play of game 3, he launched into a much more extreme strategy called ‘amashi’. One commentator compared it to a city investor who, rather than squirrelling away small gains that accumulate over time, bet the whole bank.
Sedol and his team had stayed up all of Saturday night trying to reverse-engineer from AlphaGo’s games how it played. It seemed to work on a principle of playing moves that incrementally increase its probability of winning rather than betting on the potential outcome of a complicated single move. Sedol had witnessed this when AlphaGo preferred lazy moves to win game 3. The strategy they’d come up with was to disrupt this sensible play by playing the risky single moves. An all-or-nothing strategy might make it harder for AlphaGo to score so easily.
AlphaGo seemed unfazed by this line of attack. Seventy moves into the game, commentators were already beginning to see that AlphaGo had once again gained the upper hand. This was confirmed by a set of conservative moves that were AlphaGo’s signal that it had the lead. Sedol had to come up with something special if he was going to regain the momentum.
If move 37 of game 2 was AlphaGo’s moment of creative genius, move 78 of game 4 was Sedol’s retort. He’d sat there for thirty minutes staring at the board, staring at defeat, when he suddenly placed a white stone in an unusual position, between two of AlphaGo’s black stones. Michael Redmond, who was commentating on the YouTube channel, spoke for everyone: ‘It took me by surprise. I’m sure that it would take most opponents by surprise. I think it took AlphaGo by surprise.’
It certainly seemed to. AlphaGo appeared to completely ignore the play, responding with a strange move. Within several more moves AlphaGo could see that it was losing. The DeepMind team stared at their screens behind the scenes and watched their creation imploding. It was as if move 78 short-circuited the program. It seemed to cause AlphaGo to go into meltdown as it made a whole sequence of destructive moves. This apparently is another characteristic of the way Go algorithms are programmed. Once they see that they are losing they go rather crazy.
Silver, the chief programmer, winced as he saw the next move AlphaGo was suggesting: ‘I think they’re going to laugh.’ Sure enough, the Korean commentators collapsed into fits of giggles at the moves AlphaGo was now making. Its moves were failing the Turing Test. No human with a shred of strategic sense would make them. The game dragged on for a total of 180 moves, at which point AlphaGo put up a message on the screen that it had resigned. The press room erupted with spontaneous applause.
The human race had got one back. AlphaGo 3 Humans 1. The smile on Lee Sedol’s face at the press conference that evening said it all. ‘This win is so valuable that I wouldn’t exchange it for anything in the world.’ The press cheered wildly. ‘It’s because of the cheers and the encouragement that you all have shown me.’
Gu Li, who was commentating on the game in China, declared Sedol’s move 78 the ‘hand of god’. It was a move that broke with the conventional way of playing the game, and that was ultimately the key to its shocking impact. This is characteristic of true human creativity. It is a good example of Boden’s transformational creativity, whereby you find new insights by breaking out of the system.
At the press conference, Hassabis and Silver could not explain why AlphaGo had lost. They would need to go back and analyse why it had made such a lousy move in response to Sedol’s move 78. It turned out that AlphaGo’s experience in playing humans had led it to totally dismiss such a move as something not worth thinking about. It had assessed that this was a move that had only a one in 10,000 chance of being played. It seems as if it just had not bothered to learn a response to such a move because it had prioritised other moves as more likely and therefore more worthy of response.
Perhaps Sedol just needed to get to know his opponent. Perhaps over a longer match he would have turned the tables on AlphaGo. Could he maintain the momentum into the fifth and final game? Losing 3–2 would be very different from 4–1. The last game was still worth competing in. If he could win a second game, then it would sow seeds of doubt about whether AlphaGo could sustain its superiority.
But AlphaGo had learned something valuable from its loss. You play Sedol’s one in 10,000 move now against the algorithm and you won’t get away with it. That’s the power of this sort of algorithm. It learns from its mistakes.
That’s not to say it can’t make new mistakes. As game 5 proceeded, there was a moment quite early on when AlphaGo seemed to completely miss a standard set of moves in response to a particular configuration that was building. As Hassabis tweeted from backstage: ‘#AlphaGo made a bad mistake early in the game (it didn’t know a known tesuji) but now it is trying hard to claw it back … nail-biting.’
Sedol was in the lead at this stage. It was game on. Gradually AlphaGo did claw back. But right up to the end the DeepMind team was not exactly sure whether it was winning. Finally, on move 281 – after five hours of play – Sedol resigned. This time there were cheers backstage. Hassabis punched the air. Hugs and high fives were shared across the team. The win that Sedol had pulled off in game 4 had suddenly re-engaged their competitive spirit. It was important for them not to lose this last game.
Looking back at the match, many recognise what an extraordinary moment this was. Some immediately commented on its being an inflexion point for AI. Sure, all this machine could do was play a board game, and yet, for those looking on, its capability to learn and adapt was something quite new. Hassabis’s tweet after winning the first game summed up the achievement: ‘#AlphaGo WINS!!!! We landed it on the moon.’ It was a good comparison. Landing on the moon did not yield extraordinary new insights about the universe, but the technology that we developed to achieve such a feat has. Following the last game, AlphaGo was awarded an honorary professional 9 dan rank by the South Korean Go Association, the highest accolade for a Go player.
From hilltop to mountain peak
Move 37 of game 2 was a truly creative act. It was novel, certainly, it caused surprise, and as the game evolved it proved its value. This was exploratory creativity, pushing the limits of the game to the extreme.
One of the important points about the game of Go is that there is an objective way to test whether a novel move has value. Anyone can come up with a new move that appears creative. The art and challenge are in making a novel move that has some sort of value. How should we assess value? It can be very subjective and time-dependent. Something that is panned critically at the time of its release can be recognised generations later as a transformative creative act. Nineteenth-century audiences didn’t know what to make of Beethoven’s Symphony no. 5, and yet it is central repertoire now. During his lifetime Van Gogh could barely sell his paintings, trading them for food or painting materials, but now they go for millions. In Go there is a more tangible and immediate test of value: does it help you win the game? Move 37 won AlphaGo game 2. There was an objective measure that we could use to value the novelty of this move.
AlphaGo had taught the world a new way to play an ancient game. Analysis since the match has resulted in new tactics. The fifth line is now played early on, as we have come to understand that it can have big implications for the endgame. AlphaGo has gone on to discover still more innovative strategies. DeepMind revealed at the beginning of 2017 that its latest iteration had played online anonymously against a range of top-ranking professionals under two pseudonyms: Master and Magister. Human players were unaware that they were playing a machine. Over a few weeks it had played a total of sixty complete games. It won all sixty games.
But it was the analysis of the games that was truly insightful. Those games are now regarded as a treasure trove of new ideas. In several games AlphaGo played moves that beginners would have their wrists slapped for by their Go master. Traditionally you do not play a stone in the intersection of the third column and third row. And yet AlphaGo showed how to use such a move to your advantage.
Hassabis describes how the game of Go had got stuck on what mathematicians like to call a local maximum. If you look at the landscape I’ve drawn here then you might be at the top of peak A. From this height there is nowhere higher to go. This is called a local maximum. If there were fog all around you, you’d think you were at the highest point in the land. But across the valley is a higher peak. To know this, you need the fog to clear. You need to descend from your peak, cross the valley and climb the higher peak.
The trouble with modern Go is that conventions had built up about ways to play that had ensured players hit peak A. But by breaking those conventions AlphaGo had cleared the fog and revealed an even higher peak B. It’s even possible to measure the difference. In Go, a player using the conventions of peak A will in general lose by two stones to the player using the new strategies discovered by AlphaGo.
This rewriting of the conventions of how to play Go has happened at a number of previous points in history. The most recent was the innovative game play introduced by the legendary Go Seigen in the 1930s. His experimentation with ways of playing the opening moves revolutionised the way the game is played. But Go players now recognise that AlphaGo might well have launched an even greater revolution.
The Chinese Go champion Ke Jie recognises that we are in a new era: ‘Humanity has played Go for thousands of years, and yet, as AI has shown us, we have not yet even scratched the surface. The union of human and computer players will usher in a new era.’
Ke Jie’s compatriot Gu Li, winner of the most Go world titles, added: ‘Together, humans and AI will soon uncover the deeper mysteries of Go.’ Hassabis compares the algorithm to the Hubble telescope. This illustrates the way many view this new AI. It is a tool for exploring deeper, further, wider than ever before. It is not meant to replace human creativity but to augment it.
And yet there is something that I find quite depressing about this moment. It feels almost pointless to want to aspire to be the world champion at Go when you know there is a machine that you will never be able to beat. Professional Go players have tried to put a brave face on it, talking about the extra creativity that it has unleashed in their own play, but there is something quite soul-destroying about knowing that we are now second best to the machine. Sure, the machine was programmed by humans, but that doesn’t really seem to make it feel better.
AlphaGo has since retired from competitive play. The Go team at DeepMind has been disbanded. Hassabis proved his Cambridge lecturer wrong. DeepMind has now set its sights on other goals: health care, climate change, energy efficiency, speech recognition and generation, computer vision. It’s all getting very serious.
Given that Go was always my shield against computers doing mathematics, was my own subject next in DeepMind’s cross hairs? To truly judge the potential of this new AI we are going to need to look more closely at how it works and dig around inside. The crazy thing is that the tools DeepMind is using to create the programs that might put me out of a job are precisely the ones that mathematicians have created over the centuries. Is this mathematical Frankenstein’s monster about to turn on its creator?
4
ALGORITHMS, THE SECRET TO MODERN LIFE
The Analytical Engine weaves algebraic patterns, just as the Jacquard loom weaves flowers and leaves.
Ada Lovelace
Our lives are completely run by algorithms. Every time we search for something on the internet, plan a journey with our GPS, choose a movie recommended by Netflix or pick a date online, we are being guided by an algorithm. Algorithms are steering us through the digital age, yet few people realise that they predate the computer by thousands of years and go to the heart of what mathematics is all about.
The birth of mathematics in Ancient Greece coincides with the development of one of the very first algorithms. In Euclid’s Elements, alongside the proof that there are infinitely many prime numbers, we find a recipe that, if followed step by step, solves the following problem: given two numbers, find the largest number that divides them both.
It may help to put the problem in more visual terms. Imagine that the floor of your kitchen is 36 feet long by 15 feet wide. You want to know the largest square tile that will enable you to cover the entire floor without cutting any tiles. So what should you do? Here is the 2000-year-old algorithm that solves the problem:
Suppose you have two numbers, M and N (and suppose N is smaller than M). Start by dividing M by N and call the remainder N₁. If N₁ is zero, then N is the largest number that divides them both. If N₁ is not zero, then divide N by N₁ and call the remainder N₂. If N₂ is zero, then N₁ is the largest number that divides M and N. If N₂ is not zero, then do the same thing again. Divide N₁ by N₂ and call the remainder N₃. These remainders are getting smaller and smaller and are whole numbers, so at some point one must hit zero. When it does, the algorithm guarantees that the previous remainder is the largest number that divides both M and N. This number is known as the highest common factor or greatest common divisor.
Now let’s return to our challenge of tiling the kitchen floor. First we find the largest square tile that will fit inside the original shape. Then we look for the largest square tile that will fit inside the remaining part – and so on, until you hit a square tile that finally covers the remaining space evenly. This is the largest square tile that will enable you to cover the entire floor without cutting any tiles.
If M = 36 and N = 15, then dividing N into M gives you a remainder of N₁ = 6. Dividing N₁ into N we get a remainder of N₂ = 3. But now dividing N₁ by N₂ we get no remainder at all, so we know that 3 is the largest number that can divide both 36 and 15.
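For readers who like to see a recipe run, here is a minimal sketch of Euclid's procedure in Python. The function names are my own choices for illustration and do not come from the Elements; the second function simply records the successive remainders so they can be compared with the kitchen-floor example.

```python
def largest_common_divisor(m: int, n: int) -> int:
    """Euclid's recipe: keep dividing and carry the remainder forward."""
    if m <= 0 or n <= 0:
        raise ValueError("both numbers must be positive whole numbers")
    while n != 0:
        # Divide m by n; the old divisor and the new remainder become the next pair.
        m, n = n, m % n
    # Once the remainder is zero, the previous remainder (now held in m) is the answer.
    return m


def remainders(m: int, n: int) -> list:
    """The successive remainders, down to and including the final zero."""
    found = []
    while n != 0:
        m, n = n, m % n
        found.append(n)
    return found


print(largest_common_divisor(36, 15))  # 3
print(remainders(36, 15))              # [6, 3, 0]
```

The loop is the chain of 'if …, then …' clauses from the description, folded into a single repeated step.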
You see that there are lots of ‘if …, then …’ clauses in this process. That is typical of an algorithm and is what makes algorithms so perfect for coding and computers. Euclid’s ancient recipe gets to the heart of four key characteristics any algorithm should ideally possess:
1. It should consist of a precisely stated and unambiguous set of instructions.
2. The procedure should always finish, regardless of the numbers you insert. (It should not enter an infinite loop!)
3. It should give the answer for any values input into the algorithm.
4. Ideally it should be fast.
In the case of Euclid’s algorithm, there is no ambiguity at any stage. Because the remainder grows smaller at every step, after a finite number of steps it must hit zero, at which point the algorithm stops and spits out the answer. The bigger the numbers, the longer the algorithm will take, but it’s still fast relative to their size. (The number of steps is at most five times the number of digits in the smaller of the two numbers, for those who are curious.)
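The claim about speed can be checked by counting the divisions. The sketch below is an illustration only: the function names and the sample pairs are my own, and the bound is read as a ceiling on the number of divisions when the larger number is divided by the smaller. Neighbouring Fibonacci numbers such as 89 and 55 are the known hard case, which is why they are included.

```python
def division_steps(m: int, n: int) -> int:
    """Count the divisions Euclid's recipe performs before the remainder is zero."""
    steps = 0
    while n != 0:
        m, n = n, m % n
        steps += 1
    return steps


def step_bound(m: int, n: int) -> int:
    """Five times the number of decimal digits in the smaller of the two numbers."""
    return 5 * len(str(min(m, n)))


# Each pair is (larger, smaller); the pairs themselves are arbitrary examples.
for m, n in [(36, 15), (89, 55), (1_000_000, 999_999)]:
    print(m, n, "steps:", division_steps(m, n), "bound:", step_bound(m, n))
```

For 36 and 15 the recipe needs three divisions against a ceiling of ten, while 89 and 55 need nine.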
If one of the oldest algorithms is 2000 years old, then why do algorithms owe their name to a ninth-century Persian mathematician? Muhammad Al-Khwarizmi was one of the first directors of the great House of Wisdom in Baghdad and was responsible for many of the translations of the Ancient Greek mathematical texts into Arabic. ‘Algorithm’ is the Latin interpretation of his name. Although all the instructions for Euclid’s algorithm are there in the Elements, the language that Euclid used was very clumsy. The Ancient Greeks thought very geometrically, so numbers were lengths of lines and proofs consisted of pictures – a bit like our example with tiling the kitchen floor. But pictures are not a rigorous way to do mathematics. For that you need the language of algebra, where a letter can stand for any number. This was the invention of Al-Khwarizmi.
To be able to articulate clearly how an algorithm works you need a language that allows you to talk about a number without specifying what that number is. We already saw it at work in explaining how Euclid’s algorithm worked. We gave names to the numbers that we were trying to analyse: N and M. These variables can represent any number. The power of this new linguistic take on mathematics meant that it allowed mathematicians to understand the grammar that underlies the way that numbers work. You didn’t have to talk about particular examples where the method worked. This new language of algebra provided a way to explain the patterns that lie behind the behaviour of numbers. A bit like a code for running a program, it shows why it would work whatever numbers you chose, the third criterion in our conditions for a good algorithm.
Algorithms have become the currency of our era because they are perfect fodder for computers. An algorithm exploits the pattern underlying the way we solve a problem to guide us to a solution. The computer doesn’t need to think. It just follows the instructions encoded in the algorithm again and again, and, as if by magic, out pops the answer you were looking for.
Desert island algorithm
One of the most extraordinary algorithms of the modern age is the one that helps millions of us navigate the internet every day. If I were cast away on a desert island and could only take one algorithm with me, I’d probably choose the one that drives Google. (Not that it would be much use, as I’d be unlikely to have an internet connection.)
In the early days of the internet (we’re talking the early 1990s) there was a directory that listed all of the existing websites. In 1994 there were only 3000 of them. The internet was small enough for you to pretty easily thumb through and find what you were looking for. Things have changed quite a bit since then. When I started writing this paragraph there were 1,267,084,131 websites live on the internet. A few sentences later that number has gone up to 1,267,085,440. (You can check the current status here: http://www.internetlivestats.com/.)
How does Google figure out exactly which one of the billion websites to recommend? Mary Ashwood, an 86-year-old granny from Wigan, was careful to send her requests with a courteous ‘please’ and ‘thank you’, perhaps imagining an industrious group of interns on the other end sifting through the endless requests. When her grandson Ben opened her laptop and found ‘Please translate these roman numerals mcmxcviii thank you’, he couldn’t resist tweeting the world about his nan’s misconception. He got a shock when someone at Google replied with the following tweet:
Dearest Ben’s Nan.
Hope you’re well.
In a world of billions of Searches, yours made us smile.
Oh, and it’s 1998.
Thank YOU
Ben’s Nan brought out the human in Google on this occasion, but there is no way any company could respond personally to the million searches Google receives every fifteen seconds. So if it isn’t magic Google elves scouring the internet, how does Google succeed in so spectacularly locating the answers you want?
It all comes down to the power and beauty of the algorithm Larry Page and Sergey Brin cooked up in their dorm rooms at Stanford in 1996. They originally wanted to call their new algorithm ‘Backrub’, but eventually settled instead on ‘Google’, inspired by the googol, the mathematical name for the number written as one followed by 100 zeros. Their mission was to find a way to rank pages on the internet to help navigate this growing database, so a huge number seemed like a cool name.
It isn’t that there weren’t other algorithms out there being used to do the same thing, but these were pretty simple in their conception. If you wanted to find out more about ‘the polite granny and Google’, existing algorithms would have identified all of the pages with these words and listed them in order, putting the websites with the most occurrences of the search terms up at the top.
That’s OK but easily hackable: any florist who sticks into their webpage’s meta-data the words ‘Mother’s Day Flowers’ a thousand times will shoot to the top of every son or daughter’s search. You want a search engine that can’t easily be pushed around by savvy web designers. So how can you come up with an unbiased measure of the importance of a website? And how can you find out which sites you can ignore?
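To see why counting words is so easy to game, here is a minimal sketch of that kind of ranking. Everything in it is hypothetical: the function name, the three page names and their text are invented for illustration, and no real early search engine is being reproduced.

```python
def rank_by_word_count(pages: dict, query: str) -> list:
    """Order page names by how often the query words appear in their text."""
    words = query.lower().split()

    def score(text: str) -> int:
        tokens = text.lower().split()
        return sum(tokens.count(word) for word in words)

    return sorted(pages, key=lambda name: score(pages[name]), reverse=True)


# Hypothetical pages, invented purely for illustration.
pages = {
    "honest-florist": "flowers for mother's day delivered by hand",
    "stuffed-florist": "mother's day flowers " * 1000,
    "news-site": "a polite granny thanks google for her search",
}

print(rank_by_word_count(pages, "mother's day flowers"))
# ['stuffed-florist', 'honest-florist', 'news-site']
```

The page that repeats the phrase a thousand times wins, although nothing about it suggests it is the better florist.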
Page and Brin struck on the clever idea that if a website has many links pointing to it, then those other sites are signalling that it is worth visiting. The idea is to democratise the measure of a website’s worth by letting other websites vote for who they think is important. But, again, this could be hacked. I just need to set up a thousand artificial websites linking to my florist’s website and it will bump the site up the list. To prevent this, they decided to give more weight to a vote that came from a website that itself commanded respect.
This still left them with a challenge: how do you rank the importance of one site over another? Take this mini-network as an example: three websites, A, B and C, in which A links to B and C, B links only to C, and C links only to A.
We want to start by giving each site equal weight. Let’s think of the websites as buckets; we’ll give each site eight balls to indicate that they have equal rank. Now the websites have to give their balls to the sites they link to. If they link to more than one site, then they will share their balls equally. Since website A links to both website B and website C, for example, it will give four balls to each site. Website B, however, has decided to link only to website C, putting all eight of its balls into website C’s bucket.
After the first distribution, website C comes out looking very strong. But we need to keep repeating the process because website A will be boosted by the fact that it is being linked to by the now high-ranking website C. The table below shows how the balls move around as we iterate the process.
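The ball game can be played out in a few lines of Python. This is a sketch under stated assumptions: the link structure is the one described for the mini-network (A links to B and C, B links only to C, and C links only to A, the last taken from the later remark about website C), while the names `LINKS` and `redistribute` and the choice of eight rounds are mine.

```python
from fractions import Fraction

# Links as described in the text: A -> B and C, B -> C, C -> A.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}


def redistribute(balls: dict) -> dict:
    """One round: every site shares its balls equally among the sites it links to."""
    new_balls = {site: Fraction(0) for site in balls}
    for site, targets in LINKS.items():
        share = Fraction(balls[site]) / len(targets)
        for target in targets:
            new_balls[target] += share
    return new_balls


# Every site starts with eight balls; fractions keep the shares exact.
balls = {"A": Fraction(8), "B": Fraction(8), "C": Fraction(8)}
for round_number in range(1, 9):
    balls = redistribute(balls)
    print(round_number, {site: float(count) for site, count in balls.items()})
```

After the first round the buckets hold 8, 4 and 12 balls, and after the second 12, 4 and 8: the lead swings between A and C in the early rounds rather than settling straight away.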
At the moment, this does not seem to be a particularly good algorithm. It appears not to stabilise and is rather inefficient, failing two of our criteria for the ideal algorithm. Page and Brin’s great insight was to realise that they needed to find a way to assign the balls by looking at the connectivity of the network. It turned out they’d been taught a clever trick in their university mathematics course that worked out the correct distribution in one step.
The trick starts by constructing a matrix which records the way that the balls are redistributed among the websites. The first column of the matrix records the proportion going from website A to the other websites. In this case 0.5 goes to website B and 0.5 to website C. The matrix of redistribution is therefore given by the following matrix:
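Written out for the mini-network, the matrix would look like this. It assumes the sites are taken in the order A, B, C for both rows and columns, and that website C's only link is to website A, as the description of the network indicates; the symbol P is simply a label chosen here.

```latex
\[
P =
\begin{pmatrix}
0   & 0 & 1 \\
0.5 & 0 & 0 \\
0.5 & 1 & 0
\end{pmatrix}
\]
```

Each column adds up to 1, because every website hands on all of the balls it holds.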
The challenge is to find what is called the eigenvector of this matrix with eigenvalue 1. This is a column vector that does not get changed when multiplied by the matrix.
Finding these eigenvectors or stability points is something we teach undergraduates early on in their university career. In the case of our network we find that the following column vector is stabilised by the redistribution matrix:
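With the sites in the order A, B, C (an assumed ordering) and the links as described, A to B and C, B to C, and C to A, the vector and the check that the redistribution leaves it unchanged can be written as:

```latex
\[
\begin{pmatrix}
0   & 0 & 1 \\
0.5 & 0 & 0 \\
0.5 & 1 & 0
\end{pmatrix}
\begin{pmatrix} 2 \\ 1 \\ 2 \end{pmatrix}
=
\begin{pmatrix}
0 \cdot 2 + 0 \cdot 1 + 1 \cdot 2 \\
0.5 \cdot 2 + 0 \cdot 1 + 0 \cdot 2 \\
0.5 \cdot 2 + 1 \cdot 1 + 0 \cdot 2
\end{pmatrix}
=
\begin{pmatrix} 2 \\ 1 \\ 2 \end{pmatrix}
\]
```

Any multiple of this vector is stabilised in the same way, so only the proportions matter.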
This means that if we split the balls in a 2:1:2 distribution we see that this weighting is stable. Distribute the balls using our previous game and the sites still have a 2:1:2 distribution.
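The stability of the 2:1:2 split can be checked directly. The sketch below assumes the redistribution matrix implied by the links in our example: its first column sends 0.5 to B and 0.5 to C, as described above, and the other two columns are reconstructed from the links B to C and C to A. The helper name `apply` is hypothetical.

```python
from fractions import Fraction

half = Fraction(1, 2)
# Column j records where site j sends its balls (order of sites: A, B, C).
M = [
    [0,    0, 1],   # into A: everything from C
    [half, 0, 0],   # into B: half of A
    [half, 1, 0],   # into C: half of A and all of B
]

def apply(matrix, vector):
    """Multiply a square matrix by a column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

v = [2, 1, 2]
print(apply(M, v) == v)        # the 2:1:2 split is left unchanged
print(apply(M, [1, 1, 1]))     # an equal split is not
```

Multiplying by the matrix is the same as playing one round of the ball game, so a vector that comes back unchanged is exactly a distribution that the game no longer disturbs.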
Eigenvectors of matrices are an incredibly potent tool in mathematics and the sciences more generally. They are the secret to working out the energy levels of particles in quantum physics. They can tell you the stability of a rotating fluid like a spinning star or the reproduction rate of a virus. They may even be key to understanding how prime numbers are distributed throughout all numbers.
By calculating the eigenvector of the network’s connectivity we see that websites A and C should be ranked equally. Although website A is linked to by only one site (website C), the fact that website C is highly valued and links only to website A means that its link bestows high value to website A.
This is the basic core of the algorithm. There are a few extra subtleties that need to be introduced to get the algorithm working in its full glory. For example, the algorithm needs to take into account anomalies like websites that don’t link to any other websites and become sinks for the balls being redistributed. But at its heart is this simple idea.
Although the basic engine is very public, there are parameters inside the algorithm that are kept secret and change over time, and which make the algorithm a little harder to hack. But the fascinating thing is the robustness of the Google algorithm and its imperviousness to being gamed. It is very difficult for a website to do anything on its own site that will increase its rank. It must rely on others to boost its position. If you look at the websites that Google’s page rank algorithm scores highly, you will see a lot of major news sources and university websites like Oxford and Harvard. This is because many outside websites will link to findings and opinions on university websites, because the research we do is valued by many people across the world.
Interestingly this means that when anyone with a website within the Oxford network links to an external site, the link will cause a boost to the external website’s page rank, as Oxford is sharing a bit of its huge prestige (or cache of balls) with that website. This is why I often get requests to link from my website in the maths department at Oxford to external websites. The link will help increase the external website’s rank and, it is hoped, make it appear on the first page of a Google search, the ultimate holy grail for any website.
But the algorithm isn’t immune to clever attacks by those who understand how the mathematics works. For a short period in the summer of 2018, if you googled ‘idiot’ the first image that appeared was that of Donald Trump. Activists had understood how to exploit the powerful position that the website Reddit has on the internet. By getting people to vote for a post on the site containing the word ‘idiot’ and an image of Trump, the connection between the two shot to the top of the Google ranking. The spike was smoothed out over time by the algorithm rather than by manual intervention. Google does not like to play God but trusts in the long run in the power of its mathematics.
The internet is of course a dynamic beast, with new websites emerging every nanosecond and new links being made as existing sites are shut down or updated. This means that page ranks need to change dynamically. In order for Google to keep pace with the constant evolution of the internet, it must regularly trawl through the network and update its count of the links between sites using what it rather endearingly calls ‘Google spiders’.
Tech junkies and sports coaches have discovered that this way of evaluating the nodes in a network can also be applied to other networks. One of the most intriguing external applications has been in the realm of football (of the European kind, which Americans think of as soccer). When sizing up the opposition, it can be important to identify a key player who will control the way the team plays or be the hub through which all play seems to pass. If you can identify this player and neutralise them early on in the game, then you can effectively close down the team’s strategy.
Two London-based mathematicians, Javier López Peña and Hugo Touchette, both football fanatics, decided to see whether Google’s algorithm might help analyse the teams gearing up for the World Cup. If you think of each player as a website and a pass from one player to another as a link from one website to another, then the passes made over the course of a game can be thought of as a network. A pass to a teammate is a mark of the trust you put in that player – players generally avoid passing to a weak teammate who might easily lose the ball, and you will only be passed to if you make yourself available. A static player will rarely be available for a pass.
They decided to use passing data made available by FIFA during the 2010 World Cup to see which players ranked most highly. The results were fascinating. If you analysed England’s style of play, two players, Steven Gerrard and Frank Lampard, emerged with a markedly higher rank than others. This reflects the fact that the ball very often went through these two midfielders: take them out and England’s game collapses. England did not get very far that year in the World Cup – they were knocked out early by their old nemesis, Germany.
Contrast this with the eventual winners: Spain. The algorithm shared the rank uniformly around the whole team, indicating that there was no clear hub through which the game was being played. This is a reflection of the very successful ‘total football’ or ‘tiki-taka’ style played by Spain, in which players constantly pass the ball around, a strategy that contributed to Spain’s ultimate success.
Unlike many sports in America that thrive on data, it has taken some time for football to take advantage of the mathematics and statistics bubbling underneath the game. But by the 2018 World Cup in Russia many teams boasted a scientist on board to crunch the numbers to understand the strengths and weaknesses of the opposition, including how the network of each team behaves.
A network analysis has even been applied to literature. Andrew Beveridge and Jie Shan took the epic saga A Song of Ice and Fire by George R. R. Martin, otherwise known as Game of Thrones. Anyone who knows the story will be aware that predicting which characters will make it through to the next volume, let alone the next chapter, is notoriously tricky, as Martin is ruthless at killing off even the best characters he has created.
Beveridge and Shan decided to create a network between characters in the books. They identified 107 key people who became the nodes of the network. The characters were then connected with weighted edges according to the strength of the relationship. But how can an algorithm assess the importance of a connection? The algorithm was simply asked to count the number of times the two names appeared in the text within fifteen words of each other. This doesn’t measure friendship – it indicates some measure of interaction or connection between them.
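The counting rule is simple enough to sketch in Python. This is only an illustration of the idea, not Beveridge and Shan's actual code: the function name `cooccurrence_weights` is hypothetical, each character is assumed to be matched by a single-word name, and the sample sentence is invented rather than quoted from the novels.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence_weights(text, names, window=15):
    """Count how often two names appear within `window` words of each other."""
    words = re.findall(r"[A-Za-z']+", text)
    positions = [(i, w) for i, w in enumerate(words) if w in names]
    weights = Counter()
    for (i, a), (j, b) in combinations(positions, 2):
        if a != b and j - i <= window:
            weights[frozenset((a, b))] += 1
    return weights

# Invented sample sentence, not a quotation from the novels.
sample = "Tyrion spoke with Sansa in the hall. Much later, far away, Jon wrote a letter."
weights = cooccurrence_weights(sample, {"Tyrion", "Sansa", "Jon"}, window=5)
print(weights)
```

Each pair of names with a non-zero count becomes an edge in the network, and the count becomes the weight of that edge. Comparing every pair of mentions is wasteful on a whole novel, but it keeps the sketch short.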
They decided to analyse the third volume in the series, A Storm of Swords, as the narrative had settled down by this point, and began by constructing a page rank analysis of the nodes or characters in the network. Three characters quickly stood out as important to the plot: Tyrion, Jon Snow and Sansa Stark. Anyone who has read the books or seen the series would not be surprised by this revelation. What is striking is that a computer algorithm which does not understand what it is reading achieved this same insight. It did so not simply by counting how many times a character’s name appears – that would pull out other names. It turned out that a subtler analysis of the network revealed the true protagonists.
To date, all three characters have survived Martin’s ruthless pen which has cut short some of the other key characters in the third volume. This is the mark of a good algorithm: it can be used in multiple scenarios. This one can tell you something useful from football to Game of Thrones.
Maths, the secret to a happy marriage
Sergey Brin and Larry Page may have cracked the code to steer you to websites you don’t even know you’re looking for, but can an algorithm really do something as personal as find your soulmate? Visit OKCupid and you’ll be greeted by a banner proudly declaring: ‘We use math to find you dates’.
These dating websites use a ‘matching algorithm’ to search through profiles and match people up according to their likes, dislikes and personality traits. They seem to be doing a pretty good job. In fact, the algorithms seem to be better than we are on our own: recent research published in the Proceedings of the National Academy of Sciences looked at 19,000 people who married between 2005 and 2012 and found that those who met online were happier and had more stable marriages.
The first algorithm to win its creators a Nobel Prize, originally formulated by two mathematicians, David Gale and Lloyd Shapley, in 1962, used a matching algorithm to solve something called ‘the Stable Marriage Problem’. Gale, who died in 2008, missed out on the award, but Shapley shared the prize in 2012 with the economist Alvin Roth, who saw the importance of the algorithm not just to the question of relationships but also to social problems including assigning health care and student places fairly.
Shapley was amused by the award: ‘I consider myself a mathematician and the award is for economics,’ he said at the time, clearly surprised by the committee’s decision. ‘I never, never in my life took a course in economics.’ But the mathematics he cooked up has had profound economic and social implications.
The Stable Marriage Problem that Shapley solved with Gale sounds more like a parlour game than a piece of cutting-edge economic theory. To illustrate the precise nature of the problem, imagine you’ve got four heterosexual men and four heterosexual women. They’ve been asked to list the four members of the opposite sex in order of preference. The challenge for the algorithm is to match them up in such a way as to come up with stable marriages. What this means is that there shouldn’t be a man and woman who would prefer to be with one another than with the partner they’ve been assigned. Otherwise there’s a good chance that at some point they’ll leave their partners to run off with one another. At first sight it isn’t at all clear, even with four pairs, that it is possible to arrange this.
Let’s take a particular example and explore how Gale and Shapley could guarantee a stable pairing in a systematic and algorithmic manner. The four men will be played by the kings from a pack of cards: King of Spades, King of Hearts, King of Diamonds and King of Clubs. The women are the corresponding queens. Each king and queen has listed his or her preferences:
For the kings:
For the queens:
Now suppose you were to start by proposing that each king be paired with the queen of the same suit. Why would this result in an unstable pairing? The Queen of Clubs has ranked the King of Clubs as her least preferred partner so frankly she’d be happier with any of the other kings. And check out the King of Hearts’ list: the Queen of Hearts is at the bottom of his list. He’d certainly prefer the Queen of Clubs over the option he’s been given. In this scenario, we can envision the Queen of Clubs and the King of Hearts running away together. Matching kings and queens via their suits would lead to unstable marriages.
How do we match them so we won’t end up with two cards running off with each other? Here is the recipe Gale and Shapley cooked up. It consists of several rounds of proposals by the queens to the kings until a stable pairing finally emerges. In the first round of the algorithm, the queens all propose to their first choice. The Queen of Spades’ first choice is the King of Hearts. The Queen of Hearts’ first choice is the King of Clubs. The Queen of Diamonds chooses the King of Spades and the Queen of Clubs proposes to the King of Hearts. So it seems that the King of Hearts is the heart-throb of the pack, having received two proposals. He chooses which of the two queens he prefers, which is the Queen of Clubs, and rejects the Queen of Spades. So we have three provisional engagements, and one rejection.
First round
The rejected queen strikes off her first-choice king and in the next round moves on to propose to her second choice: the King of Spades. But now the King of Spades has two proposals. His first proposal from round one, the Queen of Diamonds, and a new proposal from the Queen of Spades. Looking at his ranking, he’d actually prefer the Queen of Spades. So he rather cruelly rejects the Queen of Diamonds (his provisional engagement on the first round of the algorithm).
Second round
Which brings us to round three. In each round, the rejected queens propose to the next king on their list and the kings always go for the best offer they receive. In this third round the rejected Queen of Diamonds proposes to the King of Diamonds (who has been standing like that kid who never gets picked for the team). Despite the fact that the Queen of Diamonds is low down on his list, he hasn’t got a better option, as the other three queens prefer other kings who have accepted them.
Third round
Finally everyone is paired up and all the marriages are stable. Although we have couched the algorithm in terms of a cute parlour game with kings and queens, the algorithm is now used all over the world: in Denmark to match children to day-care places; in Hungary to match students to schools; in New York to allocate rabbis to synagogues; and in China, Germany and Spain to match students to universities. In the UK it has been used by the National Health Service to match patients to organ donations, resulting in many lives being saved.
The modern algorithms that run our dating agencies build on the puzzle solved by Gale and Shapley. The problem is more complex since information is incomplete. Preferences are movable and relative, and shift even within relationships from day to day. But essentially the algorithms are trying to match people with preferences that will lead to a stable and happy pairing. And the evidence suggests that the algorithms could well be better than leaving it to human intuition.
You might have detected an interesting asymmetry in the algorithm that Gale and Shapley cooked up. We got the queens to propose to the kings. Would it have mattered if we had invited the kings to propose to the queens instead? Rather strikingly it does. You would end up with a different stable pairing if you applied the algorithm by swapping kings and queens.
The Queen of Diamonds would end up with the King of Hearts and the Queen of Clubs with the King of Diamonds. The two queens swap partners, but now they’re paired up with slightly lower choices. Although both pairings are stable, when queens propose to kings, the queens end up with the best pairings they could hope for. Flip things around and the kings are better off.
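The whole procedure, including the asymmetry, can be sketched in Python. The preference lists below are illustrative assumptions: they have been chosen to agree with every detail mentioned in the story of the kings and queens, but the remaining entries are invented. The function names `gale_shapley` and `is_stable` are likewise hypothetical, and proposals are processed one at a time rather than in rounds, which leads to the same final pairing.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return {proposer: receiver} after repeated proposals and rejections."""
    # rank[r][p] is how highly receiver r rates proposer p (0 is best).
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}   # index of the next name to try
    engaged = {}                                   # receiver -> proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(r)
        if current is None:
            engaged[r] = p                         # provisional engagement
        elif rank[r][p] < rank[r][current]:
            engaged[r] = p                         # receiver trades up
            free.append(current)
        else:
            free.append(p)                         # proposal rejected
    return {p: r for r, p in engaged.items()}

def is_stable(matching, proposer_prefs, receiver_prefs):
    """True if no proposer and receiver both prefer each other to their partners."""
    partner_of = {r: p for p, r in matching.items()}
    for p, prefs in proposer_prefs.items():
        for r in prefs:
            if r == matching[p]:
                break                              # the rest are less preferred
            r_prefs = receiver_prefs[r]
            if r_prefs.index(p) < r_prefs.index(partner_of[r]):
                return False                       # p and r would run off together
    return True

# Assumed preference lists (S, H, D, C are the suits; best choice first).
queens = {
    "QS": ["KH", "KS", "KD", "KC"],
    "QH": ["KC", "KS", "KH", "KD"],
    "QD": ["KS", "KD", "KH", "KC"],
    "QC": ["KH", "KD", "KS", "KC"],
}
kings = {
    "KS": ["QS", "QD", "QC", "QH"],
    "KH": ["QD", "QC", "QS", "QH"],
    "KD": ["QC", "QS", "QH", "QD"],
    "KC": ["QH", "QS", "QD", "QC"],
}

same_suit = {"QS": "KS", "QH": "KH", "QD": "KD", "QC": "KC"}
queens_first = gale_shapley(queens, kings)
kings_first = gale_shapley(kings, queens)

print(is_stable(same_suit, queens, kings))
print(queens_first, is_stable(queens_first, queens, kings))
print(kings_first, is_stable(kings_first, kings, queens))
```

With these assumed lists, matching by suit is unstable, the queens-first run pairs the Queen of Clubs with the King of Hearts and the Queen of Diamonds with the King of Diamonds, and the kings-first run swaps those two queens over. Both results are stable, yet each favours whoever did the proposing.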
Medical students in America looking for residencies realised that hospitals were using this algorithm to assign places in such a way that the hospitals did the proposing. This meant the students were getting a worse deal. After some campaigning by students who pointed out how unfair this was, eventually the algorithm was reversed to give students the better end of the deal.
This is a powerful reminder that, as our lives are increasingly pushed around by algorithms, it’s important to understand how they work and what they’re doing, because otherwise you may be getting shafted.
The battle of the booksellers
The trouble with algorithms is that sometimes there are unexpected consequences. A human might be able to tell that something weird was happening, but an algorithm will just carry on doing what it was programmed to do, regardless of how absurd the consequences may be.
My favourite example of this centres on two second-hand booksellers who ran their shops using algorithms. A postdoc working at UC Berkeley was keen to get hold of a copy of Peter Lawrence’s book The Making of a Fly. It is a classic published in 1992 that developmental biologists often use, but by 2011 the text had been out of print for some time. The postdoc was after a second-hand copy.
Checking on Amazon, he found a number of copies priced at about $40, but then was rather shocked to see a copy on sale for $1,730,045.91. The seller, profnath, wasn’t even including shipping in the bargain. Then he noticed that there was another copy on sale for even more! This seller, bordeebook, was asking a staggering $2,198,177.95 (plus $3.99 for shipping of course).
The postdoc showed this to his supervisor, Michael Eisen, who presumed it must be a graduate student having fun. But both booksellers had very high ratings and seemed to be legitimate. Profnath had had over 8000 recommendations over the last twelve months, while bordeebook had had over 125,000 during the same period. Perhaps it was just a weird blip.
When Eisen checked the next day to see if the prices had dropped to more sensible levels, he found instead that they’d gone up. Profnath now wanted $2,194,443.04 while bordeebook was asking a phenomenal $2,788,233.00. Eisen decided to put his scientific hat on and analyse the data. Over the next few days he tracked the changes in an effort to work out if there was some pattern to the strange prices.
Eventually he spotted the mathematical rule behind the escalating prices. Divide the profnath price by the bordeebook price from the day before and you always got 0.99830. Divide the bordeebook price by the profnath price on the same day and you always got 1.27059. Each seller had programmed their website to use an algorithm that was setting the prices for books they were selling. Each day the profnath algorithm would check the price of the book at bordeebook and would then multiply it by 0.99830. This algorithm made perfect sense because the seller was programming the site to slightly undercut the competition at bordeebook. It is the algorithm at bordeebook that is slightly more curious. It was programmed to detect any price change in its rival and to multiply this new price by a factor of 1.27059.
The combined effect was that each day the price would be multiplied by 0.99830 × 1.27059, or 1.26843. This ensured that the price would grow exponentially. If profnath had set a sharper factor to undercut the price being offered by bordeebook, you would have seen the price collapse over time rather than escalate.
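The duel between the two pricing rules can be replayed in a few lines of Python. This is a sketch under simple assumptions: each bot reacts once a day, the two factors are the ones Eisen measured, and the names `UNDERCUT`, `MARKUP` and `simulate` are invented for the illustration. The starting figure is the profnath price quoted earlier.

```python
UNDERCUT = 0.99830   # profnath: just below bordeebook's latest price
MARKUP = 1.27059     # bordeebook: a fixed multiple of profnath's latest price

def simulate(profnath_price, days):
    """Yield (day, profnath, bordeebook) as each bot reacts to the other."""
    for day in range(days + 1):
        bordeebook_price = profnath_price * MARKUP
        yield day, profnath_price, bordeebook_price
        profnath_price = bordeebook_price * UNDERCUT

daily_factor = UNDERCUT * MARKUP
print(round(daily_factor, 5))
for day, profnath, bordeebook in simulate(1_730_045.91, 10):
    print(f"day {day:2d}  profnath ${profnath:,.2f}  bordeebook ${bordeebook:,.2f}")
```

Because the factors are rounded to five decimal places, the simulated prices track the real ones only approximately. The prices fall rather than rise only if the product of the two factors drops below 1, which would need an undercut factor smaller than about 0.787.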
The explanation for profnath’s algorithm seems clear, but why was bordeebook’s algorithm set to offer the book at a higher price? Surely no one would buy the more expensive book? Perhaps they were relying on their bigger reputation with a greater number of positive recommendations to drive traffic their way, especially if their price was only slightly higher, which at the start it would have been. As Eisen wrote in his blog, ‘this seems like a fairly risky thing to rely on. Meanwhile you’ve got a book sitting on the shelf collecting dust. Unless, of course, you don’t actually have the book …’
Then the truth suddenly dawned on him. Of course. They didn’t actually have the book! The algorithm was programmed to see what books were out there and to offer the same book at a slight markup. If someone wanted the book from their reliable bordeebook’s website, then bordeebook would go and purchase it from the other bookseller and sell it on. But to cover costs this would necessitate a bit of a markup. The algorithm thus multiplied the price by a factor of 1.27059 to cover the purchase of the book, the shipping and a little extra profit.
Using a few logarithms it’s possible to work out that the book most likely first went on sale forty-five days before 8 April at about $40. This shows the power of exponential growth. It only took a month and a half for the price to reach into the millions! The price peaked at $23,698,655.93 (plus $3.99 shipping) on 18 April, when finally a human at profnath intervened, realising that something strange was going on. The price then dropped to $106.23. Predictably bordeebook’s algorithm offered their book at $106.23 × 1.27059 = $134.97.
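The logarithm calculation takes one line. The sketch below assumes that the $1,730,045.91 price first spotted by the postdoc was profnath's price on 8 April, and that the price grew by the combined factor every day from a starting point of $40.

```python
import math

daily_factor = 0.99830 * 1.27059      # combined multiplier from the text
start_price = 40.00                   # "about $40", as stated in the text
observed_price = 1_730_045.91         # assumed to be profnath's price on 8 April

# Solve start_price * daily_factor**days == observed_price for days.
days = math.log(observed_price / start_price) / math.log(daily_factor)
print(round(days, 1))

# The reset described at the end of the episode.
print(round(106.23 * 1.27059, 2))
```

Under these assumptions the answer comes out just under forty-five days, and the second line reproduces the $134.97 that bordeebook's algorithm asked after the reset.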
The mispricing of The Making of a Fly did not have a devastating impact for anyone involved, but there are more serious cases of algorithms used to price stock options causing flash crashes on the markets. The unintended consequences of algorithms are one of the prime sources of the existential fears people have about advancing technology. What if a company builds an algorithm that is tasked with maximising the collection of carbon, but it suddenly realises the humans who work in the factory are carbon-based organisms, so it starts harvesting the humans in the factory for carbon production? Who would stop it?
Algorithms are based on mathematics. At some level they are mathematics in action. But they don’t really creatively stretch the field. No one in the mathematical community feels particularly threatened by them. We don’t really believe that algorithms will turn on their creators and put us out of a job. For years I believed that these algorithms would do no more than speed up the mundane part of my work. They were just more sophisticated versions of Babbage’s calculating machine that could be told to do the algebraic or numerical manipulations which would take me tedious hours to write out by hand. I always felt in control. But that is all about to change.
Up till a few years ago it was felt that humans understood what their algorithms were doing and how they were doing it. Like Lovelace, they believed you couldn’t really get more out than you put in. But then a new sort of algorithm began to emerge, an algorithm that could adapt and change as it interacted with its data. After a while its programmer may not understand quite why it is making the choices it is. These programs were starting to produce surprises, and for once you could get more out than you put in. They were beginning to be more creative. These were the algorithms DeepMind exploited in its crushing of humanity in the game of Go. They ushered in the new age of machine learning.
5
FROM TOP DOWN TO BOTTOM UP
Machines take me by surprise with great frequency.
Alan Turing
I first met Demis Hassabis a few years before his great Go triumph at a meeting about the future of innovation. New companies were on the lookout for investment from venture capitalists and investors. Some were going to transform the future, but most would flash and burn. The art was for VCs and angel investors to spot the winners. I must admit when I heard Hassabis speak about code that could learn, adapt and improve I dismissed him out of hand. I couldn’t see how, if you were programming a computer to play a game, the program could get any further than the person who was writing the code. How could you get more out than you were putting in? I wasn’t the only one. Hassabis admits that getting investors to give money to AI a decade ago was extremely difficult.
How I wish now that I’d backed that horse as it came trotting by! The transformative impact of the ideas Hassabis was proposing can be judged by the title of a recent session on AI: ‘Is machine learning the new 42?’ (The allusion to Douglas Adams’s answer to the question of life, the universe and everything from his book The Hitchhiker’s Guide to the Galaxy would have been familiar to the geeky attendees, many of whom were brought up on a diet of sci-fi.) So what has happened to spark this new AI revolution?
The simple answer is data. It is an extraordinary fact that 90 per cent of the world’s data has been created in the last five years. 1 exabyte (10¹⁸ bytes) of data is created on the internet every day, roughly the equivalent of the amount of data that can be stored on 250 million DVDs. Humankind now produces in two days the same amount of data it took us from the dawn of civilisation until 2003 to generate.
This flood of data is the main catalyst for the new age of machine learning. Before now there just wasn’t enough of an environment for an algorithm to roam around in and learn. It was like having a child and denying it sensory input. We know that children who have been trapped indoors fail to develop language and other basic skills. Their brains may have been primed to learn but didn’t encounter enough stimulus or experience to develop properly.
The importance of data to this new revolution has led many to speak of data as the new oil. If you have access to data you are straddling the twenty-first-century’s oilfields. This is why the likes of Facebook, Twitter, Google and Amazon are sitting pretty – we are giving them our reserves for free. Well, not exactly for free as we are exchanging our data for the services they provide. When I drive in my car using Waze, I have chosen to exchange data about my location in return for the most efficient route to my destination. The trouble is, many people are not aware of these transactions and give up valuable data for little in return.
At the heart of machine learning is the idea that an algorithm can be created that will find new questions to ask if it gets something wrong. It learns from its mistake. This tweaks the algorithm’s equations such that next time it will act differently and won’t make the same mistake. This is why access to data is so important: the more examples these smart algorithms can train on the more experienced they will become, and the more each tweak will refine them. Programmers are essentially creating a meta-algorithm which creates new algorithms based on the data it encounters.
People in the field of AI have been shocked at the effectiveness of this new approach. Partly this is because the underlying technology is not that new. These algorithms are created by building up layers of questions that can help reach a conclusion. These layers are sometimes called neural networks because they mimic the way the human brain works. If you think about the structure of the brain, neurons are connected to other neurons by synapses. A collection of neurons might fire due to an input of data from our senses. (The smell of freshly baked bread.) Secondary neurons will then fire, provided certain thresholds are passed. (The decision to eat the bread.) A secondary neuron might fire if ten connected neurons are firing due to the input data, for instance, but not if fewer are firing. The trigger might depend also on the strength of the incoming signal from the other neurons.
Already in the 1950s computer scientists created an artificial version of this process, which they called the perceptron. The idea is that a neuron is like a logic gate that receives input and then, depending on a calculation, decides either to fire or not.
Let’s imagine that the perceptron receives three input numbers. It weights the importance of each of these. In the diagram here, perhaps x₁ is three times as important as x₂ and x₃. It would calculate 3x₁ + x₂ + x₃ and then, depending on whether this fell above or below a certain threshold, it would fire an output or not. Machine learning hinges on reweighting the input if it gets the answer wrong. For example, perhaps x₃ is more important in making a decision than x₂, so you might change the equation to 3x₁ + x₂ + 2x₃. Or perhaps we simply need to tweak the activation level so the threshold can be dialled up or down in order to fire the perceptron. We can also create a perceptron such that the degree to which it fires is proportional to how far the function has passed the threshold. The output can be a measure of its confidence in the assessment of the data.
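A perceptron of this kind takes only a few lines of Python. The sketch below uses the weights 3, 1, 1 and then the reweighted 3, 1, 2 from the example above. The input values and the threshold of 13 are invented for illustration, as is the choice to fire when the weighted sum exactly meets the threshold.

```python
def perceptron(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of the inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

inputs = (2, 5, 1)                               # hypothetical values for the three inputs
print(perceptron(inputs, (3, 1, 1), 13))         # 3*2 + 5 + 1 = 12, below 13
print(perceptron(inputs, (3, 1, 2), 13))         # reweighted: 3*2 + 5 + 2*1 = 13
```

The same inputs leave the perceptron silent under the first weighting and make it fire under the second, which is all that reweighting means.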
Let’s cook up a perceptron to decide whether you are going to go out tonight. It will depend on three things: (1) is there anything good on TV; (2) are your friends going out; (3) what night of the week is it? Give each of these variables a score between 0 and 10, to indicate your level of preference. For example, Monday will get a 1 score while Friday will get a 10. Depending on your personal proclivities, some of these variables might count more than others. Perhaps you are a bit of a couch potato, so anything vaguely decent on TV will cause you to stay in. This would mean that the x₁ variable scores high. The art of this equation is tuning the weightings and the threshold value to mimic the way you behave.
Just as the brain consists of a whole chain of neurons, perceptrons can be layered, so that the triggering of nodes gradually causes a cascade through the network. This is what we call a neural network. In fact, there is a slightly subtler version of the perceptron called the sigmoid neuron that smoothes out the behaviour of these neurons so that they aren’t just simple on/off switches.
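To see what smoothing means in practice, here is a sketch of a sigmoid neuron in Python. It assumes the standard logistic function, which the text does not spell out, and the inputs, weights and threshold are the same invented numbers one might feed to an ordinary perceptron.

```python
import math

def sigmoid_neuron(inputs, weights, threshold):
    """Return a value between 0 and 1 instead of a hard on/off decision."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-(total - threshold)))

inputs = (2, 5, 1)                                   # hypothetical input values
print(sigmoid_neuron(inputs, (3, 1, 1), 13))         # weighted sum 12: below one half
print(sigmoid_neuron(inputs, (3, 1, 2), 13))         # weighted sum 13: exactly one half
print(sigmoid_neuron(inputs, (3, 1, 4), 13))         # weighted sum 15: close to one
```

A small change in a weight now produces a small change in the output, which makes it much easier to tell whether a tweak is moving the network in the right direction.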
Given that computer scientists had already understood how to create artificial neurons, why did it take so long to make these things work so effectively? This brings us back to data. The perceptrons need data from which to learn and evolve; together these are the two ingredients you need to create an effective algorithm. We could try to program our perceptron to decide when we should go out by assigning weights and thresholds, but it is only by training it on our actual behaviour that it will have any chance of getting it right. Each failure to predict our behaviour allows it to learn and reweight itself.
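Training on behaviour can be sketched with the classic perceptron learning rule, used here as one simple way of reweighting and not necessarily the method any particular system employs. The diary of evenings below is entirely invented, the names `predict` and `train` are hypothetical, and the threshold is folded into a `bias` term (the negative of the threshold) so that it can be adjusted alongside the weights.

```python
def predict(weights, bias, scores):
    """Fire (1 = go out) when the weighted sum plus bias is not negative."""
    total = bias + sum(w * x for w, x in zip(weights, scores))
    return 1 if total >= 0 else 0

def train(examples, max_epochs=1000):
    """Nudge the weights after each wrong prediction until none remain."""
    weights, bias = [0, 0, 0], 0
    for _ in range(max_epochs):
        mistakes = 0
        for scores, went_out in examples:
            error = went_out - predict(weights, bias, scores)   # -1, 0 or +1
            if error:
                mistakes += 1
                weights = [w + error * x for w, x in zip(weights, scores)]
                bias += error
        if mistakes == 0:
            break
    return weights, bias

# Invented diary: (TV score, friends score, night score) -> 1 if you went out.
diary = [
    ((9, 2, 1), 0), ((1, 9, 10), 1), ((8, 3, 5), 0),
    ((2, 8, 4), 1), ((5, 10, 10), 1), ((9, 1, 8), 0),
]
weights, bias = train(diary)
print(weights, bias)
print(all(predict(weights, bias, scores) == went_out for scores, went_out in diary))
```

Nobody told the program that good television keeps this imaginary person at home. It ends up with a negative weight on the TV score simply because every wrong guess pushed the weights towards the behaviour recorded in the diary.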
To see or not to see
One of the big hurdles for AI has always been computer vision. Five years ago computers were terrible at understanding what it was they were looking at. This is one domain where the human brain totally outstrips its silicon rivals. We are able to eyeball a picture very quickly and say what it is or to classify different regions of the image. A computer could analyse millions of pixels, but programmers found it very difficult to write an algorithm that could take all this data and make sense of it. How can you create an algorithm from the top down to identify a cat? Each image consists of a completely different arrangement of pixels and yet the human brain has an amazing ability to synthesise this data and integrate the input to output the answer, ‘cat’.
This ability of the human brain to recognise images has been used to create an extra layer of security at banks, and to make sure you aren’t a robot trawling for tickets online. In essence you needed to pass an inverse Turing Test. Shown an image or some strange handwriting, humans are very good at saying what the image or script is. Computers couldn’t cope with all the variations. But machine learning has changed all that.
Now, by training on data consisting of images of cats, the algorithm gradually builds up a hierarchy of questions it can ask an image that, with a high probability of accuracy, will identify it as a cat. These algorithms are slightly different in flavour to those we saw in the last chapter, and violate one of the four conditions we put forward for a good algorithm. They don’t work 100 per cent of the time. But they do work most of the time. The point is to get that ‘most’ as high as possible. The move from deterministic foolproof algorithms to probabilistic ones has been a significant psychological shift for those working in the industry. It’s a bit like moving from the mindset of the mathematician to that of the engineer.
You may wonder why, if this is the case, you are still being asked to identify bits of images when you want to buy tickets to the latest gig to prove you are human. What you are actually doing is helping to prepare the training data that will then be fed to the algorithms so that they can try to learn to do what you do so effortlessly. Algorithms need labelled data to learn from. What we are really doing is training the algorithms in visual recognition.
This training data is used to learn the best sorts of questions to ask to distinguish cats from non-cats. Every time it gets it wrong, the algorithm is altered so that the next time it will get it right. This might mean altering the parameters of the current algorithm or introducing a new feature to distinguish the image more accurately. The change isn’t communicated in a top-down manner by a programmer who is thinking up all of the questions in advance. The algorithm builds itself from the bottom up by interacting with more and more data.
I saw the power of this bottom-up learning process at work when I dropped in to the Microsoft labs in Cambridge to see how the Xbox which my kids use at home is able to identify what they’re doing in front of the camera as they move about. This algorithm has been created to distinguish hands from heads, and feet from elbows. The Xbox has a depth-sensing camera called Kinect which uses infrared technology to record how far obstacles are from the camera. If you stand in front of the camera in your living room it will detect that your body is nearer than the wall at the back of the room and will also be able to determine the contours of your body.
But people come in different shapes and sizes. They can be in strange positions, especially when playing Xbox. The challenge for the computer is to identify thirty-one distinct body parts, from your left knee to your right shoulder. Microsoft’s algorithm is able to do this on a single frozen image. It does not use the way you are moving (which requires more processing power to analyse and would slow the game down).
So how does it manage to do this? The algorithm has to decide for each pixel in each image which of the thirty-one body parts it belongs to. Essentially it plays a game of twenty questions. In fact, there’s a sneaky algorithm you can write for the game of twenty questions that will guarantee you get the right answer. First ask: ‘Is the word in the first half of the dictionary or the second?’ Then narrow down the region of the dictionary even more by asking: ‘Is it in the first or second half of the half you’ve just identified?’ After twenty questions this strategy divides the dictionary up into 2²⁰ different regions. Here we see the power of doubling. That’s more than a million compartments – far more than there are entries in the Oxford English Dictionary, which roughly come to 300,000.
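The halving strategy can be tried out in a few lines of Python. This is only a sketch: the function name `guess_word` is made up for illustration, and the 'dictionary' is a stand-in list of 300,000 numbered strings rather than real entries.

```python
def guess_word(words, target):
    """Find target in a sorted list by repeatedly halving the candidate range.

    Assumes target is somewhere in words.
    Returns (index_of_target, number_of_questions_asked).
    """
    lo, hi = 0, len(words)           # candidates are words[lo:hi]
    questions = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        questions += 1               # "Is the word in the first half?"
        if target < words[mid]:
            hi = mid                 # yes: keep the first half
        else:
            lo = mid                 # no: keep the second half
    return lo, questions


# Stand-in dictionary (an assumption): 300,000 sorted, zero-padded strings.
words = [f"{i:06d}" for i in range(300_000)]

index, asked = guess_word(words, "123456")
print(index, asked, 2 ** 20)
```

Because 2²⁰ is 1,048,576, a list of 300,000 entries can never need more than twenty questions, whichever word is chosen.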
But what questions should we ask our pixels if we want to identify which body part they belong to? In the past we would have had to come up with a clever sequence of questions to solve this. But what if we programmed the computer so that it finds the best questions to ask? By interacting with more and more data – more and more images – it finds the set of questions that seem to work best. This is machine learning at work.
We have to start with some candidate questions that we think might solve this problem so this isn’t completely tabula rasa learning. The learning comes from refining our ideas into an effective strategy. So what sort of questions do you think might help us distinguish your arm from the top of your head?
Let’s call the pixel we’re trying to identify X. The computer knows the depth of each pixel, or how far away it is from the camera. The clever strategy the Microsoft team came up with was to ask questions of the surrounding pixels. For example, if X is a pixel on the top of my head, then if we look at the pixels north of pixel X they are much more likely not to be on my body and thus to have more depth. If we take pixels immediately south of X, they’ll be pixels on my face and will have a similar depth. But if the pixel is on my arm and my arm is outstretched, there will be one axis, along the length of the arm, along which the depth will be relatively unchanged, but if you move out ninety degrees from this direction it quickly pushes you off the body and onto the back wall. Asking about the depth of surrounding pixels could cumulatively build up to give you an idea of the body part that pixel belongs to.
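To make the idea concrete, here is a toy version of such a question in Python. It is a sketch under loud assumptions: the tiny depth map, the `BACKGROUND` value and the function names are all invented for illustration, and the real system's features are certainly more refined than a fixed one-pixel step.

```python
BACKGROUND = 10.0   # assumed depth of the back wall; also used for probes off the image
BODY = 2.0          # assumed depth of the player

W, B = BACKGROUND, BODY
# Toy depth map: a head (column 3, rows 1-2) above outstretched arms (row 3).
depth = [
    [W, W, W, W, W, W, W],
    [W, W, W, B, W, W, W],
    [W, W, W, B, W, W, W],
    [B, B, B, B, B, B, B],
    [W, W, W, B, W, W, W],
]

NORTH, SOUTH, EAST, WEST = (-1, 0), (1, 0), (0, 1), (0, -1)


def depth_at(depth, row, col):
    """Depth at (row, col); anything outside the image counts as background."""
    if 0 <= row < len(depth) and 0 <= col < len(depth[0]):
        return depth[row][col]
    return BACKGROUND


def depth_difference(depth, row, col, offset):
    """How much deeper the neighbour in the given direction is than pixel X."""
    d_row, d_col = offset
    return depth_at(depth, row + d_row, col + d_col) - depth_at(depth, row, col)


top_of_head = (1, 3)
arm = (3, 1)
for name, (r, c) in (("top of head", top_of_head), ("arm", arm)):
    print(name, [depth_difference(depth, r, c, d) for d in (NORTH, SOUTH, EAST, WEST)])
```

For the pixel on top of the head the depth jumps to the north but not to the south; for the pixel on the arm it is flat to the east and west but jumps to the north and south. Those contrasting patterns are what the questions exploit.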
This cumulative questioning can be thought of as building a decision tree. Each subsequent question produces another branch of the tree. The algorithm starts by choosing a series of arbitrary directions to head out from and some arbitrary depth threshold: for example, head north; if the difference in depth is less than y, go to the left branch of the decision tree; if it is greater, go right – and so on. We want to find questions that give us new information. Having started with an initial random set of questions, once we apply these questions to 10,000 labelled images we start getting somewhere. (We know, for instance, that pixel X in image 872 is an elbow, and in image 3339 it is part of the left foot.) We can think of each branch or body part as a separate bucket. We want the questions to ensure that all the images where pixel X is an elbow have gone into one bucket. That is unlikely to happen on the first random set of questions. But over time, as the algorithm starts refining the angles and the depth thresholds, it will get a better sorting of the pixels in each bucket.
By iterating this process, the algorithm alters the values, moving in the direction that does a better job at distinguishing the pixels. The key is to remember that we are not looking for perfection here. If a bucket ends up with 990 out of 1000 images in which pixel X is an elbow, then that means that in 99 per cent of cases it is identifying the right feature.
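A single branch of such a tree can be learned in miniature. The Python sketch below is not Microsoft's method: it simply tries random questions (a direction plus a depth threshold) and keeps whichever sorts the labelled pixels into the purest buckets. The training data is synthetic, generated by a made-up `make_samples` helper, and every name here is an assumption for illustration.

```python
import random
from collections import Counter


def purity(bucket):
    """Fraction of labels in a bucket that agree with its most common label."""
    if not bucket:
        return 1.0
    return Counter(bucket).most_common(1)[0][1] / len(bucket)


def score(samples, direction, threshold):
    """Size-weighted purity of the two buckets produced by one question."""
    left, right = [], []
    for features, label in samples:
        (left if features[direction] < threshold else right).append(label)
    return (len(left) * purity(left) + len(right) * purity(right)) / len(samples)


def learn_question(samples, directions, tries=200, seed=0):
    """Random search: return the best (score, direction, threshold) found."""
    rng = random.Random(seed)
    best = None
    for _ in range(tries):
        direction = rng.choice(directions)
        threshold = rng.uniform(0.0, 10.0)
        s = score(samples, direction, threshold)
        if best is None or s > best[0]:
            best = (s, direction, threshold)
    return best


def make_samples(n_each=50, seed=1):
    """Synthetic labelled pixels: depth differences to the north, south and east."""
    rng = random.Random(seed)
    noise = lambda: rng.uniform(-0.5, 0.5)
    samples = []
    for _ in range(n_each):
        samples.append(({"north": 8 + noise(), "south": 0 + noise(), "east": 8 + noise()}, "head"))
        samples.append(({"north": 8 + noise(), "south": 8 + noise(), "east": 0 + noise()}, "arm"))
    return samples


best_score, best_direction, best_threshold = learn_question(
    make_samples(), ["north", "south", "east"]
)
print(best_score, best_direction, round(best_threshold, 2))
```

Looking north cannot separate a head from an arm in this toy data, so the search ends up preferring a question about the south or the east. On real images the buckets would rarely be perfectly pure, which is why a bucket that is 99 per cent elbows counts as a success.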
By the time the algorithm has found the best set of questions, the programmers haven’t really got a clue how it has come to this conclusion. They can look at any point in the tree and see the question it is asking before and after, but there are over a million different questions being asked across the tree, each one slightly different. It is difficult to reverse-engineer why the algorithm ultimately settled on this question to ask at this point in the decision tree.
Imagine trying to program something like this by hand. You’d have to come up with over a million different questions. This prospect would defeat even the most intrepid coder, but a computer is quite happy to sort through these kinds of numbers. The amazing thing is that it works so well. It took a certain creativity for the programming team to believe that questioning the depth of neighbouring pixels would be enough to tell you what body part you were looking at – but after that the creativity belonged to the machine.
One of the challenges of machine learning is something called ‘over-fitting’. It’s always possible to come up with enough questions to distinguish an image using the training data, but you want to come up with a program that isn’t too tailored to the data it has been trained on. It needs to be able to learn something more widely applicable from that data. Let’s say you were trying to come up with a set of questions to identify citizens and were given 1000 people’s names and their passport numbers. ‘Is your passport number 834765489?’ you might ask. ‘Then you must be Ada Lovelace.’ This would work for the data set on hand, but it would singularly fail for anyone outside this group, as no new citizen would have that passport number.
Given ten points on a graph, it is possible to come up with an equation that creates a curve which passes through all the points. You just need an equation with ten terms. But, again, this has not really revealed an underlying pattern in the data that could be useful for understanding new data points. You want an equation with fewer terms, to avoid this over-fitting.
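The danger is easy to demonstrate. In the Python sketch below, ten made-up points lie close to the straight line y = 2x + 1, nudged alternately half a unit above and below it. A ten-term polynomial (built here by Lagrange interpolation, one standard way of threading a curve through given points) hits every point exactly and then goes badly wrong one step beyond the data. The data and names are illustrative assumptions.

```python
from fractions import Fraction


def lagrange(points):
    """Return the unique polynomial with len(points) terms through all the points."""
    def p(x):
        total = Fraction(0)
        for i, (xi, yi) in enumerate(points):
            term = Fraction(yi)
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= Fraction(x - xj, xi - xj)
            total += term
        return total
    return p


def trend(x):
    """The simple underlying pattern used to generate the data."""
    return 2 * x + 1


# Ten synthetic points: the trend plus an alternating wobble of +1/2 and -1/2.
points = [(x, Fraction(trend(x)) + Fraction((-1) ** x, 2)) for x in range(10)]

curve = lagrange(points)
print(all(curve(x) == y for x, y in points))   # the curve fits every point exactly
print(trend(10), float(curve(10)))             # prediction one step past the data
```

The straight line predicts 21 at x = 10. The ten-term curve, having modelled every wobble, predicts -490.5.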
Over-fitting can make you miss overarching trends by inviting you to model too much detail, resulting in some bizarre predictions. Here is a graph of twelve data points for population values in the US since the beginning of the last century. The overall trend is best described by a quadratic equation, but what if we used an equation with higher powers of x than simply x²? Taking an equation with powers all the way up to x¹¹ actually gives a very tight fit to the data, but extend this equation into the future and it takes a dramatic lurch downwards, predicting complete annihilation of the US population in the middle of October in 2028. Or perhaps the maths knows something we don’t!
Algorithmic hallucinations
Advances in computer vision over the last five years have surprised everyone. And it’s not just the human body that new algorithms can navigate. To match the ability of the human brain to decode visual images has been a significant hurdle for any computer claiming to compete with human intelligence. A digital camera can take an image with a level of detail that far exceeds the human brain’s storage capacity, but that doesn’t mean it can turn millions of pixels into one coherent story. The way the brain can process data and integrate it into a narrative is something we are far from understanding, let alone replicating in our silicon friends.
Why is it that when we receive the information that comes in through our senses we can condense it into an integrated experience? We don’t experience the redness of a die and its cubeness as two different experiences. They are fused into a single experience. Replicating this fusion has been one of the challenges in getting a computer to interpret an image. Reading an image one pixel at a time won’t tell us much about the overall picture. To illustrate this more immediately, take a piece of paper and make a small hole in it. Now place the paper on an A4 image of a face. It’s almost impossible to tell whose face it is by moving the hole around.
Five years ago this challenge still seemed impossible. But that was before the advent of machine learning. Computer programmers in the past would try to create a top-down algorithm to recognise visual images. But coming up with an ‘if …, then …’ set to identify an image never worked. The bottom-up strategy, allowing the algorithm to create its own decision tree based on training data, has changed everything. The new ingredient which has made this possible is the amount of labelled visual data there is now on the web. Every Instagram picture with our comments attached provides useful data to speed up the learning.
You can test the power of these algorithms by uploading an image to Google’s vision website: https://cloud.google.com/vision/. Last year I uploaded an image of our Christmas tree and it came back with 97 per cent certainty that it was looking at a picture of a Christmas tree. This may not seem particularly earth-shattering, but it is actually very impressive. Yet it is not foolproof. After the initial wave of excitement has come the kickback of limitations. Take, for instance, the algorithms that are now being trialled by the British Metropolitan Police to pick up images of child pornography online. At the moment they are getting very confused by images of deserts.
‘Sometimes it comes up with a desert and it thinks it’s an indecent image or pornography,’ Mark Stokes, the department’s head of digital and electronics forensics, admitted in a recent interview. ‘For some reason, lots of people have screen-savers of deserts and it picks it up, thinking it is skin colour.’ The contours of the dunes also seem to correspond to shapes the algorithms pick up as curvaceous naked body parts.
There have been many colourful demonstrations of the strange ways in which computer vision can be hacked to make the algorithm think it’s seeing something that isn’t there. LabSix, an independent student-run AI research group composed of MIT graduates and undergraduates, managed to confuse vision recognition algorithms into thinking that a 3D model of a turtle was in fact a gun. It didn’t matter at what angle you held the turtle – you could even put it in an environment in which you’d expect to see turtles and not guns.
The way they tricked the algorithm was by layering a texture on top of the turtle that to the human eye appeared to be turtle shell and skin but was cleverly built out of images of rifles. The images of the rifle are gradually changed over and over again until a human can’t see the rifle any more. The computer, however, still discerns the information about the rifle even when the images are perturbed, and this ranks higher in its attempts to classify the object than the turtle on which it is printed. Algorithms have also been tricked into interpreting an image of a cat as a plate of guacamole, but LabSix’s contribution is that it doesn’t matter at what angle you show the turtle: the algorithm will always be convinced it is looking at a rifle.
The same team has also shown that an image of a dog that gradually transforms pixel by pixel into two skiers on the slopes will still be classified as a dog even when the dog has completely disappeared from the screen. Their hack was all the more impressive, given that the algorithm being used was a complete black box to the hackers. They didn’t know how the image was being decoded but still managed to fool the algorithm.
Researchers at Google went one step further and created images that are so interesting to the algorithm that it will ignore whatever else is in the picture, exploiting the fact that algorithms prioritise pixels they regard as important to classifying the image. If an algorithm is trying to recognise a face, it will ignore most of the background pixels: the sky, the grass, the trees, etc. The Google team created psychedelic patches of colour that totally took over and hijacked the algorithm so that while it could generally recognise a picture of a banana, when the psychedelic patch was introduced the banana disappeared from its sight. These patches can be made to register as arbitrary images, like a toaster. Whatever picture the algorithm is shown, once the patch is introduced it will think it is seeing a toaster. It’s a bit like the way a dog can become totally distracted by a ball until everything else disappears from its conscious world and all it can see and think is ‘ball’. Most previous attacks needed to know something about the image it was trying to misclassify, but this new patch had the virtue of working regardless of the image it was seeking to disrupt.