The BUZZ @ SWARM - Episode 2

About This Episode

If you're curious about what it would be like to be a quant, the invention of the Q-learning algorithm, or whether classifying million-year-old mice and vole teeth is a suitable problem for AI to solve, then this podcast is for you. Our guest is Chris Watkins, Professor of Artificial Intelligence at Royal Holloway, University of London, but more famously known as the inventor of the Q-learning algorithm that sparked the resurgence in reinforcement learning. Listen along as he and Anthony Howcroft (SWARM's CEO) talk about using reinforcement learning to solve challenges in the agri-food supply chain.

Transcript

Holly: Thanks for joining us today for episode two of the SWARM Engineering podcast, where we talk about solving problems in the agri-food supply chain using AI, data science, and machine learning. I'm Holly Self, VP of Marketing at SWARM Engineering, and today I'll be your co-host.

If you're curious about what it would be like to be a quant, the invention of the Q-learning algorithm, or whether classifying million-year-old mice and vole teeth is a suitable problem for AI to solve, then we have just the podcast for you.

I'm happy to welcome the very fascinating Chris Watkins, Professor of Artificial Intelligence at Royal Holloway, University of London, but more famously known as the inventor of the Q-learning algorithm that sparked the resurgence in reinforcement learning. He's currently working on how to make generally intelligent machines, along with abstract models of evolution and statistical visualization.

Today Chris is joined by your host, SWARM CEO and founder Anthony Howcroft.

Anthony: Hey, Chris. Thanks for joining us. It's great to have you here.

Chris: It's a real pleasure.

Anthony: I'm excited to talk to you about some stuff because you've got a fantastic track record and have done some really interesting things. Obviously, you're very well known for reinforcement learning and the Q-learning algorithm. So, we'll come to that, but I know in your background you were actually a quant, so I've got to ask you: how did you get to be a quant?

Chris: Well, my first real job was a bit of an interesting experience: working in an industrial research lab in England for Philips, the Dutch electronics company. I was there for five years, and I felt that life could get more exciting.

And I think I saw the film Wall Street and realized that I wanted to have a little bit of an adventure. So, I rang up a number of banks and claimed to be able to predict the stock market, and eventually a small hedge fund offered me a job. And that was a lot of fun for a time.

We worked on developing some trading strategies, and it was enormous fun looking at special situations in stocks and figuring out lots of different bidding strategies.

I tried a few advanced techniques, but I realized a few things about trying to predict markets. The first is that it's really boring, because you miss all the romance of markets. You just see prices that go up and down, and the story behind them: the market is constantly changing, and as a human you're concerned about many things, yet all you're looking at is numbers. I thought that was a very weak way to go about it. The next thing I found was that if you try to learn how to predict the market without prior assumptions, you realize that machine learning or statistical learning is a weak discovery procedure.

You're probably not going to do better than people unless you're looking at a very different timescale to humans. What you really need is an edge, a theory, some ideas about how it should work. You need some theoretical inspiration to be able to find something that might work.

The third thing I found in trying to predict markets, and I do discourage students from trying it, I really do, is this: in other parts of statistics you might be trying to predict the weather, or anything physical or real, anything which isn't trying to fool you all the time.

Then you can make predictions; you can actually get it partly right. If you're doing regression, you might explain, say, 70% of the variance, and that's doing pretty well. 80% is doing very well, and at 40% you've still got some predictive power, but most of what's going on is noise that you don't know about.

But in the financial markets, if you're predicting daily price changes and you could predict even 5% of the variance in the daily price changes, then you'd be doing extraordinarily well. You could get rich if you could do that day after day, so valuable predictions can be far weaker there than predictions anywhere else.
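The "percent of variance explained" that Chris quotes corresponds to the R-squared statistic from regression. A minimal sketch with invented numbers (the data here is made up purely for illustration):

```python
# Invented example data: actual values and a model's predictions.
y = [0.5, -1.2, 0.3, 0.8, -0.4, 1.1, -0.7, 0.2]
y_pred = [0.4, -0.9, 0.1, 0.6, -0.2, 0.8, -0.5, 0.3]

mean_y = sum(y) / len(y)

# R^2 = 1 - (residual sum of squares / total sum of squares):
# the fraction of the variance in y that the predictions explain.
ss_res = sum((a - p) ** 2 for a, p in zip(y, y_pred))
ss_tot = sum((a - mean_y) ** 2 for a in y)
r_squared = 1 - ss_res / ss_tot
```

An R-squared of 0.7 is the "70% of the variance" case; Chris's point is that for daily price changes even 0.05 would be remarkable.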

Anthony: When I look at markets and machine learning and the data around them, I come back to the Bayesian model where the turkey thinks it's a great day every day, because historically it has been, even as Christmas or Thanksgiving gets closer. So what do you think about black swan events? If you go back over the last two years, we wouldn't have known about the invasion of Ukraine or the pandemic. How do you look at things like that from a market-prediction basis?

Chris: I think for these black swan events, there's usually a community of people predicting disaster at any one time. They're predicting a range of different possible disasters and black swans, so these black swans are not unpredicted, and insuring against black swans is quite expensive.

So, I'm not sure I completely buy the idea that there are black swan events that throw everything off; by their very nature, they don't happen very often.

Now suppose you find a trading system, some trade that makes money. That's a wonderful thing, because it means people will give you money: either you're providing a hidden financial service, which you'd better understand, or else some foolish people are just giving you money without meaning to.

And both of these are unlikely. If you have a trading system which works, it may work for a time and you don't know why, and then it stops working and you don't know why. It's very mysterious. You don't really have a causal theory, any understanding of why a purely technical trading system that you've developed is going to work or not.

So that's a very uncomfortable place to be, especially with financial market data being the most random data you will ever deal with. That was my experience of being a quant. Back at that time, I think there were trading strategies which were pretty simple and which, if you used them, worked consistently for years, then gradually worse and worse.

One wishes one had been doing this in the 1970s. But as far as I'm aware, those strategies performed steadily worse, and by now they have a barely detectable return, simply because more people discovered them and the market became more efficient.

Anthony: It's a fascinating problem, isn't it? The whole concept of how you predict a market when there are so many variables. When you were saying about black swans, I know the author Nassim Nicholas Taleb actually said black swans aren't unpredictable; it's just that you don't know which one it's going to be. There will be a black swan; you just don't know what type.

Chris: They're also very hard to trade on. The point is that you can be completely rational and correct in predicting which way the market is going to go. Does this mean you automatically make money? How do you take your position? The answer is: it doesn't mean you make money.

If there is some straightforwardly correct, reasonable, and indeed perhaps quite widely accepted view on what is going to happen in the market, then what happens? The answer is you're predictable, you're a mark, and the market will move against you. So, if you think the market's going to go down and you sell, well, then you find the market will go up at first, before it goes down, and you're forced to repurchase at a loss. Getting the trading right is what really counts, what really makes a difference.

Anthony: I have a little story about predicting the future that illustrates that beautifully. Several years ago I was invited to speak at an event in Prague, and I was looking at the dates, and this was when Terminal 5 at Heathrow was due to open. I was going to be flying British Airways, and they were going to have all their flights going from Terminal 5.

I thought to myself, this is going to be a total disaster with the luggage, because luggage is a classic traveling-salesman problem; it's going to be highly complex. So I thought, I won't fly from Heathrow. Because I was near Oxford, I could choose to fly from London or Birmingham, and I deliberately flew from Birmingham, because I thought Terminal 5 would have opened and the luggage would be a disaster.

So, lo and behold, Terminal 5 opened and, as predicted, the luggage was a total disaster. It was headline news, and all of the major papers were running the story. I was feeling very smug about this as I flew from Birmingham. But unfortunately, because it was such a mess at Terminal 5, here's what they did.

They moved all the baggage handlers from Birmingham to London, and so my bag got lost and never reached me in Prague. I had one set of clothes for the entire trip. I thought I'd been so clever, but whilst I predicted the future, I didn't really get an edge. In fact, it was the reverse.

So, you can predict the future, but it doesn't mean you know how to capitalize on it.

Chris: That's right. If there are lots of other people predicting at the same time, or people reacting to predictions of the future, the correct prediction is something a lot of people know. But how to win? Many fewer people know how to do that.

Anthony: So, Chris, how did you come up with the Q learning algorithm approach and what triggered that?

Chris: Oh, I was doing a PhD in the psychology department at Cambridge, which was quite an experimental department, and I started off doing psychology experiments. I was really interested in how children's cognition developed. I was fascinated by Piaget, the Swiss psychologist who worked in the 1920s and '30s and wrote about 40 books. In my view he was a brilliant, wonderfully sensitive experimentalist. He had a rapport with children, and he did some beautiful experiments to reveal children's development of understanding.

The first thing I did was to go to a primary school in Cambridge, and they allowed me to set up in the hall, and I simply repeated some of Piaget's observations with children of different ages. I found that I could read Piaget's descriptions of his experiments from 50 years before, do what he did, say what he said, and then see exactly what he saw. It was fascinating. And children are really complicated. I got completely stuck in my PhD and thought I'd failed, and I went off and worked for Philips labs, and they very generously sent me off to a glamorous workshop on machine learning, an almost unheard-of subject back then, in 1987 in Irvine, California.

Anthony: Oh wow. Where I'm sitting right now.

Chris: And at one of the talks a very slick professor was describing a computer program which was claimed to be able to solve high-school algebra problems. I really didn't believe this. I thought that somehow, in some way, he had surely built the answers into the program.

You couldn't really say it was solving the problems in the way anything like a person would. And so, purely to be annoying, I put up my hand and said: oh, have you studied animal learning at all? Which was probably a bit tactless; he was doing high-school algebra, and I was saying: have you studied rats?

He handled it very smoothly. He said, oh, that's all been done, and then just carried on with his talk. I felt put down as a heckler, and I sat there not listening to the rest of his talk, thinking: has anybody studied animal learning? And I couldn't think of a single thing. I was this unsuccessful grad student.

I read a lot; I sat there reading rather than coding. And I just couldn't think of anything, and I thought, maybe I could learn about that. Afterwards, a slightly formidable figure, a man with long red hair, hippy glasses, and bell-bottom jeans, came over to me and introduced himself.

This was Rich Sutton, and he gave me two of his papers which he thought I might like to read. I was slightly bowled over at meeting such a genuine person, so I accepted the papers gratefully, and I read them carefully on the way back.

And I realized that these were really original; they were unlike anything else I'd seen. He was looking at really simple problems, like balancing a pole by holding the lower end and moving it back and forth to keep it upright, and learning to do that with a system which didn't know anything about the pole.

It was only punished if the pole fell over. So how could you do that? I hadn't seen anything like that. His method looked as if it clearly worked, but surely there must be a general theory. And so I set about finding it, and I went off to the library. I think at that time everyone else on my project had left, and every project had to have a project leader.

Therefore I was a project leader. So I had quite a lot of time on my hands and nobody telling me what to do. I found books on dynamic programming, and I read them, and I realized what the theory was, and I realized there was a general method of learning.

If you imagine a really simple agent, perhaps a simple organism or some robot agent, that gets rewards or punishments: at every moment it observes the state of the world, it can pick one of a certain number of actions, and then it gets some immediate reward. There was a way in which such an agent could optimize its behavior, learn the best rules for action, without ever understanding the world.

You could do this. And this took the algorithms of Rich Sutton and Andy Barto and put them into a known framework, as a quite different way of approaching these dynamic programming problems. And then I found Q-learning.
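The agent Chris describes, observing a state, picking an action, and receiving an immediate reward, is exactly what tabular Q-learning optimizes. A minimal sketch on an invented five-state corridor world (the environment, parameters, and episode counts here are illustrative assumptions, not from the episode):

```python
import random

random.seed(0)

# Invented toy world: states 0..4 in a line; action 0 = left, 1 = right.
# Reaching state 4 yields reward 1 and ends the episode; every other step gives 0.
GOAL = 4

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

Q = [[0.0, 0.0] for _ in range(GOAL + 1)]  # the Q-table: one value per (state, action)
alpha, gamma = 0.5, 0.9                    # learning rate and discount factor

# Q-learning is off-policy: here the agent behaves completely at random,
# yet it still learns the value of acting greedily.
for _ in range(2000):
    s, done = 0, False
    while not done:
        a = random.randrange(2)
        s2, r, done = step(s, a)
        # The core update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        target = r + (0.0 if done else gamma * max(Q[s2]))
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2

# The greedy policy read off the table heads right from every non-terminal state.
policy = [Q[s].index(max(Q[s])) for s in range(GOAL)]
```

Because every state-action pair is visited many times, the table converges, which is essentially the setting of the convergence proof Chris mentions later.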

And one could actually prove that it worked, that it was guaranteed to converge. I didn't quite get the proof right for a couple of years, but the essential thing was right. And I realized that I might have a PhD thesis. So, I wrote my thesis and then I rang my Cambridge college and said: I've finished my thesis.

And they were very surprised, because I had been this guy who left and paid his writing-up fee every year just to keep himself on the books. When I came back, it turned out that Andy, miraculously, was spending the summer in Cambridge, working at my old college by complete coincidence, and he could be a PhD examiner.

And so he was one of my PhD examiners, and it was a great experience. And of course Andy and Rich were delighted. They were really nice, and they were very enthusiastic about what I'd done, and they in fact publicized it. I think it spread even though it didn't get published for another two years.

I didn't understand the importance of publishing things; I was somewhat impractical. And besides, I went off and got a job working as a quant in a hedge fund, so I was going into work wearing a suit and cufflinks and all those things. They actually sent photocopies off to anyone who wanted one, and it went viral, as far as photocopies can.

Anthony: That's amazing, because it's so well known now. One of the questions I was going to ask was whether it was an overnight hit, but clearly it wasn't. It took a long time and a lot of work to get there, I can see that. And also a lot of time before it seeped into the public consciousness.

Chris: Yes. As an idea it was an overnight hit: here was a different way of doing dynamic programming, you could do it without building a model of the system, and that was a really intriguing idea. But then it took something like 25 years to get it to work on a serious, non-trivial, complicated problem. The breakthrough came with the paper that started DeepMind, the paper by Mnih and others on playing the suite of Atari computer games.

So again, this isn't a big practical problem; this is playing mid-seventies-style Atari games, where you've got a couple of buttons and a scoring system. But nevertheless, they produced a deep neural network which could learn good performance on a whole range of different simple games, from Pong to Space Invaders.

Anthony: And obviously they moved that forward to Go, which was really a big hit.

Chris: And then there were real advances. Then there was AlphaZero and its extraordinary achievements in these two-person games. I think it is said that they used not more than 1% of the Google cloud in training, which is quite a lot of computation.

But nevertheless, to start with a system which knows only the rules of chess, or only the rules of Go, and then to surpass all human knowledge of the game in a matter of hours.

Anthony: It is incredible. I read the paper they published, I think it was in Nature, and it did list the computing resource, which was quite phenomenal. But it did do it in three days. AlphaGo Zero in particular was the one I was thinking of, where it beat the previous algorithm. It was astonishing because, as you say, people are still studying the strategies the system came up with, because they were new ones.

Chris: I'm not really a chess player, but I've watched some of the games, and these programs make sacrifices that world champions can't always make sense of. And yet it's still a very limited sort of knowledge. Their knowledge is knowledge of the basic rules of the game.

And their knowledge is then about looking at a position: how good is this position? Looking into a position, in chess, that's what you need.

There's no shortcut trick to calculating what the right move is. It'd be wrong to say chess was designed like that, but that's the reason it's such a popular game: there are patterns, but they're not boringly regular, and there's no equation for the best chess move.

So in chess and Go, the reinforcement learning kind of structure of knowledge is very reasonably what you want. The other secret of the success in those games is that they train by what's called self-play. You have two versions of the program, or you have the program playing against itself, and the move choices are stochastic.

And so, of course, the program is constantly driven to try things it hasn't tried before, because it's not going to put its opponent into a good position if it can help it; the opponent, of course, is the same program. And this training method of self-play means that you get a really good exploration of the possible strategies.
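Self-play can be illustrated in miniature. This is a hedged sketch, nothing like AlphaZero: a single learned value table plays both sides of an invented pile game (players alternately remove 1 or 2 stones; whoever takes the last stone wins), with stochastic move choice providing the exploration Chris describes:

```python
import random

random.seed(1)

# Invented mini-game for illustration: a pile of N stones, players alternate
# removing 1 or 2 stones, and whoever takes the last stone wins.
N = 10
Q = [[0.0, 0.0] for _ in range(N + 1)]  # Q[pile][action-1], from the mover's viewpoint
alpha, epsilon = 0.2, 0.3               # learning rate, exploration rate

def moves(p):
    return [a for a in (1, 2) if a <= p]

for _ in range(20000):
    p = N
    while p > 0:
        # Stochastic move choice: both "players" are the same program
        # sharing the same table, so exploration benefits both sides.
        opts = moves(p)
        a = random.choice(opts) if random.random() < epsilon \
            else max(opts, key=lambda x: Q[p][x - 1])
        p2 = p - a
        # Negamax-style target: winning now is worth +1; otherwise a position
        # is worth minus the opponent's best continuation from it.
        target = 1.0 if p2 == 0 else -max(Q[p2][x - 1] for x in moves(p2))
        Q[p][a - 1] += alpha * (target - Q[p][a - 1])
        p = p2

# Learned play: from a winning pile, the best move leaves the opponent
# a multiple of 3 (the known optimal strategy for this game).
best = {p: max(moves(p), key=lambda x: Q[p][x - 1]) for p in range(1, N + 1)}
```

The point of the sketch is the structure, not the scale: the same program supplies both sides' moves, and the stochastic choices keep driving it into lines it hasn't tried.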

And the most difficult thing in reinforcement learning for large problems is the exploration. The problem is that if you are in a very large state space and you simply move randomly, then you're unlikely to get anywhere interesting at all. And if there are 20 things you have to get right in order to reach an interesting position, that is very rarely going to happen by chance.

So, starting off with a reasonable strategy and then making good strategic variations: that's the real difficulty, because you can never explore all possibilities. If you make a few random moves in chess, then you're always going to lose; you'll always end up in some nonsense position. Now, in reinforcement learning there are these nice tight proofs (there are now, of course, far more sophisticated proofs of convergence than mine), and in a game with a small state space you can actually visit every possible situation multiple times.

And then you can prove that Q-learning is going to work, that Q-learning will converge. But take even an Atari arcade game, as simple as that: for machine learning this is a pretty significant problem, yet for people it's something you tell a child to stop doing because the child is wasting their time.

In a game like that you can almost never revisit the same state twice. So instead you are trying to generalize over states, which is pretty iffy. And your main problem, because you cannot visit the entire state space, is that it's hard to get anywhere interesting.

Far from being able to visit every possible situation multiple times, your difficulty is ever reaching any interesting situation at all. But in the small worlds of chess and Go, this heuristic of self-play enabled the programs to gradually improve and, continually, if you like, learn to reach interesting situations that the program had not experienced before.

And I think that's one of the secrets of that extraordinary success, and it is an extraordinary success.

Anthony: It's interesting hearing you describe it and the challenges. Do you think reinforcement learning is going to be applicable to some of the more challenging large state spaces, like a supply chain?

Chris: Yes, I think it's very plausible that it can be. The reason is that in a supply chain you've got a limited set of sensible actions, and if it's designed properly, you've got a limited set of possible situations, so that if you're shipping things across the country in trucks, you can design your models accordingly.

So you rule out a lot of utterly foolish things; say, you're not trying to get from New York to Chicago via Mexico or anything like that. And so one's looking at variations of established strategies, and there's been a tremendous amount of effort that's gone into building plausible models of supply chains.

Of course, every model is an abstraction, and every model occasionally breaks down. There will be problems which are slightly too complicated or too irregular for conventional optimization methods, but where a kind of agent-based reinforcement model can do better.

And yet the state space is not so big that you never find interesting states. Reinforcement learning is a really hard technology to get to work on a new problem, but one of the big problems does not arise in supply chains, because you've got randomness, and you can simulate breakdowns and simulate variations in demand and so on.

Secondly, the timescale of effects is limited in supply chains; you don't get into irreversible traffic jams, they eventually clear. And also the aims are really well defined in supply chains: you want to get things from here to there by certain times. So you avoid a whole family of problems which can affect other possible applications of reinforcement learning, namely defining rewards. How do you define the rewards? In supply chains these are much clearer.
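Chris's point about well-defined rewards can be made concrete. Here is a toy, entirely invented single-warehouse environment (none of this comes from SWARM's actual models) where the reward is simply revenue minus holding and shortage costs:

```python
import random

random.seed(2)

# Hypothetical toy supply-chain environment: a single warehouse.
# State: current stock level. Action: how many units to order.
CAPACITY = 20
PRICE, HOLDING_COST, SHORTAGE_COST = 5.0, 0.5, 2.0

def step(stock, order):
    stock = min(CAPACITY, stock + order)
    demand = random.randint(0, 10)  # random demand, as in real chains
    sold = min(stock, demand)
    shortage = demand - sold
    stock -= sold
    # The reward is exactly the kind of well-defined objective Chris mentions:
    # sales revenue minus holding and shortage costs.
    reward = PRICE * sold - HOLDING_COST * stock - SHORTAGE_COST * shortage
    return stock, reward

# Compare two fixed policies by average reward: order up to 10 vs. never order.
def run(order_up_to, episodes=5000):
    stock, total = 0, 0.0
    for _ in range(episodes):
        stock, r = step(stock, max(0, order_up_to - stock))
        total += r
    return total / episodes
```

Once a reward like this is written down, any policy (hand-crafted or learned) can be scored against it, which is the precondition for applying reinforcement learning at all.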

Anthony: Do you think there are going to be any major breakthroughs in reinforcement learning in the next year or so?

Chris: I think that reinforcement learning produces this quite limited sort of knowledge.

It's like chess knowledge. It's not theory knowledge; it's not on multiple timescales. With deep learning it can contain a lot of information: you train it with a training set of inputs and outputs, and it remembers all sorts of different inputs and smoothly interpolates between them, in a way that means you can get pretty good performance on a lot of inputs you haven't seen before.

As long as they're fairly similar to the ones it's already seen. I'd describe these things as memory-based systems, but it's a very limited form of knowledge, and about reinforcement learning I tend to be slightly more pessimistic than other people. Pessimistic is the wrong word: critical. In reinforcement learning you have two types of knowledge. You may have a world simulator; you may know the rules of chess. If you know the rules of chess and that's all you know, all you can do is take a move and say: is that a valid move or not?

And: have I lost? That doesn't help you very much to play. You need to know a lot more than that in order to play well; you need to know what to do. So there's another type of knowledge, which is what to do: given a position on the board, what move to choose. That's called a policy. A policy is really useful. It tells you what to do, but you don't really know why, and you don't know if you're in a good situation or not; it's just telling you what to do. You might ask: are there other moves that are just as good, or why do I have to do that?

And you have a value function, which tells you, for any situation, what its value is: what rewards or payoffs you're going to get from this situation if you follow the current policy. That's a useful kind of knowledge as well. You look at the board and you say: oh, I'm in a really good position here.

I'm going to win here. But again, you don't necessarily know why. The aim of reinforcement learning is to optimize the policy and the value function together. This gives you, at its best, a great skill, but it's in many ways a rather brittle and over-optimized skill: you have what the best of all possible plans is at this point, but you don't know the second-best plans.
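Both kinds of knowledge Chris distinguishes can be read straight off a learned Q-table: the policy is the argmax over actions, the value function is the max. A tiny sketch with an invented table:

```python
# Invented Q-table for a 3-state, 2-action problem: Q[state][action] is the
# expected return for taking that action and then following the current policy.
Q = [
    [0.2, 0.7],  # state 0
    [0.9, 0.1],  # state 1
    [0.4, 0.5],  # state 2
]

# The policy: *what to do* in each state (without saying why).
policy = [row.index(max(row)) for row in Q]

# The value function: *how good* each state is under that policy (again, no why).
value = [max(row) for row in Q]
```

Neither object carries any explanation, which is exactly the brittleness Chris is pointing at: the table answers "what" and "how good", never "why".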

You're going straight towards optimization rather than good things like modularity and understanding and robustness. So there's a tremendous amount of research on different things in reinforcement learning at the moment, but to my mind the problem is that there are other types of knowledge that we don't really know how to describe properly yet.

Anthony: And that's part of the drive towards artificial general intelligence. It's interesting: at the beginning you were talking about the research you'd done with children, and I've seen that children have multiple different strategies for learning. It looks like that's where we may go with general intelligence: giving a system multiple different strategies, including reinforcement learning, clearly.

Chris: Yes. I greatly enjoy watching nature programs, and nature programs have now become a high art. You watch these films about animals in Africa, the hunters and the hunted, and I love looking for examples of animal learning that seem similar to the kinds of machine learning we can do now.

And it's really hard to find them. Occasionally you can see something that looks a little bit like reinforcement learning, usually from some kind of generalist predator. Reinforcement learning takes a long time.

And there's supervised learning in machine learning, where you have a training set of inputs and desired outputs; you just don't see that in animals at all. Then there are other types, like semi-supervised learning. And yet a baby wildebeest gets up and runs; it has to be able to get up and run within less than nine minutes of being born.

And it has to be able not just to run; it has to be able to run faster than an adult hyena, which can go at 30 miles an hour. That's pretty tough. I'm not sure we can build a robot which can run across the African veldt faster than an adult hyena, yet the baby wildebeest does it just a few minutes after being born. They all do; the ones that don't aren't very successful. So there are things we absolutely don't understand still about animal intelligence and animal abilities, and there are things we don't understand about levels of knowledge. Now, this term I've been teaching a course in natural language processing for the first time, a subject I really didn't know much about. I had to do it because a previous member of staff left after he got a job at Google Research.

And so I had to teach his course, and I had to do a lot of learning. We have these miraculous, enormous networks that can confabulate: you give them a prompt of some text to start with, and they will confabulate coherent paragraphs.

And you can now identify, in quite fine detail, the sense in which a word has been used in text. For example, we say you mark someone's retirement; there it's a synonym of celebrate. Whereas if you mark a piece of paper, you're not celebrating the piece of paper. And so now there are these deep language models, enormous neural networks trained on vast amounts of text taken from almost the entire internet.

They can do this, but do they understand anything? No, not really. But if you're given a very large collection of documents, say 50,000 documents in your company, and you say, find me documents which are really similar to this one, on a similar topic, we can do that.

We've been able to do that for 20 years. Or you say: go through the documents and find mentions of companies. You might think this would be simple, but it's not. Find mentions of people and companies and places: we can do that with 90-95% accuracy. What else can we do?

Summarize the document to make it shorter: you can do that reasonably productively. Or you can produce a rough translation, and that is one of the achievements of the age. Only very large companies can hope to produce translation software, but the translation software is very easily fooled. And none of these things amount to understanding.

And if you look at what's actually necessary for understanding the fine structure of the meaning of sentences, we are a very long way from being able to do that.
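The "find me similar documents" capability Chris mentions can be sketched with nothing fancier than bag-of-words vectors and cosine similarity (the three tiny documents are invented for the example; real systems use far richer representations):

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    common = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in common)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Invented mini-corpus: which document is most like the "supply" one?
docs = {
    "supply": "trucks ship goods across the supply chain to warehouses",
    "chess": "the chess program learned strategies by playing games against itself",
    "logistics": "warehouses receive goods shipped by trucks in the supply chain",
}
bags = {name: Counter(text.split()) for name, text in docs.items()}

query = bags["supply"]
ranked = sorted((n for n in bags if n != "supply"),
                key=lambda n: cosine(query, bags[n]), reverse=True)
```

Note the sketch captures only word overlap, not meaning; it would treat "mark a retirement" and "mark a paper" as the same "mark", which is exactly the fine structure Chris says we still cannot handle.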

Anthony: as an English person living in America for the last seven or eight years, I'm astonished that, everybody knows tomato, tomato, and the difference between elevators and lifts and, jumpers and sweaters, and some of the well-known differences in language, but literally every day, I still discover subtle differences in the language your example of marking retirement. I don't know whether that would apply in America, for example. And there are multiple differences like that. If you table an issue in the UK, it means you want to discuss it right now. Whereas if you table an issue in the us, you actually want to put it off and discuss it later.

I'd be fascinated to see, as NLP systems get smarter, whether they can start to identify more of those. I've been astonished at how many differences there are, especially within a language I thought I knew. So, switching gears: we did speak last year, in a separate call, about questions and how you go about qualifying a project before devoting any effort to solving it.

I found this really useful because we were building out the SWARM Challenge Modeler, which specifically helps users easily find and solve problems. Your list of questions was very intriguing, because some of the critical questions weren't about the shape of the problem but about softer issues, like: does the company actually want to solve the problem, or need to solve it?

But it did suggest to me that perhaps you'd done some projects that didn't go as planned, and I wondered if you had an example.

Chris: Oh, plenty of them, yes. We spent some years sporadically attempting to collaborate with a large utility provider in the UK; we were collaborating, or trying to collaborate, with their consumer division.

Their priorities just seemed so weird. We couldn't understand why nothing ever seemed to start; our conversations never seemed to go anywhere. And eventually we did quite a good trick: we employed a really smart guy. The company gave us money to employ an academic postdoctoral research assistant, whom we then embedded in the company.

So, he was working for us; he was our academic spy. And the company got a very benevolent spy: a really well-intentioned, extremely smart, very capable person, rather cheaper than they could have hired him otherwise. And eventually he realized what the problem was. And the problem was very simple: they didn't want to make a profit.

They were guaranteed to make a profit, because everything they did was controlled by the regulator, who set their profits at a certain amount, which is what they were going to make. So what they cared about was market share of customers, and they cared about pleasing the regulator.

So, by failing to understand the overall aim of the organization, we failed to understand the whole framework of the problem.

Anthony: Interesting. It's always that higher-level context that's so valuable. I know one of the other questions you had on your list was to find out whether a problem is suitable for AI or machine learning.

How do you go about determining that?

Chris: Oh, let me give you an example of a problem which is not suitable. My head of department came to me, and he said that a very nice professor in the geology department does excavations of sediments from a couple of million years ago, and they get little teeth from mice and voles, and they can tell an awful lot about the climate back then by looking at these teeth.

And she has research students taking these teeth, putting them under the microscope, and seeing which species they are. She thought: Chris, you could produce an AI system to classify these teeth. So, I said, okay, and I went off and spent some time on it, and she was a very nice person.

It was a great problem. But then I saw pictures of these teeth, and realized it was going to be a nightmare problem, because the teeth were lumps of rock with sand sticking to them under the microscope. What sort of error rate would you like? Would you be happy with, say, 85%? And she said, what? No! My precious teeth; these mice and voles died to leave us their teeth, and you're going to misclassify... And then you look at how many teeth there are in total, and it's only going to be at most 10,000 teeth.

It's really not worth it. So, the first thing is, if you're going to go through the pain of scoping, designing, building, and deploying an ML system, this is going to be expensive. It's going to involve a lot of people. The people who build it are going to be the machine learning engineers and the data scientists, but really, they don't own the product.

Really it has to be the users, the people who understand the problem; it has to be their project. Now, the first requirement for a machine learning project is that it's got to make a lot of decisions to justify itself. There have got to be a lot of decisions to make, too many for humans to make.

Each of them is worth a little bit of money, and it's got to be done at scale for it to be worthwhile. The second requirement is you have to be able to design the system in such a way that mistakes are cheap, that mistakes don't matter, because there are going to be mistakes. The classic case is a taxi company having an AI system that estimates how long your taxi is going to take to arrive; if it's wrong, it's not the end of the world. Now, a lot of people say, oh, we're going to be able to diagnose cancer. Whoa. Building machine learning into your cancer diagnosis pathway is a whole lot trickier, because mistakes are exactly the things you really don't want. In a commercial sense, you want a problem with sufficient scale that the decisions are worthwhile, sufficiently simple that there's regularity over time, and with mistakes that are cheap.
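Chris's two requirements, many small-value decisions made at scale, and cheap mistakes, can be sketched as a back-of-envelope break-even check. This is a hypothetical illustration, not anything computed in the episode; all the numbers and the function name are made up for the sake of the example:

```python
# A rough expected-value check for whether an ML project is worthwhile,
# following the two criteria described above. All figures are hypothetical.

def ml_project_worthwhile(decisions_per_year, value_per_decision,
                          error_rate, cost_per_mistake, build_cost):
    """Expected annual net benefit of automating the decisions."""
    gross = decisions_per_year * value_per_decision
    mistake_cost = decisions_per_year * error_rate * cost_per_mistake
    return gross - mistake_cost - build_cost

# Taxi-ETA-style problem: huge scale, tiny value per decision, cheap mistakes.
taxi = ml_project_worthwhile(50_000_000, 0.01, 0.15, 0.02, 300_000)

# Tooth-classification-style problem: only ~10,000 decisions ever,
# and each misclassified "precious tooth" is costly.
teeth = ml_project_worthwhile(10_000, 1.0, 0.15, 50.0, 300_000)

print(taxi > 0)   # scale makes the taxi problem worthwhile
print(teeth > 0)  # the teeth problem is not
```

The point of the sketch is that the same error rate (15% here) is fine in one setting and fatal in the other: it is the scale of decisions and the cost per mistake, not the accuracy alone, that decide whether the project pays.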

Anthony: Yeah, that's really good. I actually recently rewatched the movie Minority Report with my daughter. If you remember, there are three telepaths who predict where murders will occur, and the Tom Cruise character tracks down and arrests people who are going to commit murders before they commit the crime.

And one day, of course, they predict that the Tom Cruise character will commit a murder, and he goes on the run. As the movie progresses, we discover that sometimes the telepaths don't agree on the future, and the minority report is where two of the three telepaths predict someone's going to be murdered but the third doesn't. Before they had the telepaths, there were hundreds of murders every year, but now there are zero. And there are a handful of minority reports each year too, which means four or five people might get sent to prison for a crime they were never going to commit. And ultimately, they scrap the system because they perceive that the model is flawed. But I thought the ethics were very interesting.

They're preventing four or five potentially innocent people going to prison, but returning to a world where 500 people get murdered per year. I thought that was a really interesting dilemma, and I think that's where there are still challenges for machine learning, as ethics becomes a more critical issue.

Minority Report was ahead of its time in starting to look at how those ethics play out in the real world.

Chris: Oh yes. Utilitarianism is always a little tricky. In ethics, there are real problems. I feel a fundamental fault line, which causes a lot of problems, is this: suppose a company produces a system which has been constructed using machine learning, which kind of works on the data that the company has collected.

And I say it works because it works; they don't really know why it works. And it works on the data they've collected in the past. All machine learning systems are trained on data from the past, and they're going to be used on data in the future, and the past and the future can be different.

Then, even though they don't really have that much secret sauce they want to keep, this is software, so you keep the design of your software confidential. You then have a system which has been trained on data of a certain type, under certain assumptions, being used by other people, and the company's salesmen may sell it in cases where it's either appropriate or not appropriate.

And these systems can then be used in circumstances where they affect people's lives and correcting injustices can be very hard even within companies.

The next point about a machine learning system is that it isn't a build-and-forget system. It should be continuous deployment and co-creation, and I really do feel that machine learning systems need to be owned by the people who use them. In the present state of the technology, we don't have a wise artificial intelligence that knows a lot; we have stupid systems which have been shown a lot of different situations and what to do in those situations.

And there are going to be situations they've not encountered, where they don't know what to do, and the situations can change. A machine learning system is something that needs to be continuously fixed.

Anthony: Fascinating. It's continuously fixed and, essentially, continuously learning as well. And the interesting thing is that the humans are also learning. You mentioned animal learning earlier, Chris, but when we use machine learning systems, they often end up teaching us. We've all learned how to type searches into Google, for example; essentially, the system has trained us how to use it effectively. We build our SWARM systems as cognitive computing, so that it's a collaboration between the human and the machine learning system, rather than having the system take over. I think it's the collaboration that's going to be more powerful and more important as we go forward.

Chris: Absolutely. A machine learning system in an organization is a collective way of remembering a lot of situations encountered and what to do, which can then very cheaply decide what to do. Which brings us to another criterion for a machine learning problem: it's really good if you've got instant validation of the decisions made.

So, Google really chose the best machine learning problem. Whenever you do a Google search, Google uses a very sophisticated algorithm, drawing on information about you the searcher, your search query, and probably your history, to put up some advertisements, some paid links, which, if you click on them, give Google money. Let's just run through this. First of all, it's done at scale: billions of people every day. Each decision can make them a small amount of money; most people don't click, but when they do, Google gets anything from pennies to $100. Next, you get instant feedback on the decision.

So, if Google's system goes wrong and they have a terrible day, recommending advertisements to everyone that are absolutely stupid (I'm not buying that, I'm not taking that) and Google makes no money, well, they immediately find out.

So that's the ultimate example of a profitable and ideal application of machine learning in its current state. And similarly with a taxi company: there's a whole series of small decisions they make. You want to book a ride; how many taxis, and how far away, do they show your ride request to?

If it's too few, you don't get one, and if it's too many, you annoy all their drivers. Again, they get instant validation: they know how long it takes for a driver to accept that ride and come to you. And similarly with all their estimates of how long it's going to take and what the fare is going to be; it's lots of different continuous estimation problems, all instantly validated.

And I think there are problems like this, with the slight proviso that in logistics, actions have knock-on effects. Of course, the quickest way to get anything anywhere is to get a truck, put one thing in it, and send that truck straight to the place, but that's not going to solve your problems.

For the next package, there wasn't a truck waiting, and that's a very expensive way to do it. Actions have secondary effects, and that's where you may need reinforcement learning. I'm a much less experienced consultant to companies than you are, but in my limited experience, the most important thing of all is to find out if the company wants it, find out if the management want it. For any project in the company, you need to cooperate with people to get the project to work. You can't go in there and pirate the data; you want to talk to people to get the right data at the right time, to find out what the problem really is.

And there'll be a number of different people who, if they feel threatened by your fancy, sparkling, expensive machine learning project, can make sure it doesn't work. They can make sure it fails just by not quite cooperating in giving you the data, or, when your system does something stupid, by not helping you to correct it.

And so, getting the right credit to the people in the company who know about the problem that you're trying to solve, that's the absolutely key thing. If they want you to succeed, you can succeed.

Anthony: Yeah, that's a really good point.

I know we're nearly out of time, so I was just going to ask you one last question. My wife works for an academic publisher, so I have to ask you about your perspective on open access, which is the model where the author, rather than the publisher, pays for publication. It does feel like Europe and the US are approaching this at a very different pace and with different attitudes.

I was just interested in how you thought research on AI should best be shared.

Chris: The first thing to say about research on AI is that it's astonishingly well shared right now. Anything you write goes straight onto arXiv. The amount of excellent open-source software that's available is extraordinary. One reason for the impact of deep learning is that companies, individuals, and universities have all made their machine learning software available. The companies that can afford it have spent enormous amounts of money training the natural language prediction models, BERT and GPT-3 and so on, and they've made them available for free.

And the software tools and libraries: languages like Python and Julia have tens of thousands of excellent packages. Contrast that with, say, the progress that had been made over the previous 30 years; it is absolutely amazing. Oh, and let's not forget, for example, the advice that's available for free on Stack Overflow.

So, then you have academic publishing. This varies greatly by discipline, and there are different requirements; there is a tension between the requirements. In some disciplines, if you have wrong information published, you can cause people to make really expensive mistakes.

And I'm thinking of molecular biology here. The worst example I can think of is the alleged discovery of a virus causing ME several years ago. Many other labs then started trying to replicate that, devoting time to these really expensive, time-consuming, and difficult experiments, and they found it didn't work because the publication was wrong. Here you want really stringent reviewing, and you have to have editors do a lot of work and reviewers do a lot of work. Reviewing is, I think, still not normally paid, but reviewers take their job extremely seriously for the sake of the community.

And that has a whole set of requirements. Now in computer science, particularly in machine learning, a lot of papers are saying, hey, my program works better than your program, and, oh, by the way, here's my program, and you put your program on GitHub. This is checkable, and it's not terribly expensive to check.

And maybe my program works better than your program for these data sets, but not for those data sets, so it works better for this problem, and you get a kind of chaos, but it's a checkable and improvable chaos. And in the field of machine learning, the huge thing that's happened is an enormous difference in scale.

And of course, maintaining the quality of reviewing is hard because there are so many papers. And so the field has been very active both in initiating open-access journals and in trying alternative, open review models, but those require someone to do a lot of work without being paid for it.

The future is a lot more experimentation with different open review models. So, it's a time of great experimentation.

Anthony: We use both open-source and proprietary algorithms; we can use a mix in our underlying engine, but because it's delivered as software as a service, some companies view that differently, because it's not running within their firewall. The whole publishing industry seems to be evolving, just as other industries have evolved; it's not immune. It seems like the old "publish or perish" is still there in the academic world, but how it's done is changing, just as the supply chains we're dealing with are evolving.

I think it's an era of great change in the way we do things. I'd like to thank you for coming on the show, Chris, and spending some time with us.

Chris: Thank you for having me.

Anthony: I look forward to catching up with you again, and hopefully, when we're traveling, I'll be able to come and see you in London at some point.

Chris: That'd be cool. That'd be great.

Holly: And that concludes episode two of the SWARM Engineering podcast. Thanks for listening. If you'd like to get in touch or send us a message, you can always find us on LinkedIn, or, if you're feeling a little more formal, you can fill out the contact us form on our website at www.swarm.engineering.

We have some great guests lined up for future shows that you really do not want to miss, so make sure you subscribe to the podcast to get all of the updates on our future shows. One last thing: we still do not have an official name for our podcast. Can you help us come up with a winning name? We would really love your help. You can go to our website to cast your vote or make a suggestion at swarm.engineering/nameourpodcast.

Thanks again, and see you next time!

