Last year I wrote on a couple of occasions about the Sleeping Beauty problem. The problem raises some tricky questions and I did promise to attempt to answer them, which I have yet to do. Only last week, I was discussing the problem again with my friend Giulio, whose paper on the subject I published here. That discussion prompted me to go back to the inspiration for the original post: a series of posts on Bob Walters’ blog. I re-read all of his posts, including his fourth post on the topic, which began:
I have been waiting to see some conclusions coming from discussions of the problem at the Stubborn Mule blog, however the discussion seems to have petered out without a final statement.
Sadly, even if I do get to my conclusions, I will not be able to get Bob’s reaction, because last week he died and the world has lost a great, inspirational mathematician.
Bob was my supervisor in the Honours year of my degree in mathematics and he also supervised Giulio for his PhD. Exchanging emails with Giulio this week, we found we both have vivid memories of an early experience of Bob’s inspirational approach to mathematics. This story may not resonate with everyone, but I can assure you that there are not many lectures from over 25 years ago that I can still recall.
The scene was a 3rd year lecture on Categories in Computer Science. Bob started talking about stacks, a very basic data structure used in computing. You should think of a stack of plates: you can put a plate on the top of the stack, or you can take one off. Importantly, you only ever push plates onto, or pop them off, the top of the stack (unless you want your stack to crash to the floor). And how should a mathematician think about a stack? As Bob explained it, from the perspective of a category theorist, the key to understanding stacks is to think about pushing and popping as inverse functions. Bear with me, and I will take you through his explanation.
Rather than being a stack of plates, we will work with a stack of a particular type of data and I will denote by X the set of possible data elements (X could denote integers, strings, Booleans, or whatever data type you like). The set of all stacks of type X will then be denoted by S. Our two key operations are push and pop.
The push operation takes an element of X and a stack and returns a new stack, which is just the old stack with the element of X added on the top. So, it’s a function push: X × S → S. Conversely, pop is a function S → X × S which takes a stack and returns the top element and a stack, which is everything that’s left after you pop the top.
So far, so good, but there are some edge cases to worry about. We should be able to deal with an empty stack, but what if we try to pop an element from the empty stack? That doesn’t work, but we can deal with this by returning an error state. This means that we should really think of pop as a function pop: S → X × S + I, where I is a single element set, say {ERR}. Here the + is a (disjoint) union of sets, which means that the pop function will either return a pair (an element of X and a stack) or an error state. This might be a bit confusing, so to make it concrete, imagine I have a stack s = (x₁, x₂, x₃), then

pop((x₁, x₂, x₃)) = (x₁, (x₂, x₃))

and this ordered pair of the data element x₁ and the (shorter) stack (x₂, x₃) is an element of X × S. Now if I want to pop an empty stack (), I have
pop(()) = ERR
which is in I. So pop will always either return an element of X × S or an element of I (in fact, the only element there is).
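To make this concrete in code, here is a minimal sketch in Haskell (my choice of language and names, not Bob’s): Either plays the role of the disjoint union +, and a one-element type plays the role of I.

```haskell
-- A stack over a type x: either empty, or an element on top of a
-- smaller stack.
data Stack x = Empty | Top x (Stack x) deriving (Eq, Show)

-- The single element set I = {ERR}.
data Err = ERR deriving (Eq, Show)

-- pop: S → X × S + I. Either is Haskell's disjoint union, so the
-- result is either a (top element, rest of stack) pair or ERR.
pop :: Stack x -> Either (x, Stack x) Err
pop (Top x rest) = Left (x, rest)
pop Empty        = Right ERR
```

For example, pop (Top 1 (Top 2 (Top 3 Empty))) is Left (1, Top 2 (Top 3 Empty)), while pop Empty is Right ERR.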
This should prompt us to revisit push as well, which should really be considered as a function push: X × S + I → S which, given an element of X and a stack, will combine them but, given the single element of I, will return the empty stack, so push(ERR) = ().
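Continuing the sketch, push becomes a function out of the same disjoint union:

```haskell
-- push: X × S + I → S. Given a pair, put the element on top of the
-- stack; given the single element of I, return the empty stack.
push :: Either (x, Stack x) Err -> Stack x
push (Left (x, s)) = Top x s
push (Right ERR)   = Empty
```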
The key insight now is that pop and push are inverses of each other. If I push an element onto a stack and pop it again, I get back my element and the original stack. If I pop an element from a stack and push it back on, I get back my original stack. Extending these functions to X × S + I ensures that this holds true even for the edge cases.
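With the sketch above and its derived Eq instances, both round trips can be spot-checked directly; the following expression evaluates to True, edge cases included:

```haskell
-- push . pop and pop . push are both the identity, even at the
-- edge cases involving the empty stack and ERR.
checks :: Bool
checks = push (pop s) == s
      && push (pop (Empty :: Stack Int)) == Empty
      && pop (push (Left (0 :: Int, s))) == Left (0, s)
      && pop (push (Right ERR :: Either (Int, Stack Int) Err)) == Right ERR
  where
    s = Top 1 (Top 2 (Top 3 Empty)) :: Stack Int
```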
But if push and pop are inverses, then X × S + I and S must essentially be the same: mathematically, they are isomorphic. This is where the magic begins. As Bob said in his lecture, “let’s be bold like Gauss”, and he proceeded with the following calculation:
X × S + I = S
I = S – X × S = S × (I – X)
S = I / (I – X)
and so
S = I + X + X² + X³ + …
The last couple of steps are the bold ones, but they actually make sense. The last equation basically says that a stack is either an empty stack, a single element of X, an ordered pair of elements of X, an ordered triple of elements of X, and so on: in other words, a stack is a finite sequence of elements of X.
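In Haskell terms, that conclusion can be written down directly: the Stack type from the sketch above is isomorphic to the built-in list type, and the isomorphism is a pair of mutually inverse conversions.

```haskell
-- S = I + X + X² + X³ + …: a stack is a finite sequence of elements,
-- which is exactly Haskell's list type. toList and fromList are
-- mutually inverse, witnessing the isomorphism.
toList :: Stack x -> [x]
toList Empty     = []
toList (Top x s) = x : toList s

fromList :: [x] -> Stack x
fromList []       = Empty
fromList (x : xs) = Top x (fromList xs)
```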
I’d known since high school that 1/(1-x) could be expanded to 1 + x + x² + x³ + …, but applying this to a data structure like stacks was uncharted territory. I was hooked, and the following year I had the privilege of working with Bob on my honours thesis; that work ultimately made it into a couple of joint papers with him.
I haven’t seen Bob face to face for many years now, but we have occasionally kept in touch through topics of mutual interest on our blogs. While I have not kept up with his academic work, I have always considered him more than just a brilliant mathematician. He was a creative, inspirational, radical thinker who had an enormous influence on me and, I know, many others.
RFC Walters, rest in peace.