Deriving the Y combinator
This post is an attempt to put my understanding of the intuition behind the Y combinator into words. It is a follow-up to my earlier, introductory post on the untyped lambda calculus.
Recall that the Y combinator is defined like so:
Y := λ y → (λ x → y (x x)) (λ x → y (x x))
It takes as input a function argument y and returns the result of applying (λ x → y (x x)) to itself. This means that the output of Y R, for some argument R, is R (Y R). Note that both Y R and R (Y R) are partially applied functions.
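To see that Y R really does reduce to R (Y R), expand it one β-reduction at a time:
Y R = (λ y → (λ x → y (x x)) (λ x → y (x x))) R
Y R = (λ x → R (x x)) (λ x → R (x x))
Y R = R ((λ x → R (x x)) (λ x → R (x x)))
Y R = R (Y R)
The last step holds because the inner application is exactly what Y R reduced to two lines earlier.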
In the previous post we saw how the Y combinator works. We looked at two examples of problems with recursive formulations and saw how to construct functions R such that Y R, when applied to the required arguments, yields the desired result. One insight we had was that since R receives the recursive function (that is, Y R itself) as its first argument, it must be an (n+1)-ary function, where n is the number of formal parameters in the recursive formulation.
We also saw why the Y combinator is needed. It gives us a way to express recursive functions in languages that do not natively support recursion - like the untyped lambda calculus.
“That’s great,” you say, “but where did the Y combinator come from?” Fair question! In the last post we simply presented the Y combinator instead of deriving it from scratch or supplying the intuition behind it.
The latter is what this post will attempt to do.
Our method is as follows.
- We think of a situation that requires the use of a special kind of function.
- We write down a crude definition of that function that breaks the rules of the lambda calculus.
- We then refactor our definition so that it is a valid lambda expression.
- That special function is our Y combinator.
Let’s start with our old friend the factorial function.
F = λ n → Z n 1 (M n (F (P n)))
(please read the previous post if you aren’t sure what this means)
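For orientation, here is a rough transcription of the same definition into Python; this is only a sketch, and it assumes Z is the zero test, M is multiplication and P is the predecessor function from that post:
F = lambda n: 1 if n == 0 else n * F(n - 1)  # legal in Python only because the body may refer to the name F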
The problem, in the lambda calculus, is that F is not defined until the expression on the right is fully evaluated, which requires an application of F, which in turn requires that F be defined to begin with. Let’s break this cycle. Observe that the right hand side can be seen as a function of F. Sure, it’s a lambda taking a single input n, but it could just as well be seen as a function that accepts a single input F', like so:
G = λ F' → (λ n → Z n 1 (M n (F' (P n))))
or, with the usual shorthand for nested lambdas,
G = λ F' n → Z n 1 (M n (F' (P n)))
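The same refactoring, as a rough Python sketch (f stands in for F'):
G = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)  # the recursive call is now received as an argument rather than made by name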
and so our previous expression can be written as:
F = G F
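To convince ourselves, apply G to F and β-reduce:
G F = (λ F' n → Z n 1 (M n (F' (P n)))) F
G F = λ n → Z n 1 (M n (F (P n)))
G F = F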
A few things to note about this.
- There was nothing special about the factorial function. We could just as well have written the above derivation for any other recursive function.
- The arity of F wasn’t relevant in our derivation either. F could have taken, say, 17 arguments and it wouldn’t have made a difference to the logic above.
Now, back to our end result:
F = G F
This is interesting. F is a function that takes a number argument (bear in mind that numbers in the lambda calculus are just functions). G is a function that takes two arguments: a function F' and a number n. G F is therefore a partially applied function that takes one more argument as input. This function G F is equal [1] to F, which is the input to G.
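We can check this claim numerically in Python, using math.factorial as a stand-in for F and the same rough sketch of G as before:
from math import factorial
G = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)
assert all(G(factorial)(n) == factorial(n) for n in range(10))  # G F behaves exactly like F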
In other words, F is a fixed point of G. So if we know G, we can find F by finding G’s fixed point. How do we do that? Ideally we would have a function Y that takes another function as input and returns its fixed point. This is of course the Y combinator that is the subject of this blog post. Given such a function, we could pass G to it and get back G’s fixed point F. In other words:
F := Y G
and so
Y G = G (Y G)
Y = λ G → G (Y G)
Note that the above line doesn’t constitute a formal definition of the Y combinator. Why? Because Y appears on the right hand side, and so we have the same issue as we did with F.
We need to get rid of Y appearing on the right hand side. How?
Well, one example of a function that applies its argument to itself is the U combinator, which we encountered in the previous post.
U := λ x → x x
In the definition above, Y calls itself, and self-application is exactly what the U combinator provides, so it’s tempting to use it here. But then what should we apply the U combinator to? Let’s call the unknown argument to the U combinator H, and try to derive an expression for it. We have:
Y G = U H
Y G = H H
where H is chosen such that the property Y G = G (Y G) is satisfied. Now let’s find H.
Y G = G (Y G)
H H = G (H H)
H x = G (x x)
H = λ x → G (x x)
The key step here was going from the second line above to the third. We noticed that the second H in H H also appears inside G (H H), so it can be treated as an argument to the first H. Replacing that argument with the formal parameter x gives the third line, and abstracting over x gives the definition of H on the last line.
Let’s check that defining Y G = H H, with H = λ x → G (x x), is enough to satisfy our original constraint Y G = G (Y G).
H = λ x → G (x x)
H H = (λ x → G (x x)) (λ x → G (x x))
H H = (λ u → G (u u)) (λ x → G (x x))
H H = G ((λ x → G (x x)) (λ x → G (x x)))
H H = G (H H)
Y G = G (Y G)
Nice, it works out. So:
Y G = H H
Y = λ G → H H
Y = λ G → (λ x → G (x x)) (λ x → G (x x))
Y = λ y → (λ x → y (x x)) (λ x → y (x x))
which is exactly the definition we started with (the last line just renames the bound variable G to y)!
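As a final sanity check, here is a minimal runnable sketch in Python. One caveat: Python evaluates arguments eagerly, so the literal Y above would expand x x forever before ever calling y; the usual fix is to eta-expand the self-application (the resulting combinator is often called Z), which computes the same thing but delays the recursive expansion until it is actually needed:
Z = lambda y: (lambda x: y(lambda v: x(x)(v)))(lambda x: y(lambda v: x(x)(v)))
G = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)  # the factorial-generating G from earlier
assert Z(G)(5) == 120  # Z finds G's fixed point: the factorial function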
And now we have the insight behind the Y combinator. It’s just the end result of refactoring the self-referential fixed-point equation Y G = G (Y G) into a valid lambda expression.
Footnotes
[1] Note that for two functions to be equal, they must return the same output for every input.