Recursion Theory and Joy

by Manfred von Thun

Abstract: Joy is a functional programming language which is not based on the application of functions to arguments but on the composition of functions. Many topics from the theory of computability are particularly easy to handle within Joy. They include the parameterisation theorem, the recursion theorem and Rice's theorem. Since programs are data, it is possible to define a Y-combinator for recursion and several variants. It follows that there are self-reproducing and self-describing programs in Joy. Practical programs can be written without recursive definitions by using several general purpose recursion combinators which are more intuitive and more efficient than the classical ones.

Keywords: functional programming, functionals, computability, diagonalisation, program = data, diagonalisation, self-reproducing and self-describing programs, hierarchy of recursion combinators, elimination of recursive definitions.


Introduction

This paper describes some aspects of the language from the perspective of recursive function theory.

A deep result in the theory of computability concerns the elimination of recursive definitions. To use the stock example, the factorial function can be defined recursively in Joy by

        factorial  ==
            [0 =] [pop 1] [dup 1 - factorial *] ifte
The definition is then used in programs like this:
        5
        factorial
Because in Joy programs can be manipulated as data, the factorial function can also be computed recursively without a recursive definition, as follows:
        5
        [ [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte ]
        [dup cons] swap concat dup cons i
The second line in this program does much the same as the body of the definition of factorial, but it is a quoted program. The third line first transforms this into another longer quoted program which performs "anonymous" recursion, and then the final i combinator essentially dequotes this program causing its execution.

The third line implements Joy's counterpart of the Y combinator of the lambda calculus and of combinatory logic. Exactly the same line can be used to cause anonymous recursion of other functions which are normally defined recursively.

Joy has other combinators which make recursive execution of programs more succinct. (Of course it is also possible in Joy to compute the factorial function more efficiently with iteration instead of recursion.)

The remainder of this paper deals with various aspects of the theory of computability, in particular the theory of recursive functions. The next section gives a recursive definition of a recursion combinator. Following that is a section dealing with various well known and elementary theorems from recursive function theory. The next section then proves a fixpoint theorem for Joy. The theorem implies the existence of self-reproducing programs, as shown in the following section. There is also some discussion of the effect of evaluation order on termination. The topic of the next section is Rice's theorem for Joy. Then there is a section describing further self-reproducing and self-describing programs. Another recursion combinator is constructed in the next section. The final section discusses the more practical recursion combinators of Joy. Previous knowledge of the field of recursion theory is not assumed.

Recursion and its elimination

The factorial function and the Fibonacci function are often used to illustrate two different recursion patterns, although both are most efficiently computed non-recursively. This section follows tradition by using these two functions. The ultimate aim is to show how recursive definitions of functions can be eliminated to obtain a (closed) form of a program for recursive execution.

In conventional notation the factorial function can be defined recursively like this:

    r-fac(n)  =  if n = 0 then 1 else n * r-fac(n - 1)
It is obvious that in conventional notation the definitions need the formal parameter n. Joy was designed to eliminate formal parameters in definitions, and the factorial function would be defined in Joy like this:
    r-fac1  ==  [0 =] [pop 1] [dup 1 - r-fac1 *] ifte
The suffix 1 serves to distinguish this definition from a later version. The RHS of the definition contains three quotations: an if-part, a then-part and an else-part. These serve as parameters to the ifte combinator. The if-part tests for equality with 0, and if that is true then the then-part pops off the parameter and replaces it with the result 1. Otherwise the else-part is executed, which duplicates the parameter, subtracts 1 from the top copy, calls r-fac1 recursively on that and, when that call has returned with a value, multiplies that with the original. Using a more Joy-like idiom:
    r-fac1  ==  [null] [succ] [dup pred r-fac1 *] ifte
This uses the null predicate in the if-part, the successor function in the then-part and the predecessor operator in the else-part.

In conventional notation the Fibonacci function is defined recursively like this:

    r-fib(n)   =  if n <= 1 then n else r-fib(n - 1) + r-fib(n - 2)
A more or less literal translation into Joy is the following:
    r-fib1  ==  [small] [] [pred dup [r-fib1] dip pred r-fib1 +] ifte
The if-part uses the small predicate, which for numeric parameters yields true for 0 and for 1. The then-part is the empty quotation [], when executed it does nothing. In the else-part r-fib1 has to call itself twice, once each for the two parameters that have been prepared by dup. The dip combinator applies the quotation [r-fib1] to the earlier version, and the later version, after pred has done its job, is handled directly by r-fib1. A cleaner version is
    r-fib2  ==  [small] [] [pred dup pred [r-fib2] app2 +] ifte
The app2 combinator applies the quoted [r-fib2] twice, to each of the two numbers on top of the stack.

It will help if the minor differences between the definitions of the factorial function and the Fibonacci function are eliminated. In particular this concerns the patterns of the recursive calls. In the body of r-fac1 the direct recursive call has been replaced by its quotation and app1, which applies the quotation just once, to the single number. So here are two other versions, they have aligned to make comparisons easier.

    r-fac2  ==  [ null] [succ] [     dup pred [r-fac2] app1 *] ifte
    r-fib2  ==  [small] [    ] [pred dup pred [r-fib2] app2 +] ifte
The task of eliminating the recursion in the RHS of the definitions amounts to this: The occurrences of the quoted programs [r-fac2] and [r-fib2] have to be replaced by the quoted RHS. But this will introduce those same quotes again, and these have to be replaced by the RHS, and so on ad infinitum. It seems impossible.

One part of the solution is that the programs have to be given an extra parameter which is to be used when the else-part is executed. The extra parameter will have to contain whatever is necessary to enable the recursion inside the else-part. But this means that the extra parameter will be somewhat in the way. Consequently the if-part and the then-part need an extra pop to remove the unneeded parameter. Furthermore, in the else-part any preparatory work before the actual recursion has to be done below the extra parameter using dip. The one or two recursions are then to be effected by app1 and app2, respectively. The two programs now look like this:

?-fac:  [[pop  null] [pop succ] [[dup pred     ] dip app1 *] ifte] ?
?-fib:  [[pop small] [pop     ] [[pred dup pred] dip app2 +] ifte] ?
To indicate that these are not recursive definitions, they are not given as definitions at all. The ?-symbol is left unanalysed at this point, only this much can be said about it:
  1. it takes the entire quoted program as a parameter and then produces another quoted program with a special property.
  2. it then executes this resulting program.
So the ?-symbol denotes a strange combinator.

For comparison the three versions of the factorial program are listed here:

r-fac1 == [     null] [    succ] [ dup pred        r-fac1       *] ifte
r-fac2 == [     null] [    succ] [ dup pred       [r-fac2] app1 *] ifte
?-fac:   [[pop  null] [pop succ] [[dup pred     ] dip      app1 *] ifte] ?
And here are the three versions of the Fibonacci program:
r-fib1 == [    small] [        ] [ pred dup [r-fib1] dip r-fib1 +] ifte
r-fib2 == [    small] [        ] [ pred dup pred  [r-fib2] app2 +] ifte
?-fib:   [[pop small] [pop     ] [[pred dup pred] dip      app2 +] ifte] ?

The recursion combinator must do this: At the point where ?-fac uses app1 and where ?fib uses app2 they expect ?-fac and ?fib on top of the stack. That is why the if-parts and the then-parts each need an extra pop, and why the else-part has to do its initial work from a dip. It all means that a recursion combinator has to supply the RHS to itself as an extra parameter. In general, a recursion combinator expects a program [P] and it executes it in a special way. It must call P but provide the extra parameter. This is the defining law for a recursion combinator ? :

        [P]  ?   ==   [[P] ?]  P

How can a recursion combinator be defined? This can be done recursively and non-recursively. Once the ? combinator is defined, it can be used to eliminate all other recursive definitions. If a recursion combinator is defined recursively, then this would be the only recursive definition that is needed. Here is a step-by-step development of such a definition. The development is a fairly typical example of how one can write Joy programs systematically. Start with the defining law:

        [P] ?
    ==  [[P] ?]  P
In the second line P occurred twice, in quoted and unquoted form. It will be simpler if it occurs in only one form, and that has to be the quoted form. So, using the i combinator,
    ==  [[P] ?]  [P]  i
The two quotations are not exactly the same, but they can be produced from two identical quotations:
    ==  [[P ?]  [[P] ?]  first  i
Now the two identical quotations can be produced by the dup operator:
    ==  [[P] ?]  dup  first  i
All that is needed now is to extract [P] to the left:
    ==  [P]  [?]  cons  dup  first  i
In this construction each line was equivalent to its one or two neighbours. Hence the first and the last lines of this construction are equivalent:
        [P]  ?   ==   [P]  [?]  cons  dup  first  i
Hence the following is a suitable recursive definition of a recursion combinator:
        ?   ==   [?]  cons  dup  first  i

It is useful to think of the RHS as being composed of two parts: the first is

        [?]  cons  dup  first
and the second part is just i. The first part is just a function which takes one quoted program as a parameter and produces two quoted programs as values:
        [P]  [?]  cons  dup  first
        [[P]  ?]  dup  first                    (by cons)
        [[P]  ?]  [[P]  ?]  first               (by dup)
        [[P]  ?]  [P]                           (by first)
The second part of the definition is i, which effectively dequotes the topmost [P] and hence executes P. So the next reduction step depend on what P actually does.

An alternative recursive definition may be constructed:

        [P]  ?
    ==  [[P] ?]  P
    ==  [[P] ?]  [P]  i
    ==  [P]  [?]  cons  [P]  i
    ==  [P]  [P]  [[?] cons]  dip  i
    ==  [P]  dup  [[?] cons]  dip  i
The resulting alternative recursive definition is
        ?   ==   dup  [[?] cons]  dip  i
It follows that the two RHSs of the two equivalent definitions are equal:
        [?]  cons  dup  first  i   ==   dup  [[?] cons]  dip  i
(Note that cancellation of the trailing i combinator on both sides is not valid in general, though it would be valid here.)

We now have a recursive definition of recursion. Computationally this is quite adequate, the recursive definition of any one of the ? combinators really does make it possible to eliminate all other recursive definitions. Some of the following sections deal with the elimination of recursion even for the recursion combinators.

The recursive definition of a recursion combinator can of course also be given in other languages. For an example in the lambda calculus see {Henson87}. For an example in ML see {Sokolowski91}.

Theorems in recursion theory

The theory of computability treats various formalisms which independently aim to capture the essence of computation. These formalisms include Turing machines, the lambda calculus, Markov algorithms, register machines, flow charts and any of the conventional programming languages. Each formalism deals with various specifications: for the Turing formalism the specifications are particular Turing machines, for the Markov formalism the specifications are particular Markov algorithms, for the programming languages they are particular programs, and so on. Some results in the theory concern connections between the formalisms, whereas others concern connections within the same formalism.

The main results in the theory can be divided into two groups: those concerned with the relationships between two such formalisms and those concerned with just one.

The principal results in the first group are that the formalisms are all equivalent, in this sense: Given any two formalisms, for any specification in the one formalism there is a specification in the other formalism such that the two specifications compute the same function. Proofs of such results are always constructive, by exhibiting an algorithm which converts any specification in the one formalism into a specification in the other.

Results in the second group concern just one formalism and often take this form: For any specification S1 having a certain property there is another specification S2 such that the two specifications S1 and S2 are related in some special way. Proofs are again constructive, by exhibiting algorithms for converting S1 into S2.

The algorithms of both groups can be expressed in any of the formalisms. The constructive proofs then look like this: There is a specification S (in formalism F) with the capacity of transforming any specification S1 (in formalism F1) into a specification S2 (in formalism F2, not necessarily different from F1) such that the two specification S1 and S2 are related in a special way. Since the formalisms are universal, the algorithms can be written in the formalism itself, as another specification S. The results then take this form: There is a specification S such that for every specification S1 the result of applying S to S1 yields the required S2.

Historically the theory of computability has been mainly couched in terms of recursive functions, a collection of functions built from a small base by means of a small number of constructors. These functions take natural numbers as arguments and give natural numbers as values. By means of a theoretically elegant (but computationally unfeasible) mapping called Gödel numbering, linguistic entities such as expressions can be assigned a unique number. In this way the functions can be made to "talk about themselves". Any syntactic operation on linguistic entities has a counterpart purely arithmetical operation on natural numbers which are the Gödel numbers of these entities. The theorems are basic to any proper understanding of the the foundations of computer science. But the proofs, when not hand-waiving by appeals to Church's thesis, tend to be forbidding.

The ease of writing S depends crucially on how well the formalism can handle its own specifications. Often it is necessary to use encodings of S1 and S2 into a form which the formalism can handle. In the worst case the encodings are truly ghastly. In the best case the formalism can handle its own specifications very naturally. This is sometimes expressed by the slogan program = data. Most formalisms fare badly on this, including most programming languages.

Some languages can treat their own programs as data; they are Lisp, Snobol, Prolog and their descendants, some macro generators and the command languages based on them. Joy does it at least as well. In these languages programs can operate on other programs or on other data to produce other programs which may then be executed. Because of this, arithmetisation is not necessary, functions can take programs rather than Gödel numbers of programs as arguments, and metatheoretic proof become much easier. {Phillips92} writes

One might wonder how differently recursion theory might be viewed if it had arisen out of practical developments instead of predating them.
Indeed, one might speculate how much more natural computability theory would have been if Lisp has been invented thirty years earlier. Readers might want to pursue this topic, see the discussion in {Hofstadter85} starting on p 444, and his very startling conclusion on p 449.

In Joy many metatheoretic proofs are easier still because there are no named formal parameters. Consequently the difficult operation of substituting actual parameters for formal parameters does not occur, and everything is algebra. Reasoning about Joy programs will now be illustrated with a number of classical theorems.

A simple form of the parametrisation theorem states: there is a recursive function taking the Gödel number of an n-ary function F1 as one argument and taking the Gödel number of a numeral as a further argument such that the value of this recursive function is the Gödel number of an n-1-ary function F2 which is obtained by substituting the value of the numeral for the first argument of F1.

Here is the same theorem for Joy. An operand is a numeral (or any other constant) or a quotation.

        For any program P and operand X
            there is a program Q such that
                Q   ==   [Q] i  ==  X  P
The required program Q is just the concatenation. But the theorem can be strengthened:
        There is a program O such that
            for any program P and operand X
                there is a program Q such that
                    X [P] O  ==  [Q] and
                    Q  ==  [Q] i  ==  X [P] i  ==  X P
The required program O is cons:
        X [P] O  ==  X [P] cons  ==  [X P]  ==  [Q]
The theorem is rather trivial in Joy. The formalisation of the proof is an overkill. However, many proofs later on will have the same structure, and it may help to get used to that.

The parametrisation theorem generalises to any number m of arguments that are parametrised. Consequently it is often called the S-m-n theorem, see {Rogers67}. The "S" stands for "substitution" - it means that m of the n formal parameters are replaced by fixed values. In Joy notation repeated cons operations can parameterise for any m, simply by repeating cons m times.

In recursive function theory it is possible for functions to take other functions as parameters, and since functions are untyped, they can take themselves as parameter. One consequence ot the S-m-n theorem is the diagonalisation theorem: There is a recursive function taking as argument the Gödel number of a function which takes at least one parameter, and giving as value the Gödel number of the function obtained from the given one by substituting itself for the parameter.

Again, the counterpart in Joy is trivial.

        For any program P
            there is a program Q such that
                [Q] i  ==  Q  ==  [P] P
The required quoted [Q] is [[P] P]. The stronger form is:
        There is a program O such that
            for any program P
                there is a program Q such that
                    [P] O  ==  [Q]  and
                    [Q] i  ==  Q  ==  [P] dup i  ==  [P] P
The required program O is dup cons. Proof:
        [P] O  ==  [P] dup cons  ==  [P] [P] cons  ==  [[P] P]  ==  [Q]
        [Q] i  ==  [[P] P] i  ==  [P] P  ==  Q
So in Joy the short program
        dup  cons
implements diagonalisation.

It may help to see diagonalisation in action for a small program:

        [dup reverse concat] dup cons i
    ==  [dup reverse concat] [dup reverse concat] cons i       (by dup)
    ==  [[dup reverse concat] dup reverse concat] i             (by cons)
    ==  [dup reverse concat] dup reverse concat                 (by i)
    ==  [dup reverse concat] [dup reverse concat] reverse concat(by dup)
    ==  [dup reverse concat] [concat reverse dup] concat        (by reverse)
    ==  [dup reverse concat concat reverse dup]                 (by concat)
So the program [dup reverse concat] diagonalised and run produces its own palindrome. Most programs cannot be self-applied because it would breach typing. But here are a few, readers might like to see what they do:
        []
        [reverse 42]
        [42 pop]
        [pop]
        [dup cons]

In {Smullyan61} theory of formal systems of (character) strings, the norm or diagonalisation of a string X is defined to be X followed by its own Gödel number in dyadic notation. In Joy the detour via Gödel number is eliminated by using quotations.

We now consider two reductions. The first is this:

        [P]  dup  i
    ==  [P]  [P]  i                     (by dup)
    ==  [P]   P                         (by i)
The second is this:
        [P]  dup  cons  i
    ==  [P]  [P]  cons  i               (by dup)
    ==  [[P]  P]  i                     (by cons)
    ==  [P]  P                          (by i)
So the two reductions come to the same, and hence their first lines are identical:
        [P]  dup  i   ==   [P]  dup  cons  i
But this holds for all [P]. So we may infer
        dup  cons  i   ==   dup  i

The fixpoint theorem and the y combinator

An important result in computability is the recursion theorem, see for example {Rogers67}. It states:
For every function P there is a Gödel number of a function Q such that the result of applying the function P to the Gödel number of Q is the Gödel number of a function identical with Q.
The theorem says that for every function P which is used to edit code for computing functions, there is a program Q which will compute the same function before and after the editing. So Q is a fixpoint, a program not affected by P. Fixpoint theorems are about programs P (rather than about the functions which they compute).

Here is an equivalent version for Joy:

        For any program P
            there is a program Q such that
                [Q] i   ==   Q   ==   [Q] P
The required program Q is
        [dup cons P] dup cons P
The proof is as follows:
        [            Q           ]  i
    ==  [[dup cons P]  dup cons P]  i                   (def Q)
    ==   [dup cons P]  dup cons P                       (by i)
    ==   [dup cons P] [dup cons P] cons P               (by dup)
    ==  [[dup cons P]  dup cons P]  P                   (by cons)
    ==  [            Q           ]  P                   (def Q)

The theorem may be strengthened. The transformation from P to Q can be done by a simple program O:

        There is a program O such that
            for any program P
                there is a program Q such that
                    [P] O   ==   [Q]   and
                    [Q] i   ==   Q   ==   [Q] P
The required program O is
        [dup cons]  swap  concat  dup  cons
The proof is as follows:
        [P] O
    ==  [P]  [dup cons]  swap  concat  dup  cons        (def O)
    ==  [dup cons]  [P]        concat  dup  cons        (by swap)
    ==  [dup cons P]                   dup  cons        (by concat)
    ==  [dup cons P]  [dup cons P]          cons        (by dup)
    ==  [[dup cons P] dup cons P]                       (by cons)
    ==  [Q]                                             (def Q)
The fixpoint finding program O is useful: henceforth it will be called the fix operator. We also define the y combinator for Joy:
        y   ==   fix  i
This is the Joy counterpart of Curry's "paradoxical" Y combinator (uppercase), see {Curry58}. This combinator is well known in the literature on the lambda calculus and on combinatory logic.

The recursion theorem traces its ancestry to Epimenides, Russell and Grelling (for [P] put [i not], and to Gödel. In its general form it is due to Kleene. Its proof is very cumbersome in just about all formalisms. As can be seen from the above, in Joy the proof is very simple.

The S-m-n theorem is the basis for partial evaluation or program specialisation. A special case for this is the partial evaluator mix which can transform an interpreter into a compiler, and can transform itself into a compiler generator. {Jones92} shows that such transformations can be done realistically only if the partial evaluator has primitives in terms of which the fixpoint constructions of the recursion theorem can be implemented efficiently. The two primitives chosen are in fact very close to Joy's i combinator and quotation.

The recursion theorem leads in a few steps to Rice's theorem, see {Rogers67}, which encapsulates all the bad news of computability theory: for example the halting problem, or the impossibility of writing programs which check other programs - implementation, student exercises - for correctness.

Some simple cases

The theorem, in both forms, speaks of all programs P. It is of some interest to see what happens when some simple actual programs are chosen. Two particularly simple programs are
  1. the empty program, which computes the identity function, and
  2. the i combinator program, which dequotes and executes a program on top of the stack.

  1. A frequently paraded corollary of the recursion theorem is the existence of self-reproducing programs. When run, these program produce a replica of themselves. A suitably general definition of self-reproducing programs is this: a program S is self-reproducing if this conditions holds:
            S  i   ==   S
    
    Note that S is allowed to be arbitrarily complex, it does not have to be just a quoted program. Such programs are obtained from the recursion theorem by making P compute the identity function. In Joy this is represented most simply by the empty program, in quoted form []. (Alternatively, it is represented by the program id, in quoted form [id].)
            There is a program Q such that
                [Q] i   ==   [Q]
    
    Proof: let Q be the program
                [dup cons] dup cons
    
    Then the derivation is as follows:
            [[dup cons]  dup cons]  i
        ==   [dup cons]  dup cons                           (by i)
        ==   [dup cons] [dup cons] cons                     (by dup)
        ==  [[dup cons]  dup cons]                          (by cons)
    
    Readers might like to be reminded of a self-replicating C-program:
    
    p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
    
    (Author unknown. Proving correctness is not easy.)

    Related to self-reproducing programs are self-describing programs. A program S is self-describing if running it produces a description of it. In Joy a program S is self-describing if this condition holds:

            S   ==   [S]
    
    Here is a self-describing program:
            [dup cons] dup cons
        ==  [dup cons] [dup cons] cons                      (by dup)
        ==  [[dup cons] dup cons]                           (by cons)
    

  2. The other simple program to investigate is the i combinator. Then [P] is [i], and [Q] is [[dup cons i] dup cons i]. This is what happens:
            [[dup cons i]  dup cons i]  i
        ==   [dup cons i]  dup cons i                       (by i)
        ==   [dup cons i] [dup cons i]  cons i              (by dup)
        ==  [[dup cons i]  dup cons i]  i                   (by cons)
    
    So this is a program that runs forever. This is the Joy counterpart of what in the lambda calculus is selfapplication selfapplied. In both the reductions do not terminate. In the lambda calculus the rule used is beta-reduction, the substitution of actual for formal parameters. In Joy there are no formal parameters, just algebraic simplification.

    The dup cons combination occurs so frequently that it is worth introducing an operator duco defined by

            duco   ==   dup  cons
    
    This will make proofs shorter both horizontally (because programs are shorter) and vertically (because two steps are condensed into one).

  3. The next kind of program is with [P] = dup. Then [Q] is the quotation in the first line below, and execution is as follows:
            [[duco dup] duco dup]  i
        ==   [duco dup] duco dup                            (by i)
        ==  [[duco dup] duco dup] dup                       (by duco)
        ==  [[duco dup] duco dup] [[duco dup] duco dup]     (by dup)
    
    So this is an example of a program which when run produces two copies of itself. The program in the second line is another self-describing program, it produces two copies of itself.

  4. The next program uses [P] = dup i. Then execution starts like this:
            [[duco dup i] duco dup i]  i
        ==   [duco dup i] duco dup i                                (by i)
        ==  [[duco dup i] duco dup i] dup i                         (by duco)
        ==  [[duco dup i] duco dup i] [[duco dup i] duco dup i] i   (by dup)
    
    So this is another program that runs forever. In addition it leaves earlier copies of itself on the stack. In an implementation it must run out of stack space. The program in the second line is again self-describing, but it never finishes what would be a description of infinitely many copies of itself.

    Another program that does much the same is

            [[duco dup dip] duco dup dip]
    

  5. In the introduction a program was given which computes the factorial function without being defined recursively. Here are the first few steps in the execution of that program, those that are independent of the numeric parameter that is provided to the program. To save space, the duco operator is used.
            [ [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte ]
            y
        ==  [ [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte ]
            fix  i                                                  (by y)
        ==  [ [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte ]
            [duco] swap concat duco  i                              (def fix)
        ==  [duco] [ [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte ]
            concat duco  i                                          (by swap)
        ==  [ duco [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte ]
            duco  i                                                 (by concat)
        ==  [[ duco [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte ]
               duco [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte ]
            i                                                       (by duco)
        ==  [ duco [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte ]
            duco [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte     (by i)
        ==  [[ duco [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte ]
               duco [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte ]
            [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte          (by duco)
    
    At this point the numeric parameter for the computation will be needed. The ifte combinator will execute its if-part. That results in the long two-line quotation being popped off the stack and the parameter being compared with 0. If that evaluates to true, the then-part is executed with the parameters and the two-line quotation still on top of the stack, The then-part pops both and replaces them by 1. But if the if-part returns false, the else-part is executed, again with the parameter and the two-line quotation on top of the stack.

    First the two-line quotation is set aside by dip, the parameter is duplicated and the top copy decremented. Only then the two-line quotation will be executed by i. The effect is to construct a clone of itself with duco, pushing the same if-part, then-part and else-part for the contained ifte. When that eventually returns, the old parameter and above it the factorial of the decremented parameter copy are finally multiplied to give the required factorial of the original parameter.

Evaluation order

Most theoretical and practical programming languages use formal parameters in definitions of functions (and procedures). These functions are called with expressions as the actual parameters. There are two ways of doing this: normal order, also known as call by name, and applicative order, also known as call by value. In normal order the unevaluated expressions are substituted for the formal parameters, and the expressions will be evaluated only if the body of the function requires it.

In applicative order the expressions are evaluated first and that value is substituted for the formal parameter. If the value of a parameter expression is used repeatedly in the body of the function, then normal order requires repeated evaluation of the expression, whereas applicative order requires only one evaluation. On the other hand, if the body does not require the value at all, then applicative order will have wasted time trying to evaluate the expression. If that evaluation does not terminate at all, then the call of the function will fail to terminate under applicative order although it would have terminated under normal order. Consequently there are some functions that are less defined under applicative order evaluation than they are under normal order.

The Y combinator of the lambda calculus does not work at all for applicative order. No matter what the function is, the call with the Y combinator will try to do the unending sequence of substitutions and hence fail to terminate. Therefore a different version of the Y combinator has to be used, see {Stoy77}, {Henson87}, {Paulson92}, {Winskel93}.

In Joy there are no formal parameters, there is no substitution, and hence strictly speaking there is no evaluation order. Joy functions take their actual parameters as values from the stack, and in a way this resembles applicative order. The y combinator of Joy always terminates correctly provided that the quoted program and any of its actual parameters did get onto the stack, and provided that the quoted program terminates. So there is a difference between the Y combinator of the lambda calculus under applicative order, and the y combinator of Joy. What is the reason? Consider again, for an arbitrary program [P]

        [P]  y
    ==  [P]  fix  i
    ==  [[duco P] duco P]  i
    ==  [duco P]  duco P
    ==  [[duco P] duco P]  P
This point is always reached in the initial call, and it is independent of what P actually is. At this point there is a (double) quotation (containing P twice) on top of the stack, and P now has it available as a parameter. Quotations are never evaluated further, although they can be explicitly manipulated by operators or they can be explicitly called by combinators. Consequently the second duco inside the quotation will not be executed to yield
    ==  [[[duco P] duco P] duco P]  P
Here of course the second and third duco would have been the next candidates for execution, and so on. So the reason for the difference is that quotations in Joy are never evaluated automatically, whereas abstractions in the lambda calculus will be under applicative order evaluation.

It is possible to generalise fixpoint combinators for mutual recursion. For the lambda calculus this is done for example by {Kogge91}, shows how pairs of mutually recursive definitions can be eliminated by using a pair of rather complicated mutually recursive combinators Y1 and Y2. A similar technique is explained in {Henson87}. Constructions such as these rely on the existence of double fixpoints, whose existence follows from a double recursion theorem (see for example {Smullyan94}). Presumably these can be translated into Joy, too.

Rice's theorem

Consider arbitrary sets of functions, and note that this does not mean sets of programs. Call such a set non-trivial if it is neither the universal set of all functions nor the null set of no functions. Given a non-trivial set F of partial functions and a Joy program [Q], the question arises whether the function computed by [Q] is in F or not. Is there an algorithm for deciding such questions? More specifically, is there a Joy program, say PF, for deciding such questions? Such a program would have to satisfy
    [Q]  PF   ==   true,  if the function computed by [Q] is in F
              ==   false, if the function computed by [Q] is not in F
The program PF is also expected to terminate for all inputs [Q]. Can there be such a program?

No. This is Rice's theorem:

        For all non-trivial sets F of partial functions,
            there is no program  PF  such that
                for all programs  Q
                    PF can decide whether
                        the function computed by  Q  is in F.
Proof: Let F be a non-trivial set of functions. Since F is non-trivial, there are some functions in F and some functions not in F. Let program [E] compute a function in F and let [E'] compute a function not in F.

Now assume that a program PF exists, and that it always terminates. Next, consider the following Joy program:

        [PF]  [pop [E']]  [pop [E]]  ifte
The program expects an arbitrary program [Q] on top of the stack and then pushes the three small quotations. The ifte operator then removes them again and executes the first short quotation [PF]. This will result in a truth value, true or false. In the first case the second short quotation is executed, it pops the [PF] and replaces it by [E']. In the second case the third short quotation is executed, it pops the [PF] and replaces it by [E]. So, if [Q] is in F, the program returns a program not in F, namely [E']. On the other hand, if [Q] is not in F, the program returns a program that is in F, namely [E]. In other words, no matter what the input program [Q] is, the output program is opposite as far as membership in F is concerned. Call the above program OPF.

By the fixpoint theorem, the program OPF must have a fixpoint, say FIXOPF, satisfying

        FIXOPF  ==  [FIXOPF]  OPF
But if the program on the left computes a function in F, then the program on the right computes a function not in F, and vice versa. But this is a contradiction. So the program OPF fails for at least one program, its own fixpoint FIXOPF. We must conclude that the assumption was false. So there cannot be a program PF.

The above proof of Rice's theorem for Joy is adapted from a proof for recursive functions in {Phillips92}. Several well-known paradoxes are instances of the recursion theorem, for example the Liar paradox and Grelling's paradox use as the program [P] the simple program [i not]. Recent discussions of the Liar and related problems can be found in {Martin70} and in {Barwise-Etchemendy87}.

Other self-reproducing and self-describing programs

There are variants of the programs in section 5 which are worth mentioning.
  1. The following is a simplification of the first self-reproducing program. It is simpler because it uses an operand parameter which it leaves intact.
            There is an operand Q1 and a program Q2 such that
                Q1  [Q2]  i   ==   Q1  [Q2]
    
    Proof: Let Q1 = [dup] and let Q2 = dup. Then the derivation is
            [dup]  [dup]  i
        ==  [dup]   dup                             (by i)
        ==  [dup]  [dup]                            (by dup)
    
    This seemingly trivial self-reproducing program will be used to derive a version of recursion that is actually more efficient than the one based on the y combinator.

  2. The next uses the b combinator:
            There are programs Q1 and Q2 such that
                [Q1] [Q2] b   ==   [Q1] [Q2]
    
    Proof: let Q1 and Q2 be the programs
            [[] cons dup first]
             [] cons dup first
    
    Then the derivation is:
            [[[] cons dup first]]  [[] cons dup first]  b
        ==   [[] cons dup first]    [] cons dup first           (by b)
        ==  [[[] cons dup first]]           dup first           (by cons)
        ==  [[[] cons dup first]] [[[] cons dup first]] first   (by dup)
        ==  [[[] cons dup first]]  [[] cons dup first]          (by first)
    
    It is useful to introduce an operator codufi by the definition:
            codufi  ==  cons dup first
    
    The operator will be found useful in the next section.

  3. The next example of a self-reproducing program again uses the i combinator:
            There is a program Q such that
                    [Q] i i   ==   [Q]   =/=   [Q] i
    
    Proof: let Q be the program
            [duco [] cons] duco [] cons
    
    The execution now looks like this:
             [[duco [] cons] duco [] cons]   i  i
        ==    [duco [] cons] duco [] cons  i                        (by i)
        ==   [[duco [] cons] duco [] cons] [] cons  i               (by duco)
        ==  [[[duco [] cons] duco [] cons]]  i                      (by cons)
        ==   [[duco [] cons] duco [] cons]                          (by i)
    
    Observe that the quoted programs in the first and fourth line differ by just the extra quoting in line four.

    There is another program with the property. Let Q be the program

            [false [not] infra dup rest cons] [not] infra dup rest cons
    
    The combinator infra expects a program (here [not]) on top of the stack, and below that a quotation (here the first half of the program). It temporarily turns the quotation into the stack and executes the program (here it complements the truth value false or true at the very beginning. An outline of the derivation is:
        [[false [not] infra dup rest cons] [not] infra dup rest cons] i i
        [[true  [not] infra dup rest cons] [not] infra dup rest cons] i
        [[false [not] infra dup rest cons] [not] infra dup rest cons]
    
    (Each by 5 steps)

    The quoted programs the first and second lines are examples of mutually describing programs, satisfying

            P  =  [Q]       and       Q  == [P]
    
    In detail:
    P  ==  [false [not] infra dup rest cons] [not] infra dup rest cons
    Q  ==  [true  [not] infra dup rest cons] [not] infra dup rest cons
    
  4. The previous program keeps an internal toggle (true or false) which is thrown every time it is called. The next program does the same with a counter, and consequently every generation is different from all previous ones: Let Q be the program
        [0 [1 +] infra dup rest cons] [1 +] infra dup rest cons
    
    Successive executions using i cause the 0 to be incremented to 1, 2 and so on.

  5. Almost all programs become ruined when maltreated. Cutting off bits and pieces would generally cause malfunction. Worms of course can regenerate from small pieces, but most programs are not like that. But programs can be written so that after each call they become more and more insensitive to mutilation. The mutilating operations are first and rest or sequences of such operations. There are programs which with every generation become less and less sensitive to longer and longer sequences of mutilations. Two such Q are
            [duco duco] duco duco
            [cons duco] [cons duco] cons duco
    

  6. Kym Horsell commented that self-reproducing programs do not do anything useful. The idea would be that for a given program another program is to be found which is self-reproducing and at each generation executes the original program first. So the problem is to find, for arbitrary P, a program Q such that
            [Q]  i  i  ..  i   ==   P  P  ..  P  [Q]
    
    where the i on the left and the P on the right are repeated the same number of times. Here is one answer:
            There is a program O such that
                for all programs P
                    there is a program Q such that
                            [P]  O   ==   [Q]   and
                            [Q]  i   ==   Q   ==   P  [Q]
    
    Proof: Let O be the program
            [dup [first i] dip rest duco] cons duco
    
    Then for any P:
        [P]  O
    ==  [P] [dup [first i] dip rest duco] cons duco         (by O)
    ==  [[P] dup [first i] dip rest duco] duco              (by cons)
    == [[[P] dup [first i] dip rest duco] [P] dup [first i] dip rest duco]
                                                            (by duco)
    

Another recursion combinator

Apart from the y combinator there are other recursion combinators. One of them is given by the following theorem:
        There is a program M such that
(1)         [y] M  ==  [y] i  ==  y  and
            for some programs N and O
(2a)             [M] [duco] swap concat  ==  [N]  and
(2b)             [N] duco  ==  [O] and
(3)              [M] y  ==  [N] dup i  ==  O  and
                 for all programs P
(4)                  [P] O  ==  [P] y  and
                     there is a program Q such that
(5)                     [P] [O] cons  ==  [Q]  and
(6)                     [Q] i  ==  Q  ==  [Q] P
The first line, (1), says that y is a fixpoint for M. Lines (2a) and (2b) show how to construct two further programs N and O. Line (3) expresses a relationship between the three programs M, N and O. Line (5) shows how to construct a program Q which depends on P. The last line of the theorem says that the Q is a fixpoint for P.

Proof: Only M need be given, because N and O are constructed. The required M is actually a combinator already seen in section 2:

        cons  dup  first  i
But the first three operators can be replaced by codufi defined in the previous section. So we set:
        M  ==  codufi i
This is used to construct N:
        [M] [duco] swap concat
    ==  [codufi i] [duco] swap concat                           (def M)
    ==  [duco] [codufi i]                                       (by swap)
    ==  [duco codufi i]                                         (by concat)
    ==  [N]                                                     (def N)
So we have:
        N  ==  duco codufi i
This program can be used to construct O:
        [N] duco
    ==  [duco codufi i] duco                                    (def N)
    ==  [[duco codufi i] duco codufi i]                         (by duco)
    ==  [O]                                                     (def O)
So we have:
        O  ==  [duco codufi i] duco codufi i
This program can now be used to construct, for arbitrary [P] a corresponding [Q]:
        [P] [O] cons
    ==  [P] [[duco codufi i] duco codufi i] cons                (def O)
    ==  [[P] [duco codufi i] duco codufi i]                     (by cons)
    ==  [Q]                                                     (def Q)
So we have, for arbitrary [P],
        Q  ==  [P] [duco codufi i] duco codufi i
It remains to be shown that Q is a fixpoint for P:
        [Q]  i
    ==  [[P] [duco codufi i] duco codufi i]  i                  (def Q)
    ==   [P] [duco codufi i] duco codufi i                      (by i)
    ==  [P] [[duco codufi i] duco codufi i] codufi i            (by duco)
    ==  [[P] [duco codufi i] duco codufi i] [P] i               (by codufi)
    ==  [[P] [duco codufi i] duco codufi i]  P                  (by i)
    ==  [Q]  P                                                  (def Q)
This shows that Q is indeed a fixpoint for P.

Different recursion combinators, and indeed a whole hierarchy of them, are well known in the literature on lambda calculus and combinatory logic; see for example {Barendregt84}, {Henson87}, {Revesz88}. Possibly one of the most satisfying introductions to combinatory logic is to be found in the remarkable little book {Smullyan90} To Mock a Mockingbird in which he manages to combine humour and rigour. Part III is a self-contained exposition to combinatory logic in which fancyful names are given to lambda calculus combinators.

Practical recursion combinators in Joy

Here are three recursive definitions. The function r-last finds the last element of a list. The function r-fac computes the factorial of a number. The function r-fib computes the Fibonacci value of a number.
    r-last  ==  [rest null] [first] [rest r-last] ifte
    r-fac  ==  [0 =] [succ] [dup pred r-fac *] ifte
    r-fib  ==  [small] [pop 1] [pred dup pred [r-fib] app2 +] ifte
The following three functions also compute The last, the factorial and the Fibonacci of their parameter. Note that there is no definition; the recursion is taken care of by the y combinator.
  [ [pop rest null] [pop first] [[rest] dip i] ifte ] y
  [ [pop 0 =] [pop succ] [[dup pred] dip i *] ifte ] y
  [ [pop small] [pop pop 1] [[pred dup pred] dip app2 +] ifte] y

But the y combinator is intrisically inefficient because of the way it operates. On every recursive call a certain program is popped off the stack to be executed. The first task of that program is to construct a copy of itself, in readiness for any further recursive calls. But this is really quite silly. It would be better more efficient if the program to be executed was not popped off the stack at all but simply left there. Whereas most combinators remove their parameters from the stack, a new x combinator leaves it there as a parameter for itself.

The following programs use the x combinator instead of the y combinator. They are obtained from the first two of the previous programs by replacing the internal occurrences of the i combinator and the the external occurrence of the y combinator by the x combinator.

  [ [pop rest null] [pop first] [[rest] dip x] ifte ] x
  [ [pop 0 =] [pop succ] [[dup pred] dip x *] ifte ] x
The x combinator might have been defined by
        x  ==  dup i

Similar lambda calculus constructions are discussed in {Tennent76}, {Bauer-Woessner82}, {Schmidt86} and {Tennent91}.

The remainder of this section describes some further general and particular combinators of Joy which can be used to avoid recursive definitions.

  1. The genrec combinator takes four program parameters in addition to whatever data parameters it needs. Fourth from the top is an if-part, followed by a then-part. If the if-part yields true, then the then-part is executed and the combinator terminates. The other two parameters are the rec1-part and the rec2part. If the if-part yields false, the rec1-part is executed. Following that the four program parameters and the combinator are again pushed onto the stack bundled up in a quoted form. Then the rec2-part is executed, where it will find the bundled form. Typically it will then execute the bundled form, either with i or with app2, or some other combinator.
        g-fac == [null ] [succ] [dup pred     ] [i *   ] genrec;
        g-fib == [small] [    ] [pred dup pred] [app2 +] genrec;
    

  2. The linrec combinator also takes four program parameters and is otherwise very similar to the genrec combinator. The essential difference is that the bundled up quotation is immediately called before the rec2-part. Consequently it can only be used for linear recursion. Here are programs for finding the last of an aggregate or the Fibonacci value of a natural number:
        l-last == [rest null] [first] [rest    ] [ ] linrec;
        l-fac  == [null     ] [succ ] [dup pred] [*] linrec
    

  3. The binrec combinator is again similar, but it applies the bundled quotation twice, once to each of the two top values which the rec1-part has left on the stack. It implements binary recursion. Below it is used for the Fibonacci function and for a quicksort program for lists (or strings).
      b-fib == [small] [] [pred dup pred   ] [+                 ] binrec;
    b-qsort == [small] [] [uncons [>] split] [swap23 cons concat] binrec;
    

  4. The tailrec combinator is similar to the linrec combinator, except that it does not have a rec2-part. It can only be used for tail recursion, such as in the program below which returns the last element of an aggregate.
        t-last == [rest null] [first] [rest] tailrec;
    

  5. The primrec combinator is for primitive recursion, it has the typical if-part built in. For numeric parameters and for aggregate parameters it tests for null. The rec1-part is also built in, for numeric parameters it returns the parameter and its predecessor, for aggregate parameters it returns the first of the aggregate and the rest of the aggregate. Recursion then occurs respectively on the predecessor or the rest, and the rudimentary rec2-part typically combines the results.

    The first program below computes the factorial (again!). The second turns any aggregate into a list. The third turns a suitable list of small numbers into a set. The fourth and fifth capitalise lists or strings of lowercase letters and produce, respectively, a list or string of corresponding capital letters. The last program takes an aggregate as parameter and produces a list of every second element in the original order and then the other elements in the reverse order.

        p-fac      ==  [1 ]  [*              ]  primrec
        agg2list   ==  [[]]  [cons           ]  primrec
        list2set   ==  [{}]  [cons           ]  primrec
        capstring  ==  [""]  [[32 -] dip cons]  primrec
        caplist    ==  [[]]  [[32 -] dip cons]  primrec
        fancy      ==  [[]]  [reverse  cons  ]  primrec