Tuesday, April 30, 2013

Classical Fictions

It seems there is a fundamental tension between exploiting the full power of a constructive type theory and upholding the fictions of classical mathematics.

Noam Zeilberger

The desire not to be anti-classical puzzles me, because in my opinion the law of the excluded middle isn't just unprovable in automated proof assistants, it is really wrong. The statement (in Agda syntax):

∀ {p : Set} → p ⊎ (¬ p)

is usually read "for all statements p, p is true or p is false", but a more faithful translation (to the way the language actually behaves) would be "there exists a program that, when given any statement, either proves that it is true or proves that it is false." Such a program is famously impossible.

But more than that, even if we could invoke the hand of God and introduce such a program as a postulate, we would still have an impossibility, since Gödel showed that there must be statements that can be neither proven nor refuted -- in other words, the constructive reading of the law of the excluded middle is impossible even to God. Such a postulate isn't just wrong, it is one of the most widely known wrong things in math.

So it seems to me that any language that is not anti-classical must be missing out on the benefit of a few assumptions. I would like to see

¬ (∀ {p : Set} → p ⊎ (¬ p))

be provable, translated as "If you show me a program and claim it can prove or disprove any statement, I will analyze that program and find a case on which it fails."

The implementation of the above may involve pattern matching on functions (ie, inspecting a function to see what parts it is built from (lambda, application, constructor, ...)), which would violate extensionality. I have no problem with that, because I think extensionality is also obviously untrue. Sure, you may define a

data ExtensionallyEqual {A B : Set} (f : A → B) (g : A → B) : Set where
    ext-equal : (pf : ∀ {a} → f a ≡ g a) → ExtensionallyEqual f g

and reason about it (as you may construct a classical theory within an anti-classical language), but there's no need for all terms in the language to conform, especially when there exist extensionally equal terms with wildly different performance characteristics.
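For instance, here is a minimal sketch of reasoning with ExtensionallyEqual, assuming the definition above plus the standard library's naturals and propositional equality (the names n+0≡n and example are mine): the two functions below do wildly different amounts of work, yet are extensionally equal.

open import Data.Nat using (ℕ; zero; suc; _+_)
open import Relation.Binary.PropositionalEquality using (_≡_; refl; cong)

n+0≡n : ∀ n → n + 0 ≡ n
n+0≡n zero    = refl
n+0≡n (suc n) = cong suc (n+0≡n n)

-- λ n → n + 0 recurses over its whole argument; λ n → n does no work at all.
-- They are nevertheless extensionally equal:
example : ExtensionallyEqual (λ n → n + 0) (λ n → n)
example = ext-equal (λ {a} → n+0≡n a)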

Plus, if you could pattern match on functions, then the compiler could be written as an ordinary function in the source language, in the same execution environment; there would be no need to treat compilation as a "magical" task that happens outside of the main program. And much of the distinction between macros and functions disappears, as all terms are open for inspection. Of course, the inspections would need to respect whatever level of equality the language generally aims to support, and I suspect (though I haven't thought about it too deeply) that in Agda this would mean that pattern matching on functions must only see closed normal forms (which is not unlike how pattern matching already behaves in Agda).

Saturday, April 20, 2013

Axiomatic FRP

OK so there's this thing called FRP -- Functional Reactive Programming: building programs that vary in time out of Signals and Events. Signals and Events are objects that you can work with, and you have a set of combinators for making new signals out of old signals, etc.

And existing implementations -- I haven't looked at every existing implementation, but the ones I have looked at -- either don't allow you to create Signals dynamically (ie all your signals are there at program initialization), or have problems with memory management.

And the problems with memory management come about because Signals are implemented in terms of callbacks, which must use weak pointers; so, even though you can't technically overflow the heap this way, you're relying on the underlying garbage collector to stop active behavior -- to stop events from firing.

So I was thinking, we can do better -- one way we can do better, at least -- is if we drop Signals and Event streams for now and focus on one-shot events, because these have a definite expiration attached to them -- once the one-shot has fired you no longer need to hold on to it, so that should improve the memory situation a little bit.

And in addition to that, we can write down a set of axioms that are obviously true of events, so that if we target an implementation to the axioms in a minimal way, we can limit ourselves to only the necessary screw-ups, avoiding superfluous screw-ups not required by the axioms.

So, my first attempt goes,

     Events1.agda 

-- https://lists.chalmers.se/pipermail/agda/2010/002484.html
-- https://lists.chalmers.se/pipermail/agda/2010/002485.html
{-# OPTIONS --no-positivity-check #-}
{-# OPTIONS --no-termination-check #-}
{-# OPTIONS --type-in-type #-}

-- Compiled against standard library 0.6
module Events1 where

open import Data.Sum using (_⊎_; inj₁; inj₂)
open import Data.Product using (_×_; proj₁; proj₂; _,_)

record EventSystem : Set where
  infixl 5 _map_
  field
    -- The type of a one-shot event, ie it does not repeat.
    Event : (result-type : Set) → Set

    -- Axioms of Events

    -- Basic map.
    _map_ : ∀ {r₁ r₂} (ev : Event r₁) (f : r₁ → r₂) → Event r₂

    -- Wait for the latter of one or two events.
    chain : ∀ {r} (ev : Event (r ⊎ Event r)) → Event r

    -- For any two events, one of them happens first.
    order : ∀ {r₁ r₂} (ev₁ : Event r₁) (ev₂ : Event r₂)
          → Event ((r₁ × Event r₂) ⊎ (r₂ × Event r₁))

And I believe -- you can look at these and tell me if I'm right or not -- that these are obviously true, in a single-threaded, non-relativistic world. The order axiom is obviously not true in a multi-threaded world, where, in waiting for the first of two events, you may miss the second.

So. The first thing I notice is that we're very close to having a monad, except we're missing pure, and chain (aka join) has that funny union type.

Well, I had a very good reason for not including pure. When I was writing this I was very worried about having a way to create an event in a side-effectless manner, which is what pure is. I wanted every event to be grounded to another event, so you never had events as pure objects just floating around. I even considered a cojoin-like definition

     main 

instantly : ∀ {r} → Event r → Event (Event r)

except that that turned out not to be very useful for writing actual programs.

But anyway, I remembered after writing this that applicatives don't need pure, and you can see how you can use that same exact trick for monads, so we can forgive ourselves for writing it like this:

record EventSystem : Set where
  infixl 5 _map_
  field
    Event : (result-type : Set) → Set

    instantly : ∀ {r} (x : r) → Event r

    _map_ : ∀ {r₁ r₂} (ev : Event r₁) (f : r₁ → r₂) → Event r₂

    chain : ∀ {r} (ev : Event (Event r)) → Event r

    order : ∀ {r₁ r₂} (ev₁ : Event r₁) (ev₂ : Event r₂)
          → Event ((r₁ × Event r₂) ⊎ (r₂ × Event r₁))

Now, honestly, I like the first rules better. The first rules are obviously true, in my opinion, whereas with the second rules we have to do more self-convincing. But monads are ubiquitous and there's no use in fighting a monad when you have a monad, so we go with the monadic Event, plus the order axiom.
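For concreteness, the usual bind falls straight out of _map_ and chain -- a minimal sketch, assuming the second record above has been opened (the name _>>=_ is mine; instantly plays the role of return):

_>>=_ : ∀ {r₁ r₂} → Event r₁ → (r₁ → Event r₂) → Event r₂
e >>= f = chain (e map f)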

(Though we should remember, if we're ever going to implement these axioms, the fact that we had to use a disjoint union to convince ourselves that instantly is valid is a clue that we will need to special-case constant signals).
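To spell that clue out: the first record's chain, with its r ⊎ Event r argument, already contains everything needed to graft a pure onto Event after the fact -- it is the same A ⊎ F A construction as in the "Applicatives do not need pure" post below. A minimal sketch, this time assuming the first record has been opened (Event′, pure′, and bind′ are my names):

open import Data.Sum using (_⊎_; inj₁; inj₂)

Event′ : Set → Set
Event′ r = r ⊎ Event r

-- a value already in hand is the "constant event" special case
pure′ : ∀ {r} → r → Event′ r
pure′ = inj₁

bind′ : ∀ {r₁ r₂} → Event′ r₁ → (r₁ → Event′ r₂) → Event′ r₂
bind′ (inj₁ x)  f = f x
bind′ (inj₂ ev) f = inj₂ (chain (ev map f))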

Now these axioms, despite what I still insist is their obvious truth, have a serious problem. Let's say I have two events

x : Event ⊤
y : Event ⊤

and I use the Event axioms to derive

xy = chain (x map (λ _ → y))
yx = chain (y map (λ _ → x))

which intuitively mean "wait for x to happen, then wait for y to happen", and "wait for y to happen, then wait for x to happen." Now one of these must hang. Either x comes first or y comes first, and if x comes first and you wait for y, then wait for x, the second waiting will never occur.

Now I claim that this does not falsify the Event axioms. An event that never returns is not illegitimate -- you could still have an event resulting in ⊤ that never comes to pass, in the same way you can have a function returning ⊤ that goes into an infinite loop and never returns. It would result in ⊤ if it ever resulted in anything.

So we have to ask ourselves a question. Is this a problem worth fixing? Because if we're going to fix it, we're going to have to introduce extra restrictions in the Event axioms AND, critically, it will be impossible to make Event a monad, since we used only monad combinators to derive a never-occurring event, so we'd be giving up on a good deal of friendly generality by giving up on the monad.

But if I just think back to how often my interactive programs hang up waiting for events in the wrong order, the answer is "a lot". This is actually a pretty common problem, and it is something the Event laws could actually help us with. Here is an attempted fix:

     SafeEvents.agda 

{-# OPTIONS --type-in-type #-}
{-# OPTIONS --no-positivity-check #-}
{-# OPTIONS --no-termination-check #-}

module SafeEvents where

open import Data.Sum using (_⊎_; inj₁; inj₂)
open import Data.Product using (_×_; _,_)

π : {T : Set} → (T → Set) → Set
π {T} A = (t : T) → A t

record EventSystem : Set where
  field
    -- Time is probably not going to be represented by a Double or Int. It's not a "time" so
    -- much as a handle for getting at events.
    Time : Set

    -- We don't really care what this means.
    min : Time → Time → Time

    Event : (A : Time → Set) → (t : Time) → Set

    -- An event that happens without delay.
    instantly : ∀ {t A} (x : π A) → Event A t

    -- This is like usual map, but be careful, you also get a time.
    _map_ : ∀ {t A B} (ev : Event A t) (f : ∀ {t} → A t → B t) → Event B t

    -- This represents waiting for the second of two events.
    chain : ∀ {A t} → Event (Event A) t → Event A t

    -- Total ordering: for any two events, one happens first, and the other second.
    order : ∀ {A₁ A₂ t₁ t₂} (ev₁ : Event A₁ t₁) (ev₂ : Event A₂ t₂)
          → Event (λ t → (A₁ t × Event A₂ t) ⊎ (A₂ t × Event A₁ t)) (min t₁ t₂)

These axioms, I think, are not so obvious, so they require some explanation.

First, Event now takes a Time parameter in its type:

Event : (A : Time → Set) → (t : Time) → Set

t is not the time at which the event happens; it is some time before the event happens. ie, an Event A t is an event that results in A after time t. So if t₁ comes before t₂, for example (a property not describable by these axioms), an Event A t₂ could also plausibly be an Event A t₁ -- though we have no axioms to get at this concept.

Now, instantly is written differently:

instantly : ∀ {t A} (x : π A) → Event A t

The π is hiding some things; this could be written

instantly : ∀ {t A} (x : (t : Time) → A t) → Event A t

In other words, the argument you give to instantly should be a function taking a Time as its argument, and returning a result whose type depends on that time. I say "should" rather than "must" because you can see that this is not actually a restriction on the caller -- x is free to ignore the time and return the same result it would have anyway; this type of instantly is actually more powerful, not less general, than the old one.
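For instance, the old constant-valued instantly can be recovered from the new one by ignoring the time -- a minimal sketch, assuming we are inside a module where an EventSystem has been opened (instantly-const is my name):

-- wrap a plain value by handing instantly a function that ignores the time
instantly-const : ∀ {t} {R : Set} → R → Event (λ _ → R) t
instantly-const x = instantly (λ _ → x)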

Here is how _map_ turned out:

_map_ : ∀ {t A B} (ev : Event A t) (f : ∀ {t} → A t → B t) → Event B t

There might be a more useful way to write map, one that gave f more access to the times, but I was worried that could create a problem, so I stuck pretty close to the usual _map_ definition.

But chain holds the difference from the old Event laws, and this is more restrictive on the caller:

chain : ∀ {A t} → Event (Event A) t → Event A t

Some of the parameters have been eta-reduced, making it look suspiciously monadic, but this can be written as:

chain : ∀ {A t} → Event (λ t′ → Event A t′) t → Event A t

In other words, you can chain an event producing an event, provided that the inner event can be constructed at whatever time the outer event fires. This is giving me some kind of hint that times need types associated with them -- maybe Time actually represents an observable object -- but I'm not sure what to do about that right now.

And lastly, order gives us the kind of useful information needed to call chain:

order : ∀ {A₁ A₂ t₁ t₂} (ev₁ : Event A₁ t₁) (ev₂ : Event A₂ t₂)
      → Event (λ t → (A₁ t × Event A₂ t) ⊎ (A₂ t × Event A₁ t)) (min t₁ t₂)

And we can verify that the necessary information is contained in order by writing a function, second, that waits for the second of two events:

second : ∀ {r t₁ t₂} → Event r t₁ → Event r t₂ → Event r (min t₁ t₂)
second ev₁ ev₂ = chain (order ev₁ ev₂ map λ {
         (inj₁ (r , next)) → next;
         (inj₂ (r , next)) → next
    })

And that works. And now we have time-safe events.

Wednesday, April 17, 2013

Applicatives do not need `pure`

A few days ago I said:

Usually, applicatives are assigned a third power, which is the power to take a plain object and bring it into the functor in the simplest way possible, eg 2 becomes [ 2 ]. This power is essential to writing generic, recursive programs with applicatives.

I take it back. I now believe that nothing interesting you can do with an applicative functor requires `pure`.

Assumption: if the only practical difference between two programs is that one applies pure at the very end, it is no more interesting than the other.

For example, pure 3 is not interesting, because it doesn't tell you anything that 3 doesn't, and (+) <$> (pure x) <*> (pure y) is not interesting because it is equivalent to pure (x + y).

I had been thinking of ways to represent events in FRP without bringing in memory issues that usually come with FRP. This meant I was spending a lot of time looking at different algebraic structures, and trying to peel off the assumptions that weren't needed. At one point I had a structure like the following:

     UnpureApplicatives.agda 

module UnpureApplicatives where

open import Level

-- Here is a definition of applicatives that does not include
-- a `pure` axiom.

record UnpureApplicative {l} (F : Set l → Set l) : Set (suc l ⊔ suc zero) where
  field
    -- The basic map axiom.
    _ap\_ : ∀ {A B} (f : A → B) (x : F A) → F B

    -- The combine axiom.
    _/ap\_ : ∀ {A B} (f : F (A → B)) (x : F A) → F B

which has map and ap (as all applicatives do) but lacks pure, and provides no way of defining pure.

But we can still define something that is just as good as sequence:

     Test1.agda 

open import Level
open import UnpureApplicatives

module Test1 {l} (F : _) (applicative : UnpureApplicative {l} F) where
    open import Data.List using (List; []; _∷_)
    open import Data.Sum using (_⊎_; inj₁; inj₂)

    open UnpureApplicative applicative

    -- We can derive this from the other two.
    -- Agda auto was actually able to solve this one.
    _/ap_ : ∀ {B} {A : Set l} (f : F (A → B)) (x : A) → F B
    _/ap_ f x = (λ z → z x) ap\ f

    -- This is how sequence now looks.
    -- It's ugly, but it gets the job done.
    sequence : ∀ {A} → List (F A) → List A ⊎ F (List A)
    sequence [] = inj₁ []
    sequence (x ∷ list) with sequence list
    ...                     | inj₁ plain = inj₂ ((_∷_ ap\ x) /ap plain)
    ...                     | inj₂ appy = inj₂ ((_∷_ ap\ x) /ap\ appy)

and the lack of pure is not "interesting" here because it may be applied at the end, by pattern matching on the ⊎.
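For example, if the functor did happen to come with a pure, a caller could discharge the ⊎ at the very end -- a minimal sketch in the context of the Test1 module above (finish and pureF are my names):

-- apply a caller-supplied pure only at the end, by pattern matching on the ⊎
finish : ∀ {A} → (pureF : List A → F (List A)) → List A ⊎ F (List A) → F (List A)
finish pureF (inj₁ plain) = pureF plain
finish pureF (inj₂ appy)  = appy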

So what we've really done is use this "unpure" applicative to build a regular "pure" applicative:

     BuildFromUnpure.agda 

open import UnpureApplicatives

module BuildFromUnpure {l} (F : Set l → Set l) (Unpure : UnpureApplicative F) where
  open import Data.Sum using (_⊎_; inj₁; inj₂)
  open UnpureApplicative Unpure using (_ap\_; _/ap\_)

  F' : Set l → Set l
  F' A = A ⊎ F A

  pure : ∀ {A} → A → F' A
  pure = inj₁

  _/ap'\_ : ∀ {A B} (f : F' (A → B)) (x : F' A) → F' B
  inj₁ f /ap'\ inj₁ x = inj₁ (f x)
  inj₁ f /ap'\ inj₂ x = inj₂ (f ap\ x)
  inj₂ f /ap'\ inj₁ x = inj₂ ((λ f → f x) ap\ f)
  inj₂ f /ap'\ inj₂ x = inj₂ (f /ap\ x)
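For F' to stand on its own as an applicative it also needs its map; a minimal sketch, inside the same module (this _ap'\_ is my addition, not part of the original listing):

  -- map for F', delegating to the unpure applicative's map when needed
  _ap'\_ : ∀ {A B} (f : A → B) (x : F' A) → F' B
  f ap'\ inj₁ x = inj₁ (f x)
  f ap'\ inj₂ x = inj₂ (f ap\ x)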

And I think it is clear that these two definitions are equally interesting, and, if a definition for pure were supplied originally, it could always be applied at the end, and so would add nothing.

Also, you could take an unpure applicative, make it a regular applicative, then drop the pure, and have something equivalent to the original unpure applicative.

Friday, April 12, 2013

Reduction Engine in Java 1.6, new features

The Reduction Engine is now compiled with Java 1.6 so it should work in more browsers. I haven't tested in other browsers, though, so I can't promise anything.

(The Reduction Engine)

New features:

  • Normalization: type 'N' to normalize a whole subtree by performing all available reductions in outside-to-inside order (lazy evaluation).
  • Alt-click on a node to use it to fill the current hole.

Bug fixes:

  • Bury and collect work more thoroughly.

Example of normalization

Ctrl-click to insert a root, then type '+', '1', 'e'. The result is a function that adds one to a number. (Since addition is commutative, we could have written it the other way (1 + _ rather than _ + 1) and it would have been shorter).

Now hit N to reduce to normal form:

This gets us to point-free form, a pure SK representation of the function. Not very friendly to look at; the lambda form is easier for humans. But the computer has no trouble evaluating this:

Type 'i' to insert an argument, '2' to fill it in, and 'N' to normalize once more. And there's the answer.

Monday, April 8, 2013

The Reduction Engine

This is a first, rough version of the reduction engine:

The reduction engine on one page

Usage

  • Inserting:
    • Ctrl+Click makes a new tree.
    • +,-,* An arithmetic operator
    • S,K,I,Y A combinator
    • . Insert a function application
    • 0..9,# A number
  • Modifying:
    • e Partially apply
    • C Clear subtree
    • i Insert argument
  • Reducing:
    • r Reduce at current node
  • Applicatives (see explanation below -- basically work like lambdas):
    • b Bury
    • c Collect
  • Moving around:
    • Down Move to child
    • Up Move to parent
    • Left Cycle selection left
    • Right Cycle selection right
    • Click to select a particular node.
  • Dragging things around visually:
    • Shift+Up,Down,Left,Right Move subtree.
    • Drag with mouse also moves subtree.
    • f Reformat subtree.

The list to the right allows you to jump to past editing states.

General Description

The reduction engine is a small, interactive editing environment for playing around with combinatory expressions as well as applicative expressions. Applicative expressions as they are done in the Reduction Engine are one of its unique features, and are described below.

The use of "holes" in the tree is inspired by holes in Agda.

Applicative expressions generalize lambda expressions

How to think about functors and applicative functors

A description of applicative functors in Haskell

Conor McBride, Ross Paterson. Idioms: applicative programming with effects

Functors

A functor is like a way of operating on data that has been wrapped up inside another abstraction. That is, it takes ordinary operations on plain data, and makes it so they can operate on the imaginary data inside the abstraction.

The List functor is a good example: it gives a way of operating on elements inside a list. Say f is the operation "add 2 to a number", and we have a list

[ 1, 2, 3 ]

The list functor gives us a way to take f and make it work on the elements of the list, giving

[ 3, 4, 5 ]

Another example is the function functor. If I have a function

g  =  x -> x * 7

and I want to apply f to this, the function functor says that the way we do this is to move the abstraction (x ->) one place further to the outside, giving us

x  ->  (x * 7) + 2

That's all for functors: the power to operate on data inside another abstraction. Applicatives will add another power.
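Both examples, as a quick Agda sketch using the standard library (the module and the names f, g, list-example, and function-example are mine):

module FunctorExamples where

open import Data.List using (List; _∷_; []; map)
open import Data.Nat using (ℕ; _+_; _*_)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

f : ℕ → ℕ
f x = x + 2

-- the List functor: map f over the elements of the list
list-example : map f (1 ∷ 2 ∷ 3 ∷ []) ≡ (3 ∷ 4 ∷ 5 ∷ [])
list-example = refl

g : ℕ → ℕ
g x = x * 7

-- the function functor: mapping f over g is composition, x -> (x * 7) + 2
function-example : ℕ → ℕ
function-example = λ x → f (g x)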

Applicatives

Applicatives have 2 powers:

  1. Operate on data inside the abstraction -- the same power as functors have.

  2. Mix abstracted data from 2 sources.

I have omitted one of the powers usually ascribed to Applicatives. More on that below.

Mixing abstracted data from 2 sources

Supposing we have two lists,

[ 1, 2, 3 ]
[ 4, 5, 6 ]

and we would like, Matlab-style, to add them together element-by-element, to get:

[ 5, 7, 9 ]

It's a pretty common thing to want to do, which is why Matlab makes it happen all the time. Well the problem is, with just power #1, we can't bring two lists together. So power #2 says, you can take an operation of two things (like add two numbers together), and provide for it as arguments two functor-embedded pieces of data.

So now, we have a way to operate on one functor-embedded object, and two at a time.
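In list terms, this element-by-element behavior is the "zippy" flavor of the list applicative, whose power #2 is zipWith -- a quick Agda check (the module and the name mix-example are mine):

module MixExample where

open import Data.List using (_∷_; []; zipWith)
open import Data.Nat using (ℕ; _+_)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

-- power #2 for zippy lists: mix two lists with a binary operation
mix-example : zipWith _+_ (1 ∷ 2 ∷ 3 ∷ []) (4 ∷ 5 ∷ 6 ∷ []) ≡ (5 ∷ 7 ∷ 9 ∷ [])
mix-example = refl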

Now if you imagine an expression

       f
      / \
    [x]  g
        / \
       y  [z]

where [x] and [z] are inside the functor, and y is a plain, unfunctored object, and f and g are some operations, and say we would like to evaluate it. So we apply power #1 to evaluate g(y, z), which ends up being in the functor, so f now has 2 children in the functor, so we apply power #2 to evaluate f(x, ...), and there we go. And if every expression you could ever want to evaluate can be represented as a binary tree, then just by applying powers #1 and #2 you can evaluate any applicative expression regardless of how polluted with functors it is -- provided we stick to only one kind of functor (and no nesting).
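As a concrete rendering of that evaluation, reusing the UnpureApplicative record from the April 17 post above -- a minimal sketch; the module and every name in it are mine, purely for illustration:

open import UnpureApplicatives

module TreeExample {F : Set → Set} (App : UnpureApplicative F)
                   {X Y Z W V : Set}
                   (f : X → W → V) (g : Y → Z → W)
                   (x : F X) (y : Y) (z : F Z) where

  open UnpureApplicative App

  -- power #1 pushes (g y) inside [z], power #1 pushes f inside [x],
  -- and power #2 mixes the two functor-embedded results
  result : F V
  result = (f ap\ x) /ap\ (g y ap\ z)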

How did the data get inside the functor in the first place?

Usually, applicatives are assigned a third power, which is the power to take a plain object and bring it into the functor in the simplest way possible, eg 2 becomes [ 2 ]. This power is essential to writing generic, recursive programs with applicatives, but it is a little superfluous in the context of the Reduction Engine, because every functor, applicative or not, ought to have some way of getting data inside it, otherwise you wouldn't be trying to operate with that functor in the first place.

So what you can imagine is, we have these big expression trees,

         .
        / \
       .   \
      / \   \
             .
            / \
           .
          / \

and somehow, despite our best efforts to keep them out, functors manage to slip in there -- maybe as subtly as the risk of finding a zero when dividing -- but, when we have the applicative powers #1 and #2, we can go on evaluating the expression as if nothing ever happened, and only at the end, when we want to look at what we finally got, do we see that there was a functor involved in there somewhere.

Applicatives in the Reduction Engine

The preceding discussion sort of gives the impression that the applicative powers can be applied "automatically" to the tree, Matlab-style, without the programmer having to type them in. In reality, no programming language supports that fully, though some have varying degrees of syntactic sugar that take a bite out of the problem. Unfortunately, there is actually a good reason why programming languages don't do this, though that reason is not quite as good as people usually believe.

The reason is basically that sometimes human judgment is required. The applicative combinators can't be applied automatically because it is not always automatically clear what the programmer actually meant. If I write

    f
   / \
 [x] [y]

where [x] and [y] are trapped in a functor, let's say the same functor, your instinct is to apply power #2, but on reflection you see that there are actually 6 different things the programmer might have meant:

  1. Apply power #2, mixing the two functors.
  2. Apply power #1, treating [x] as inside a functor but treating [y] as an aggregate object (think [1, 3] + length(ls))
  3. Treating [x] as aggregate and [y] as inside.
  4. Treating both [x] and [y] as aggregate.
  5. Treating [x] as inside, [y] as aggregate initially, until you get inside [x], then treat [y] as inside (think f([1,2], [3, 4]) = [[f(1,3),f(1,4)],[f(2,3), f(2, 4)]])
  6. Treating [y] as inside, [x] as aggregate until getting inside [y], then treating as inside.

That is five too many for the computer to pick one! Let us say, to simplify the matter, that f is not capable of operating on aggregates, which eliminates 2, 3 and 4 but leaves 1, 5 and 6. You can think of the difference between them as the difference between "lining up" the functors, and two kinds of overlapping:

x -- y    Lining up

x --
  -- y    x surrounds y

  -- y    y surrounds x
x --
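For the list functor, option 1 ("lining up") is just the zipWith check from earlier; options 5 and 6 nest one traversal inside the other. A quick Agda sketch of option 5, with f instantiated to addition (the module and the name x-surrounds-y are mine):

module SurroundsExample where

open import Data.List using (_∷_; []; map)
open import Data.Nat using (ℕ; _+_)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

-- option 5: the x list surrounds the y list
x-surrounds-y : map (λ x → map (λ y → x + y) (3 ∷ 4 ∷ [])) (1 ∷ 2 ∷ [])
              ≡ ((4 ∷ 5 ∷ []) ∷ (5 ∷ 6 ∷ []) ∷ [])
x-surrounds-y = refl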

In general, if we have an n-ary meeting of arguments (f(x1, x2, ...)) each x_i of which is wrapped in k_i functors, which we assume to be all amicable:

[ ]   [ ]          [ ]
[ ]   [ ]          [ ]
...   ...    ...   ...
 x1    x2           xn

(here [ ] represents some functor that one of the xs is wrapped in, symbolized by it being "on top"), the possible meanings correspond to different ways to draw non-overlapping horizontal lines through the functors, where every functor is hit by exactly one line, and the lines may "miss" functors so long as they do not cross. That is quite a lot of ways.

The idea behind the Reduction Engine -- and I'm very tired now so I'll have to save where exactly that idea comes from for another post -- is that you can solve most of those problems by making the functors artificially incompatible. Let us say we have

   f
  / \
[x] [y]

where [x] is embedded in the α functor and [y] is embedded in the β functor, and say that α and β are incompatible in such a way that we cannot perform power #2 on them. Then the only things we could end up with are an α β or a β α -- choices 5 and 6. So that's a little better.

Then, if the functors are power-2 compatible, we say, well, the programmer could have made these incompatible if they wanted to, so, since they didn't, we'll assume that they wanted to line them up, leaving only one possibility.

So the problem is almost solved. The only thing left is we need a way to decide between α β and β α. The way the reduction engine addresses this is something that makes no conceptual sense, but, having thought about it, I have decided it makes exactly as much sense as lambda expressions, which, this has led me to believe, also make no conceptual sense. But they are very useful.

The solution is that in the Reduction Engine, after applying an applicative expression like f(x, y), you have a series of "recollect" steps:

  Cβ
   |
  Cα
   |
   f
  / \
[x] [y]

(C stands for "collect"). Then, on the argument side, you balance the recollect steps with some "bury" steps.

  Cβ
   |
  Cα
   |
   f
  / \
Bα  Bβ
 |   |
[x] [y]

What you are doing with the "bury" steps is marking certain arguments as "going inside" the functor. This is how we eliminated options 2, 3, and 4 from the big list above. When you say Bα(x), you are saying, let's treat x as belonging to the α functor, and only the α functor.

What's going on in the recollect steps is less clear. Nominally, a recollect step "unmarks" the mark that the bury step imposed. eg, if you wrote

Cα
 |
Bα
 |
[x]

you are marking and then immediately unmarking x, so this is the same as just x. But the recollect steps are playing another crucial role, and that is that the order of recollect steps decides the order of overlap of the functors. If you collect α first and then β, you are asking for a β α, and if you do it the other way around, you are asking for an α β. Ambiguity solved.

The reason I believe that this makes no conceptual sense is that if you cover up the recollect steps with your hand for a moment and just look at what lies below them,

   f
  / \
Bα  Bβ
 |   |
[x] [y]

you will see that, even though you have written the complete application of f, it is as of yet totally unclear what is actually going to happen at that junction. Is this tree an α β or a β α? Consider that in a typed language like Haskell, you would have a hard time indeed typing this expression, not knowing which functor was on the outside and which was on the inside.

If you don't think that that is odd enough, consider a deeper tree where the bury and recollect steps are far apart:

      Cβ 
       | 
      Cα 
       |
       h
      / \
     g   w
    / \
   f   z
  / \
Bα  Bβ
 |   |
[x] [y]

Now the actual computations that happen at the f junction -- and, in a typed language, its type -- are determined by something far up the tree. What is going on?

If you think about this for a while I hope you can convince yourself that this is exactly the mechanics of lambda expressions. Collect is lambda, bury is a variable, "artificially making functors incompatible" is alpha conversion. The only odd part is that variables have something subordinate to them in the expression tree; other than that, it is literally exactly the same.

That also sort of resolves the typing issue. We can think of bury as if it literally stripped off the functor type, making it totally invisible. In that interpretation, at the f junction, f has its normal, unfunctored type, so the type is not, as we first feared, determined by something far up the tree. This is the same mechanics as open terms in lambda expressions -- we don't say that in \x -> x + 2 the subterm x + 2 has type Int -> Int just because it has a free variable in it -- we say it is of type Int, oh and by the way it's also an open term. This is just another kind of open term -- an applicative open term.

But it gets better than that. Not only have we made a new notion of open terms, but regular old lambdas are just a special case where the function functor is used. So what we have really done is opened up the sphere of open terms from being tied to just one applicative functor to encompassing all applicative functors.

Conclusion

I would love to see the principles behind the reduction engine in a mainstream programming language, or even an academic one like Agda. One of the things I've always liked about Agda is that it has the ability to reason about open terms, so you don't have to write a complete program to play around with some data and evaluate things. Allowing general applicative open terms just makes it better. And, I'll have to think more about it, but I believe that open applicative terms are a useful restriction of Scala's delimited continuations (which are more general), one that could simplify writing applicative expressions. These are much, much more flexible than idiom brackets.