Friday, May 31, 2013

Avoiding state machines in an Actor by passing continuations

This example shows the use of continuation-passing style to offload a long-running computation in an Actor without unrolling the code into an unwieldy state machine:

You're probably not supposed to use Actors this way.

It avoids a growing stack because react() clobbers the stack at each call (essentially forcing a tail-call). So for example the following program does not stack overflow:

Wednesday, May 29, 2013

Cooperation

Things I have learned about cooperation:

  • If you tell IntelliJ to add Maven support to a Scala project, it will reorganize your *.scala sources and rename them to *.java sources.
  • If I run XP in a VM, it will alert me when my battery goes low, and its solution is to hibernate itself.
  • If I keep back the (since removed) GHDL package to run my VHDL tests, I must keep back gcc in turn preventing me from compiling drivers for the kernel I have installed.
  • If you tell IE running under wine to make itself your default browser, it will do so, even if it can't fully start without crashing, and become your default PNG viewer.

I think this is a good world.

Tuesday, May 28, 2013

Pushing and pulling

I'm mystified. The following definition of a program is efficient to execute:

sealed trait Program[A]
case class Result[A](x: A) extends Program[A]
case class NeedsInput[A](f: String => Program[A]) extends Program[A]

and in fact has a not bad definition of order():

def order[A1,A2](p1: Program[A1], p2: Program[A2]):
        Program[Either[(A1,Program[A2]),(A2,Program[A1])]] =
    p1 match {
      case Result(x) => Result(Left(x, p2))
      case NeedsInput(f) => p2 match {
        case Result(x) => Result(Right(x, p1))
        case NeedsInput(g) =>
          NeedsInput { s =>
            order(f(s), g(s))
          }
      }
    }

but if you define map, flatMap, etc these turn out to be ineffecient (left as an exercise to the reader).

Programs represented in "pull style" (the preceding are essentially "push style") are not so efficient to execute (in the obvious way):

sealed trait Event[A]
case class Instantly[A](x: A) extends Event[A]
case object Input extends Event[String]
case class Computation[A,B](ev: Event[A], f: A => B) extends Event[B]
case class Chaining[A](ev: Event[Event[A]]) extends Event[A]
case class Ordering[A1,A2](ev1: Event[A1], ev2: Event[A2]) extends Event[Either[(A1,Event[A2]),(A2,Event[A1])]]

Wouldn't I love to be able to define

def compile[A](ev: Event[A], cc: ): Program[A] = ???

but I haven't thought how to do that.

What gives? Why is it so hard to achieve both the memory managing benefits of pull programs and the lack of wasted computation of push programs?

This is still puzzling me.

Monday, May 27, 2013

Redo-style signals

Let's say I have signals a, b, c, d with dependencies

Inputs ~> a ~> b ~> c ~> d ~> Real World

The signal 'a' occasionally changes (due to inputs), which in turn changes the valid values of 'b', 'c' and 'd'. The value of 'd' is occasionally read and used to update some real-world object that the user can see. Maybe 'a' is set more often than 'd' is read; maybe it's the other way around; we're not sure.

The question is, when should the computations of 'b' 'c' and 'd' be performed? Should it be done in "push style", ie, every time 'a' changes, or "pull style", every time 'd' is read? In the first case we may waste a lot of computation if 'a' is set more frequently than 'd' is read. In the second, we do not waste computation, but we require a full check of the dependency tree every time we read 'd', just to see if something has changed in the meantime.

Well it turns out that the conceptual solution to this problem was already solved by Avery Pennarun in his program redo:

Dependencies are tracked in a persistent .redo database so that redo can check them later. If a file needs to be rebuilt, it re-executes the whatever.do script and regenerates the dependencies. If a file doesn't need to be rebuilt, redo can calculate that just using its persistent .redo database, without re-running the script. And it can do that check just once right at the start of your project build.

redo is based on the following simple insight: you don't actually care what the dependencies are before you build the target; if the target doesn't exist, you obviously need to build it. Then, the build script itself can provide the dependency information however it wants; unlike in make, you don't need a special dependency syntax at all. You can even declare some of your dependencies after building, which makes C-style autodependencies much simpler.

A signal's value is either up to date or obsolete. When the value is requested, if it is up to date, it returns that value immediately, not checking the dependency tree. If it's obsolete, it does a full recalculation.

Then, when a source value like 'a' changes, it simply asks the question: Who is currently relying on the correct value of this signal? In the case of 'a', that would be 'b', so 'a' notifies 'b' that the value is now obsolete (but without providing the new value), who notifies 'c', who notifies 'd'. After that notification chain is sent, no one is relying on the correct value for anything, so the next time 'a' changes, nothing happens; no notifications are done.

There is another benefit to this system. Say the dependency tree actually looks like

  d
 / \
b   c
 \ /
  a

With push-style signals, every time 'a' changes, 'd' gets updated twice, which is wasteful. With pull-style signals, 'd', would be updated once, but a would be consulted twice every time d is read. Redo-style signals get the benefit of push-style signals without having to consult the whole tree at each update.

Code

Monday, May 6, 2013

Justices Alito and Kennedy on Monads vs Applicatives

From the oral arguments in Alleyne v United States (audio):

JUSTICE ALITO: Now, if you were defending a case involving drug weight and your client maintained that he or she had nothing to do with these drugs, how would you proceed? Your argument would be: They're not my drugs, but if they were my drugs, they weren't -- they didn't weigh more than one kilo.

MS. MAGUIRE: Well, Justice Alito, those are strategical questions that come up in every trial case that we have. ... But those -- those strategic decisions exist whether or not the Court adopts this rule or doesn't adopt the rule.

...

JUSTICE KENNEDY: But -- but isn't it difficult for you to say he had nothing to do with the drugs, plus the drugs didn't weigh more than a certain amount?

MS. MAGUIRE: I don't believe that that is difficult, and I believe that those are decisions that you make in every case. ...

...

JUSTICE KENNEDY: Well, we're not getting very far with this. But one answer you could say is that, in order to preserve the constitutional right, you want us to have a bifurcated trial. I thought you were -- might say that.

MS. MAGUIRE: No, we are not -- we are not asking for a bifurcated trial. We are just asking that if there's one --

JUSTICE KENNEDY: That's good because that's an extra problem.

A bifurcated trial is one in which the jury first decides on a subset of questions, then the trial continues, and the jury later decides the remaining questions.

Alito notices that the trial is set to proceed in applicative style, and sees that the defense will be in an awkward position, since they must prove facts about an incident they claim never took place. Kennedy then offers a bid for a monadic trial, where the result of the first verdict can drive the argument leading to the second. But Ms. Maguire resists, for the same reason Kennedy had in mind: in law as in programming, monadic style brings extra costs.

The cost of a bifurcated trial is that you must interrupt the trial, wait for the jury to deliberate, then wait for the lawyers to adapt their strategies to the verdict, wherein either the lawyers have planned for both outcomes (wasting effort) or the trial is delayed further. Analogously, if you have a monadic query library, every use of >>= aka flatMap:

val X : Column[Bool]
val Y : Column[Int]
val Z : Column[Int]

X flatMap { x =>
    if (x)
        Y
    else
        Z
}

requires the preliminary query to complete, return control and data back to the main program, execute some code, make a new query, send it back to the database again, and resume. And the query optimizer doesn't get the benefit of seeing the whole query at once. This is why LINQ is not strictly monadic (and can operate on expressions, not just higher order functions) and all monadic query languages have some really awkward spots where they try to avoid this problem.

The analogy to speculative planning by the lawyers would be speculative execution of the closure { x => ...} by the client environment, which is theoretically possible but I am aware of no software that actually does this.

(In this case it would seem that even if the court holds that the trial should proceed applicatively, Ms Maguire's client gets the benefit of a monadic trial because he started his appeal after the primary verdict had already been reached; but no other defendant to whom the ruling applies will get this accidental benefit. Perhaps this explains why Ms Maguire seems relatively unconcerned by the issue).