Wednesday, June 26, 2013

I'm sick of tools

Synonyms for "tool" include "framework" and "system". So for example "build tool", "build system", "web framework", "message-passing system", "text processing system", "dependency injection framework" etc.

The problem with a tool is that it takes control away from the client code. make ("Make is a tool", begins the intro) will plan your build from start to finish [1]; sphinx (also a self-described tool) starts in control and relinquishes it through plugins; maven takes usurping control to a whole new level.

I am not bashing these programs; Sphinx produces beautiful output, and there simply isn't a substitute for Maven right now. I am rather noting that, to the extent that these programs are infuriating ("Maven is a device for reducing grown men to sobbing masses of absolute terror"), it is because they assume too much control over the process flow.

But what mystifies me in retrospect is that I did not always feel this way. I used to feel insecure about my dislike of tools. In the same way that a novice C++ programmer, quickly shot down for their ignorance of sequence points, is unlikely to question the philosophical rationale for even having sequence points, I assumed that my dislike of tools stemmed from my lack of mastery of tools, my lack of strength in the face of tools, and my lack of knowledge of the design considerations behind tools.

And yet somehow, when I did master a tool, enough to build my own tool around the tool, I still didn't like the tool.

I didn't quite understand why; I thought I had simply bumped up against architectural mistakes in docutils. I didn't realize that the problem was that it was a tool, and that by wrapping it in my own tool I was in fact making the situation worse.

The moral became apparent while I was drafting a Makefile to build some code that used to be managed by TI Code Composer Studio (note the URL) and compiled with TI Code Generation Tools. The compiler cl6x for Code Generation Tools has perhaps the most mischievous command-line interface I have ever encountered. Its rules for the naming of output files (which sometimes permit explicit overriding and sometimes not), for which files must be built in a common invocation, and for what the working directory must be are like carefully placed traps, and make has a habit of stepping everywhere.

And then, I decided, I don't have to write a Makefile. Nor must I use make or SCons or CMake or redo or any build tool at all. I can write a Python script that invokes the appropriate commands, and that's it.

And I won't sacrifice incremental build, because I can add memoization, and I won't sacrifice selectable targets because I can use argparse for command line arguments, and I won't sacrifice wildcards and path substitutions because I can write a few Python functions for manipulating paths and that's all I need (os.path does most of the work).
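For concreteness, here is a minimal sketch of that kind of script. The file names, the rebuild-by-timestamp memoization, and the compiler invocation are stand-ins, not a reconstruction of my actual build:

    import os, subprocess, argparse

    def outdated(target, sources):
        # Rebuild if the target is missing or older than any of its sources.
        if not os.path.exists(target):
            return True
        return os.path.getmtime(target) < max(os.path.getmtime(s) for s in sources)

    def build_obj(src):
        obj = os.path.splitext(src)[0] + '.obj'
        if outdated(obj, [src]):
            # Illustrative invocation only; getting the real cl6x arguments right
            # is exactly the part I don't want make guessing at.
            subprocess.check_call(['cl6x', src])
        return obj

    if __name__ == '__main__':
        parser = argparse.ArgumentParser()
        parser.add_argument('target', nargs='?', default='all')
        args = parser.parse_args()
        if args.target == 'all':
            for src in ['main.c', 'filters.c']:   # stand-in source list
                build_obj(src)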

A skilled Makefile writer will point out that the job would have been easy in make if you knew what you were doing. That's probably true. But the lesson to be learned is that we no longer need to feel bad that we didn't know what we were doing with make; we can get the job done in pure Python, and nobody need ever know what they are doing, beyond what they are actually in reality doing.

Building on this philosophy I wrote scooter, which I think in retrospect contains some mistakes that bring it a little too close to being a tool (such as relying exclusively on inotify for dependency tracking), but it is at least a step in the right direction. The project quickfiles that came out of it is better.

As a tool-less conclusion I present you the simplest possible HTML document generator for Python:

    # Decorating a function runs it immediately, bracketing whatever it prints
    # between the start and end strings.
    def surround_with(start, end):
        def dec(f):
            print(start)
            f()
            print(end)
        return dec

Because that really is all you need, and it does 50% of the work that docutils or Sphinx or LaTeX do, without requiring any new knowledge (everyone has to learn HTML at some point in their lives). Add in this function:

    def pyplot_image(**fig_attrs):
        from base64 import b64encode
        from matplotlib.pyplot import figure
        def dec(f):
            fig = figure(**fig_attrs)
            f()  # the decorated function draws onto the current figure
            # Path here is assumed to be the author's path helper (cf. quickfiles)
            tmp = Path.mktemp().setext('.png')
            fig.savefig(str(tmp))
            data = b64encode(open(str(tmp), 'rb').read()).decode('ascii')
            print('<img src="data:image/png;base64,%s"/>' % (data,))
        return dec

And you have everything you need to autogenerate reports of analyses including plots, as a standalone HTML document with no external dependencies, that you can email to someone, that will display in any browser.
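To make that concrete, here is what such a report script might look like using the two decorators above (the heading, data, and prose are stand-ins); run it with stdout redirected to a .html file and you are done:

    @surround_with('<html><body>', '</body></html>')
    def report():
        print('<h1>Noise analysis</h1>')

        @pyplot_image(figsize=(6, 4))
        def histogram():
            from matplotlib.pyplot import hist
            hist([1, 2, 2, 3, 3, 3])   # stand-in data

        print('<p>Counts look plausible.</p>')

Because each decorator runs its function immediately, simply importing or executing this script (say, python report.py > report.html) emits the complete standalone document.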

[1] Otherwise make -n would be much harder.

Tuesday, June 25, 2013

Math?


TEDxManhattanBeach - John Bennett - Why Math Instruction Is Unnecessary

I don't do math on a typical day. I am an electrical engineer; I work on X-ray spectroscopy systems.

It is true that I do math sometimes, but maybe one day in twenty. It is sometimes quite sophisticated math (probability or signal processing), but it takes no more than half an hour.

I have studied a lot of math. I have studied groups, rings, fields, field extensions, vector spaces, modules, categories, monads, adjunctions, topologies, sets, logic, algorithms, derivatives, integrals, differential forms, complex numbers, matrices, differential equations, and Laplace transforms. But I don't think about those things much anymore. I mostly do a lot of coding.

Though I suppose my job would be harder without math. The 80/20 rule still applies; I do little math because I can get it done quickly and move on to something else. If I couldn't do math it might consume my whole day.

Though I still don't see myself factoring a polynomial.

I never use logging frameworks

Logging frameworks are not respectful of the 80/20 rule. A logging framework will make 80% of your logging needs easier, but it makes the remaining 20% harder.

That extra 20% consists of cases where you are in doubt as to whether logging messages can get through at all, such as starting a new project, moving to a new platform, misconfiguring the logging framework, long-forgotten bits of code overriding the configuration, layered logging frameworks that delegate to each other, missing write permissions on a log file, etc.

It is a special case of the 80/20 rule that the hardest part of any task is verifying whether an event ever happens at all. Are all my unit tests passing, or are they simply not being run? Did this assertion pass or are assertions disabled? Did the program succeed or was it never invoked? Is it still working or is it waiting for input, or did it hang? Is this logging statement never reached or is logging disabled, or the output redirected, or logging for this particular class disabled, or a filter applied to the log messages, or a dependency missing causing a default to no logging, or a buffer not yet flushed? Does the new version still work, or am I running an old version? Did he not send the email or did my spam filter reject it, or the SMTP server bounce it for being too large? Did she not call because everything was ok or because she was already passed out? Is he cool with me and Ryan last night or is he too pissed for words?

I never use logging frameworks. I debug by printf, and then delete the printf. What if I forget to delete it? That's what github is for. All of my projects are either small and open source, or GUI applications used by non-programmers where spamming stdout is a non-issue.

Monday, June 3, 2013

Proving type equality in a Scala pattern match

Finally got this to work. Here's the infuriating code:

    sealed trait Two[A,B]
    case class One[A]() extends Two[A,A]

    def convert[A,B](x: A, t: Two[A,B]): B = t match {
      case One() => x
    }

which of course gives:

type mismatch;
 found   : x.type (with underlying type A)
 required: B
    case One() => x
                  ^

You would think that Scala might introduce a constraint that A =:= B, but apparently not. Of course you could just make the =:= yourself:

    sealed trait Two[A,B]
    case class One[A,B](pf: A =:= B) extends Two[A,B]

except that that is much more cluttered and not how people usually write and use case classes.
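Just to see the cost, here is roughly what the explicit-evidence version looks like at the match and at a construction site (recall that an A =:= B value can be applied as a function from A to B); the value names are made up:

    def convert[A,B](x: A, t: Two[A,B]): B = t match {
      case One(pf) => pf(x)   // pf: A =:= B, applied as a function A => B
    }

    // Every construction site now has to conjure up the evidence by hand:
    val t: Two[Int, Int] = One(implicitly[Int =:= Int])
    convert(42, t)   // 42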

It turns out that the working solution is to use a matched (lower-case) type variable:

    def convert[A,B](x: A, t: Two[A,B]): B = t match {
      case o: One[a] =>
        x: a
    }

So, the match is not as pretty as it could be, but the case class definition is clean and we never explicitly mentioned =:=, which is always a good thing.
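For reference, a quick made-up use of the working version:

    val s: String = convert("hi", One[String]())   // A and B are both String here; returns "hi"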

(What's going on is that Scala is introducing two implicit =:=s, one linking a to A and one linking a to B, but it doesn't chain implicits.)

Saturday, June 1, 2013

Using continuations to reorder type inference, or, a lesson in literal program design

Consider the following definition for a chain of computations:

    sealed trait Chain[-A,+B]
    case class End[A]() extends Chain[A,A]
    trait Chaining[A,B] extends Chain[A,B] {
      type I
      val first: A => I
      val rest: Chain[I,B]
    }

It is much like the List type, except that there is a hidden (existential) dependency between adjacent elements.

The "nestedness" of the structure is chosen to be like the List type for a very specific reason: when you go to use a Chain, you will perform the first computation first, so it should be the least nested; the last computation should be the most nested.

List defines a right-associative :: operator for building. Let us define such an operator:

    sealed trait Chain[-A,+B] {
      def :->:[A2<:A,P](f: P => A2): Chain[P,B] = new Chaining[P,B] {
        type I = A
        val first = f
        val rest = Chain.this
      }
    }

But we see we have a type inference problem:

    // Note the type annotation needed for second stage.
    val pipeline = (
           { (i: Int) => i.toString }
      :->: { (s: String) => s ++ s }
      :->: End()
    )

We might be tempted to think that the problem arises from Scala insisting that infix operators be members -- this does seem to be a likely explanation, since :->: being a member means that Scala must deduce the type of the right-hand operand first, which is the opposite of the direction in which we would like type information to flow. But you will find that the following non-member function fares no better:

    def c[A,B,C](f: A => B, rest: Chain[B,C]): Chain[A,C] = f :->: rest

But we can learn something from the areas where Scala's type inference does seem to work very well, namely chains of list operations (map, filter, groupBy, etc.), where you almost never need to annotate types. The reason is that any stage of a list-operation chain is itself an "incomplete operation chain" (i.e. we are (potentially) not done with the computation yet), which can be extended to another incomplete chain by way of the next map or filter.

So what we need is a representation of an "incomplete Chain".

The most obvious meaning of an "incomplete Chain" is "something that can be turned into a complete chain by providing the following steps".

Principal rule of functional programming design: when you can say plainly and succinctly what something "is", translate that sentence literally into code, and there is your design.

Do not use tricks, do not use fancy types: simply turn sentences into their most literal meaning.

(Analogously, in imperative code, you are looking to say what something "does".)

Let us apply the rule as literally as we can:

    trait ChainBuilder[A,B] { outer =>
      def build[C](rest: Chain[B,C]): Chain[A,C]
    }

Our "incomplete Chain" is, unsurprisingly, a kind of continuation, since a continuation is the most basic form of an incomplete computation.

We can then define the incremental building operations:

    trait ChainBuilder[A,B] { outer =>
      def build[C](rest: Chain[B,C]): Chain[A,C]

      def and[C](f: B => C) = new ChainBuilder[A,C] {
        def build[D](rest: Chain[C,D]) = outer.build(f :->: rest)
      }
      def done = build(End())
    }

    def start[A] = new ChainBuilder[A,A] {
      def build[B](rest: Chain[A,B]) = rest
    }

And now everything works:

    val pipeline3 = (start[Int]
      and { i => i.toString }
      and { s => s ++ s }).done

I feel there may be something more useful here than merely getting better type inference but I haven't yet figured out what.