Thursday, November 24, 2011

Permanence

$\displaystyle\int \sqrt{1 + \sqrt{x}}\,dx$

The Calc I problem ends with:

Comment: The latest version of the big and powerful symbolic manipulation program Maple, available on most Rutgers systems, is unable to find this antiderivative (with the default settings for the program). Maple’s major competitor, Mathematica, considers the integral and returns

Mathematica could not find a formula for your integral. Most likely this means that no formula exists.

which is amusing.

Yes, hilarious.

The author has a common problem: he wants his students to sit down and grind through a few integrals but the students are getting bored and reaching for the computer. How do you convince them that learning to integrate by hand is worth the tedium?

Well, one thing to consider is that the students may be right. I'm not saying practice isn't worth something; I'm saying you're never going to run out of things to work on, even if you delegate the monotonous tasks to the machines. There's always something more to do, and your students might even discover something new. If you don't have the creativity to write problems that can survive the onslaught of technology, ask someone next door.

The thing is, even the author's attempted solution -- a tricky integral -- is only halfway to actually motivating the subject. Even if Maple can't do it (it can -- I checked, and so can Maxima, but that's beside the point), it's still just one random integral nobody cares about -- it's going to vanish by tomorrow.
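For the record, the integral yields to a single substitution, $u = \sqrt{1 + \sqrt{x}}$. This is one standard route, not necessarily the one the course intended:

$$
\begin{aligned}
u &= \sqrt{1 + \sqrt{x}}, \qquad x = (u^2 - 1)^2, \qquad dx = 4u(u^2 - 1)\,du \\
\int \sqrt{1 + \sqrt{x}}\,dx &= \int u \cdot 4u(u^2 - 1)\,du = 4\int (u^4 - u^2)\,du \\
&= \frac{4}{5}u^5 - \frac{4}{3}u^3 + C \\
&= \frac{4}{5}\left(1 + \sqrt{x}\right)^{5/2} - \frac{4}{3}\left(1 + \sqrt{x}\right)^{3/2} + C
\end{aligned}
$$

Differentiating the last line gives back $\sqrt{1 + \sqrt{x}}$, which is a quick way to check the result.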

Students are not used to being asked to create things of permanent value. They are asked to create temporary things that are sufficient to pass the grader's eye but will not serve them later. No matter how "real worldy" you make the problem, if the solution isn't generally useful, it's just going to vanish along with all the rest.

Let me emphasise that last point again because it's usually missed: it doesn't matter at all how realistic the problem seems; if the student isn't going to use their own answer again later, it's not something permanent. And the important part isn't that it will vanish; it's that when you know it will vanish, you approach it differently.

Do you ever notice that the people around you (and you too, though I was hoping not to rely on your own introspection) seem hesitant, even nervous, to really finish something? This is because we are not used to permanence. We want the problem to have finite scope from the outset and vanish when it is complete. Do you notice that people have trouble grounding their thoughts in reality? If you're not producing something that lasts, why make it complete?

There was a chimpanzee named Nim Chimpsky whom researchers tried to teach language through positive and negative reinforcement. At first Nim seemed to be learning a lot of new words; then the researchers found out he'd just been learning to the spec, imitating enough to pass the test, and had in fact learned nothing.

This is what we have been trained to do. Hey, if people are writing software to solve integrals, that's something of permanent value... how about having the students do that? Writing a program to solve a problem is better than solving it by hand 100 times because to make your code work you have to understand the steps completely. The compiler is unforgiving. I think this is often what scares teachers away from programming as part of their class -- they think they know, and want to believe, that the students understand the material, but the compiler is too revealing.

This is why you should use the compiler, though, because it breaks your illusions.

We are transient, our memories are transient, we forget often and dislike remembering, the best we can do is produce something outside ourselves that lasts.

Tuesday, November 22, 2011

The Best Way to Squander Power

"Cops Spray Passive Protestors"

It does look pretty bad.

You can say with near certainty that the officer holding the red can is in the 99%. But now he's been made their enemy. The UC Davis chancellor called his actions "chilling", and his job is at risk. He's in the 99%, but the 99% aren't helping him; they're hurting him. They're making his future uncertain, maybe worse. How did he go from being your friend to being your enemy?

Why does the video look so shocking? Because the spraying doesn't look tactical -- it looks punitive. If I had to guess what was going through the officer's mind, I wouldn't say he's trying to break the circle so his colleagues can get out (though that's probably part of it); somewhere in there he's enjoying the display of pain -- or rather, he's enjoying that he has the power to inflict pain. At least he has the power to do something.

You've been protesting because the voice of the 99% has been cut off. If I had to give a picture of someone reacting badly because their voice has been cut off, it would be a middle aged man pepper-spraying college kids, walking back, and spraying them again.

What we're starting to realize, though, is that the Occupy movement is powerful, because the ideas behind it are powerful. But using that power to stick it to a lowly police officer, someone who shouldn't even be your enemy... the easiest way to squander power is to waste it on easy victims.

Look at how calmly the officer settles into his role as enemy. He knows which side he's on before he even pulls out the can. How did he find out? Who told him he was an enemy? Was it something in the way they looked at him, a message passed by eye contact, telling him what part he'd play?

You've got to stop playing it that way.

Monday, November 7, 2011

What's up with Lift

I've been using the Lift web framework for a class project. Lift is written in Scala, which is a statically typed functional language.

The philosophy in these languages tends to be that you should use the type system to prove as much about your program's consistency as possible, so that you don't need to check as much at runtime.

Unfortunately, in web programming your data regularly passes through clients that you have no control over, wiping out all the consistency you so carefully checked. Most of our code goes into tediously pulling unstructured data back into a structured form -- in other words, we don't get much benefit from static types.

But what also surprised me was how many parts of the framework used reflection. Now, all in all, the parts using reflection are a tiny fraction of the whole Lift codebase, but the thing about reflection is that it is very noticeable, because it deviates from how the language is supposed to behave. I have decided something: if you want to make Scala into a really great language, find all the places Lift uses reflection, and change the language so that reflection is no longer tempting in those cases.

Managing object lifetime

A rendering snippet like

  class FooSnippet extends StatefulSnippet {
      val p = S param "p" openOr ""

      def render = "p *" #> p
  }

gets instantiated per request and, at that point, has access to the request environment in S. I'm not sure this one is really Scala's fault. It would have been just as easy to do

  renderingRules.addStatefulRenderer("FooSnippet") { in =>
      val p = S param "p" openOr ""
      in |> { "p *" #> p }
  }

I guess what Lift is really going for by using reflection here is that it lets you split the rendering rules across multiple files.

But that's not so bad because it's just managing the lifetime of the whole object. What's really tricky is managing the lifetime of things inside the object...

  class Foo extends StatefulSnippet {
      object x extends SessionVar[Int](5)
  }

Now that's tricky. The data held by x has a longer lifetime than the apparently enclosing object Foo. The class name of x is used as a key to look up the stored value.

It's not a bad solution, until you realize what would happen in some situations:

  (1 to 10) map {_ => new Foo}

Each of the created objects refers to the same session variable. Is this right? Well, I don't know! It might be, but it might not be, and you have no control over it, because it's all done by magic.
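To make that concrete, here is a toy model in Python. Python was chosen for brevity, and this is not how Lift works internally. `session_store`, `SessionVar`, and the `"Foo.x"` key format are hypothetical. The only idea carried over from Lift is that the key comes from the declaring class rather than from the instance, so ten `Foo`s all see one value:

```python
# Hypothetical stand-in for one user's session storage (not Lift's real API).
session_store = {}


class SessionVar:
    """Sketch of a session variable whose lookup key is a fixed name.

    This mimics the behaviour described above, where the class name of x is
    the key, so the key is the same no matter which Foo instance asks.
    """

    def __init__(self, key, default):
        self.key = key
        self.default = default

    def get(self):
        return session_store.get(self.key, self.default)

    def set(self, value):
        session_store[self.key] = value


class Foo:
    def __init__(self):
        # Each Foo builds its own SessionVar object, but the key is derived
        # from the declaring class, not from this instance.
        self.x = SessionVar(key=type(self).__qualname__ + ".x", default=5)


foos = [Foo() for _ in range(10)]
foos[0].x.set(42)
print([foo.x.get() for foo in foos])  # all ten entries are 42
```

Whether that sharing is what you wanted depends entirely on the caller, which is the point: the key scheme decides it for you.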

So what's really going on here? We're trying to decouple object lifetime from object nestedness. The problem is that if you decouple object lifetime from object nestedness you don't have any more nestedness to play with, and you actually want some kind of nestedness because x's lifetime belongs to the session lifetime.

This seems to be a symptom of the general fact that it's nearly impossible to represent two independent tree-like structures with the same syntax -- either the trees will be heavily tied together or you will repeat yourself.

So, assuming we can't fix the problem directly, let's repeat ourselves.

  class Foo(x: Var[Int]) {
  }

Now Foo takes x as an argument, so the lifetime is decoupled. Unfortunately, the fact that we want x to live in the session has now been erased, and that knowledge ends up spread around:

  class Session {
      val fooX: Var[Int]
  }

  ...
  new Foo(currentSession.fooX)

That's two different places, neither of which is on Foo. If we had a way to declare extension variables, we could do

  @extension(Session) val x: Var[Int]

  class Foo(implicit session: Session) {
  }

The implicit parameter means that even though we have to pass in the particular session in use, it mostly happens automatically. It also provides a static way to tell which objects require sessions and which don't.

Unfortunately, this has brought us back to requiring that

  (1 to 10) map {_ => new Foo}

all refer to the same x. But what that's really saying is that all the constructed Foos are part of the same FooContainer:

  class FooContainer {

      @extension(Session) val x: Var[Int]

      class Foo(implicit session: Session) {
      }
  }

Now the lifetime of x is tied to the product of Session and FooContainer, which is exactly what we wanted, and it gives us the flexibility to choose between

  val c = new FooContainer
  (1 to 10) map {_ => new c.Foo}

and

  (1 to 10) map {_ =>
      val c = new FooContainer
      new c.Foo
  }
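The container version can be modelled in the same way. This is a hypothetical Python sketch, not Lift or Scala code. Python has no implicit parameters, so the session is passed explicitly. `FooContainer`, `new_foo`, `get_x`, and `set_x` are made-up names. The key idea is that x is stored under the pair (session, container), so the two Scala snippets above correspond to sharing one container or creating one container per `Foo`:

```python
class Session:
    """Hypothetical session object; passed explicitly since Python has no implicits."""


class FooContainer:
    """Owns x; its value is keyed by (session, container), the product of both lifetimes."""

    _store = {}  # one dict for all containers; the keys keep entries distinct

    def __init__(self, default=5):
        self.default = default

    def get_x(self, session):
        return FooContainer._store.get((session, self), self.default)

    def set_x(self, session, value):
        FooContainer._store[(session, self)] = value

    def new_foo(self, session):
        return Foo(self, session)


class Foo:
    def __init__(self, container, session):
        self.container = container
        self.session = session

    @property
    def x(self):
        return self.container.get_x(self.session)

    @x.setter
    def x(self, value):
        self.container.set_x(self.session, value)


session = Session()

# One container for all ten Foos: they share a single x.
c = FooContainer()
shared = [c.new_foo(session) for _ in range(10)]
shared[0].x = 42

# A fresh container per Foo: each gets its own x.
separate = [FooContainer().new_foo(session) for _ in range(10)]
separate[0].x = 7

print([foo.x for foo in shared])    # ten 42s
print([foo.x for foo in separate])  # 7 followed by nine 5s
```

A `Foo` created from the same container but under a different `Session` starts from the default again, because the session is the other half of the key.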

Making code operate on multiple levels

It would be nice if

  <form>
      <p>{textField}</p>
      <p>{textField}</p>
  </form>

could be used to build up both the HTML and the information about the data it represents. As things stand, this necessarily produces a plain NodeSeq, and the information that two Strings should be produced on submission is lost.

This would actually be extremely easy to do using delimited continuations, except that Scala's delimited continuations don't have terribly good type inference, so you end up writing out too many type annotations for it to be worth it.

Splitting up the above code isn't such a bad idea, but transforming the language syntax in this way would be generally useful. I implemented something to do this using TemplateHaskell, which I will post when I get around to it. It infers types perfectly, for a reason that actually has nothing to do with Haskell having better type inference than Scala: if you do the desugaring before type-checking, you get a lot of the type-checker's logic for free. The disadvantage is that error messages become unreadable, though I think that could be solved.
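For comparison, the "split it up" route can be sketched in Python by pairing each piece of HTML with a function that reads its values back out of a submission. This swaps in a different technique (plain combinators, not delimited continuations or TemplateHaskell). `Form`, `text_field`, `element`, and the generated `f1`, `f2` field names are all hypothetical:

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Params = Dict[str, str]

# Hypothetical generator of unique field names: f1, f2, ...
_field_names = (f"f{i}" for i in itertools.count(1))


@dataclass
class Form:
    """A fragment carrying both its HTML and how to read it back on submission."""
    html: str
    extract: Callable[[Params], Tuple[str, ...]]


def text_field() -> Form:
    name = next(_field_names)
    return Form(
        html=f'<input type="text" name="{name}"/>',
        extract=lambda params: (params.get(name, ""),),
    )


def element(tag: str, *children: Form) -> Form:
    """Wrap children in a tag; their extracted values are concatenated in order."""
    def extract(params: Params) -> Tuple[str, ...]:
        values: Tuple[str, ...] = ()
        for child in children:
            values += child.extract(params)
        return values

    inner = "".join(child.html for child in children)
    return Form(f"<{tag}>{inner}</{tag}>", extract)


form = element("form", element("p", text_field()), element("p", text_field()))
print(form.html)
print(form.extract({"f1": "alice", "f2": "bob"}))  # ('alice', 'bob')
```

The information that two Strings come back is kept. The cost is also visible: the literal-markup syntax is gone, and restoring it is exactly the job a syntax transformation would do.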

Monday, August 22, 2011

Haskell Buzzing in my Ear

Unfortunately, GHC will not have Type Directed Name Resolution (TDNR) for the foreseeable future. TDNR allows you to distinguish identically named variables based on some portion of their types.

The reason TDNR is unlikely to appear in Haskell is that it is hard to show why you need it. This is because you never actually need it. Any time you run into a problem with naming conflicts, you can always rename them. And if you complain, someone is sure to show you how they would have named their variables differently and there really is no problem.

When you need TDNR is at that moment when you hit a name conflict and have to interrupt your thought process to go mangle some names. And you can't show that moment to people. It's like going to StackOverflow with "I'm having trouble writing this function because there's a fly buzzing in my ear" and someone responds "Just write this, this, and this. What's your problem?" But the problem isn't the code -- the problem is the fly buzzing in your ear.

You don't need TDNR to make your code work, you need TDNR to make you work, to save you from being interrupted by random unrelated declarations. To keep your thoughts on the puzzle at hand. Haskell takes a lot of thinking. It's that kind of language.

I've often heard it said that Haskell doesn't need TDNR because it has typeclasses. But typeclasses say the wrong thing -- semantically, they just mean something different. A typeclass says "Here are some operations on a similar pattern that can be specialized in various different ways." TDNR says "Here are some unrelated functions that, because we are using English, just happen to have the same name." Typeclasses are about how things are similar; TDNR is about how, if you're using one, you just don't care about the other.

Because that's just how language works -- names take on their meaning based on the context. You couldn't have a conversation if this were not the case. Context encompasses a lot of things, and types happen to provide a lot of it. But types aren't the only way -- I'd be happy to see Haskell add any kind of context to name resolution. (Well, it has two -- modules and local scope. But these aren't enough: Haskell functions are "tiny" and you tend to have a lot of them at global scope.)

I know of no language that fuels name collisions like Haskell. Every other language I can think of has at least one thing Haskell doesn't that helps prevent conflicts. Even OCaml lets you "open" a module locally. In any sufficiently large Haskell program, the name conflicts start building up, and you start adding more qualifying information to your function names. From the programmer's perspective that's just as bad as adding an explicit type annotation every time you call the function. Type inference makes Haskell terser, but name collisions blow it back up again.

Of course, if I really wanted to be constructive, I'd learn to hack GHC and add this feature myself. You can get so close with typeclasses that I don't think it would be too hard to add -- but I've never touched the GHC source code. Some day...

Thursday, August 18, 2011

Do older SOers use fewer words?

On StackOverflow, do posters with more experience ask their questions in fewer words?

No. There's no visible difference:

(Figure: characters of non-code text vs. reputation)

(Figure: characters of code vs. reputation)

The data comes from the super-handy StackOverflow API. I retrieved it with wget and parsed it with the R packages rjson and XML.

First read in and parse the JSON:

     so.R 

  library(rjson)
  library(XML)
  library(ggplot2)
  library(plyr)

  read.qs = function(path) {
      fromJSON(file = path)$questions
  }

  questions = do.call(c,
      lapply(c('page-1.json', 'page-2.json', 'page-3.json'),
          read.qs
      )
  )

Then, for each question, parse the HTML and total up the text inside <p> and <pre> tags. The helper tot.length.of does the totalling. Because so.R runs top to bottom, it has to be defined before the ldply call that uses it:

     so.R (cont)

  tot.length.of = function(doc, query) {
      parts = xpathApply(doc, query, xmlValue)
      text = paste(parts, collapse='')
      nchar(text)
  }

  Table = ldply(questions, function(q) {
      body.text = sprintf('<body>%s</body>', q$body)
      body = htmlParse(body.text, asText = TRUE)

      description = tot.length.of(body, '//p//text()')
      code = tot.length.of(body, '//pre//text()')

      rep = q$owner$reputation

      data.frame(
          rep, description, code
      )
  })

Then make the plots:

     so.R (cont)

  png('description.png')
  print(ggplot(data=Table)
      + geom_point(aes(rep, description))
      + scale_x_log10()
      + scale_y_log10()
      + xlab('Rep')
      + ylab('Verbosity')
  )
  dev.off()

  png('code.png')
  print(ggplot(data=Table)
      + geom_point(aes(rep, code))
      + scale_x_log10()
      + scale_y_log10()
      + xlab('Rep')
      + ylab('Verbosity')
  )
  dev.off()

$ Rscript so.R >/dev/null 2>&1

Tuesday, August 16, 2011

New best thing ever: pyinotify

What could be better than pyinotify?

You can track accesses. Accesses!

Is that not the coolest?

Say we're building a program like

     a.c 

  #include "a.h"

     a.h 

  #include "b.h"

     b.h 

  (empty)

     c.h 

  (empty)

So in total we have

$ ls *.[ch]
a.c
a.h
b.h
c.h

So the dependencies we have are

  a.o: a.c a.h b.h

If we compile while pyinotify watches the directory, we see exactly that:

     main5e2d.py 

  from treewatcher import run_watch_files
  from subprocess import Popen
  import os

  def compile():
      Popen(['gcc', '-c', 'a.c', '-o', 'a.o']).wait()

  _, accesses = run_watch_files(compile, '.')
  print('Accessed:')
  for path in accesses.accessed:
      print('    %s' % os.path.relpath(path))
  print('Modified:')
  for path in accesses.modified:
      print('    %s' % os.path.relpath(path))

Accessed:
    a.h
    a.c
    a.o
    b.h
Modified:
    a.o

(treewatcher)

No need for any language-dependent tool such as gcc -M. No need to even have a clue what kind of build is taking place -- you know the compiler looked at b.h so it probably made a decision based on it.
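Turning that log into the rule shown earlier is mechanical. Here is a minimal sketch. It assumes the `accessed` and `modified` lists that the script above prints. It treats every modified file as an output and every file that was only read as an input, which is a heuristic and not something inotify itself reports. `make_rule` is a hypothetical name:

```python
def make_rule(accessed, modified):
    """Build a make-style rule from one build step's file accesses.

    Heuristic (an assumption, not reported by inotify): files the step
    modified are its outputs; files it only read are its inputs.
    """
    outputs = sorted(set(modified))
    inputs = sorted(set(accessed) - set(modified))
    return "%s: %s" % (" ".join(outputs), " ".join(inputs))


# The access log printed above for the a.c build.
print(make_rule(accessed=["a.h", "a.c", "a.o", "b.h"], modified=["a.o"]))
# a.o: a.c a.h b.h
```

Note that c.h never appears, because gcc never touched it, which is the right answer.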

But it's tracking the wrong thing

What if we had

     d.c 

  #include "not-exist.h"

And build

     main3b70.py 

  from treewatcher import run_watch_files
  from subprocess import Popen
  import os

  def compile():
      Popen(['gcc', '-c', 'd.c', '-o', 'd.o']).wait()

  _, accesses = run_watch_files(compile, '.')
  print('Accessed:')
  for path in accesses.accessed:
      print('    %s' % os.path.relpath(path))
  print('Modified:')
  for path in accesses.modified:
      print('    %s' % os.path.relpath(path))

d.c:1:23: fatal error: not-exist.h: No such file or directory
compilation terminated.
Accessed:
    d.c
Modified:

Which of course fails, and a build tool would report failure and give up at this point. But when I was writing rstweaver, I didn't think that would be appropriate for literate programming -- error messages are part of the product, and you want those to show up in the output just like anything else.

With this mindset, the output of the process is the error message that was produced -- that's the part you want to see. And the input to the process is the files in the directory.

The problem is that pyinotify sees this operation as depending on only one file, d.c, and doesn't see the dependence on the existence (in this case, the nonexistence) of not-exist.h. In reality, changing that would change the output of the process from an error message to success.

So are we back to needing an understanding of the language?

What did gcc do that might have clued us in to the missing dependency? How did it know not-exist.h wasn't there?

It may have opendir()d the parent directory and stepped through the contents, looking for not-exist.h.

If this is the case, then there's nothing we can do to spot not-exist.h without understanding something about how C works. We could see that the operation "depends on the contents of the directory", which would admit some extraneous dependencies but at least prevent us from getting stuck on a bad result.

It may have attempted to open() not-exist.h and failed. In this case, you'd think some information might show up, but I never got anything like this from pyinotify. That is consistent with how inotify works: it reports events on files and directories that exist and are being watched, and a failed open() of a missing path leaves nothing to report on.
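The "depends on the contents of the directory" option can be sketched as a cache key that folds in the directory listing along with the files that were actually read. This is a hypothetical helper (`dependency_key` is a made-up name). It assumes a flat directory and a step that only reads files inside it. The over-approximation is deliberate: any new file, even an unrelated one, changes the key. That is the cost of the extraneous dependencies mentioned above.

```python
import hashlib
import os
import tempfile


def dependency_key(directory, accessed):
    """Conservative cache key for one build step (hypothetical helper).

    Hashes the directory listing as well as the contents of each accessed
    file, so creating or deleting any file in the directory changes the key,
    even a file the step never managed to open.
    """
    h = hashlib.sha256()
    for name in sorted(os.listdir(directory)):
        h.update(b"entry\0" + name.encode() + b"\0")
    for name in sorted(accessed):
        with open(os.path.join(directory, name), "rb") as f:
            h.update(b"file\0" + name.encode() + b"\0" + f.read() + b"\0")
    return h.hexdigest()


with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "d.c"), "w") as f:
        f.write('#include "not-exist.h"\n')
    before = dependency_key(d, ["d.c"])

    # Creating the missing header changes the key, so the step would rerun.
    open(os.path.join(d, "not-exist.h"), "w").close()
    after = dependency_key(d, ["d.c"])

print(before != after)  # True
```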

The fact that gcc returned a nonzero exit code is some clue that adding a new file might change the result, but that fact is specific knowledge of gcc.

So I'd say yes, to be really thorough we still need an understanding of the language. But just barely.