Sunday, August 7, 2011

Sweeping changes to rstweaver

Improvements:

  • New languages: C++, Python
  • Caching
  • Pure docutils, so conversion to LaTeX works. Conversion to ODT, which was one of my primary motivations for removing the HTML-specific code, still fails; I think that may be due to a bug in docutils, but I'm still figuring that out.

And it's sitting on my GitHub page with some examples.

Implementing new features doubled as a chance to encounter newer and yet more frustrating problems. It is an eternal fact of programming that reality is smarter than you are: the ideas that you design from your own creativity will simply never measure up to the horrors nature throws at you haphazardly. She makes it look so easy.

An example would be the case of Python decorators when the decorator function also happens to have class scope.

     decor.py 

    class A:

        def fivify(func):
            def handler():
                return func(5)
            return handler

        @fivify
        def a(x):
            return x

We are now left to wonder what transformations may have been applied to fivify (such as making it a bound method) before it is used as a decorator. Acting on these thoughts gives us an error message with the approximate clarity of

     decor-thoughts.py 

  1  class A:
  2  
  3      @classmethod
  4      def fivify(func):
  5          def handler():
  6              return func(5)
  7          return handler
  8  
  9      @fivify
 10      def a(x):
 11          return x
 12  
 13  a = A()
 14  
Traceback (most recent call last):
  File "decor-thoughts.py", line 1, in <module>
    class A:
  File "decor-thoughts.py", line 9, in A
    @fivify
TypeError: 'classmethod' object is not callable

So we are led to see that decoration must happen while the methods are still plain objects sitting in the class body, before the class itself has been created (this can cause other surprises).
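
A minimal sketch of that point, written for Python 3 and with self added so the method can actually be called. The kind_in_body attribute is not part of the original example; it is only there to record what fivify looks like while the class body is running. Left undecorated, fivify is still an ordinary function at that moment, so using it as a decorator works:

```python
class A:

    # While the class body executes there is no class object yet, so
    # fivify is just a plain function in the body's namespace.
    def fivify(func):
        def handler(self):
            return func(self, 5)
        return handler

    # Decoration happens right here, still inside the class body.
    @fivify
    def a(self, x):
        return x

    # Illustration only: record the type name of fivify at this point.
    kind_in_body = type(fivify).__name__


result = A().a()
```

Wrapping fivify in @classmethod changes what sits in the class body: it is then a classmethod object rather than a function, which is what the traceback above is complaining about.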

A fair amount of frustration comes from the docutils library itself. Docutils is actually quite wonderful, but a conspicuous need for polishing renders it a capable adversary. Most of the snags can be traced to rather boring implementation bugs, but one of its more architectural flaws helps remind us why, in programming Python, we stick to the pythonic.

At the heart of docutils' extensibility lie directives, which serve much the same role as TeX commands except that they are written in Python and manipulate docutils rather than TeX. So I set out to register a directive:

     directive.py 

  1  from docutils.parsers.rst import Directive, directives
  2  from docutils.core import publish_parts
  3  
  4  class MyDirective(Directive):
  5  
  6      def __init__(self, *a, **b):
  7          Directive.__init__(self, *a, **b)
  8  
  9      def run(self):
 10          return []
 11  
 12  directives.register_directive('mydir', MyDirective)
 13  publish_parts('\n.. mydir::\n\n')
 14  

From this we learn that register_directive() appears to take a factory, which it is going to instantiate, and pass some arguments which I appear not to care about.

I was almost content with this arrangement -- I just wanted to pass some context to MyDirective whenever it was created. Seeing as register_directive() appeared to take just a callable factory, I added some context:

     directive.py (cont)

  4  class MyDirective(Directive):
  5  
  6      def __init__(self, ctx, *a, **b):
  7          Directive.__init__(self, *a, **b)
  8          self.ctx = ctx
  9  
 10      def run(self):
 11          return []
 12  

and some factory:

     directive.py (cont)

 13  def create(*a, **b):
 14      return MyDirective(None, *a, **b)
 15  
 16  directives.register_directive('mydir', create)
 17  publish_parts('\n.. mydir::\n\n')
 18  
Traceback (most recent call last):
  File "directive.py", line 17, in <module>
    publish_parts('\n.. mydir::\n\n')
  File "/usr/lib/pymodules/python2.7/docutils/core.py", line 427, in publish_parts
    enable_exit_status=enable_exit_status)
  File "/usr/lib/pymodules/python2.7/docutils/core.py", line 641, in publish_programmatically
    output = pub.publish(enable_exit_status=enable_exit_status)
  File "/usr/lib/pymodules/python2.7/docutils/core.py", line 203, in publish
    self.settings)
  File "/usr/lib/pymodules/python2.7/docutils/readers/__init__.py", line 69, in read
    self.parse()
  File "/usr/lib/pymodules/python2.7/docutils/readers/__init__.py", line 75, in parse
    self.parser.parse(self.input, document)
  File "/usr/lib/pymodules/python2.7/docutils/parsers/rst/__init__.py", line 157, in parse
    self.statemachine.run(inputlines, document, inliner=self.inliner)
  File "/usr/lib/pymodules/python2.7/docutils/parsers/rst/states.py", line 170, in run
    input_source=document['source'])
  File "/usr/lib/pymodules/python2.7/docutils/statemachine.py", line 233, in run
    context, state, transitions)
  File "/usr/lib/pymodules/python2.7/docutils/statemachine.py", line 454, in check_line
    return method(match, context, next_state)
  File "/usr/lib/pymodules/python2.7/docutils/parsers/rst/states.py", line 2281, in explicit_markup
    nodelist, blank_finish = self.explicit_construct(match)
  File "/usr/lib/pymodules/python2.7/docutils/parsers/rst/states.py", line 2293, in explicit_construct
    return method(self, expmatch)
  File "/usr/lib/pymodules/python2.7/docutils/parsers/rst/states.py", line 2035, in directive
    directive_class, match, type_name, option_presets)
  File "/usr/lib/pymodules/python2.7/docutils/parsers/rst/states.py", line 2093, in run_directive
    'Directive "%s" must return a list of nodes.' % type_name
AssertionError: Directive "mydir" must return a list of nodes.

Oh docutils. This error message had me for... I'd say near twenty minutes, because darn it I am returning a list of nodes.

So I was a bit surprised to find out that register_directive doesn't actually take just any callable object -- and it's not thinking of it as a "factory" either -- it's expecting to get either a class, which it will instantiate, or a function, which it will leave as is until it gets called to handle a directive. My factory is a function, so whatever it returns (a MyDirective instance) gets treated as the directive's output rather than as the directive. And there's the problem.
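
A rough sketch of that failure mode and of one way around it. This is not docutils' code: Directive below is a stand-in so the sketch runs without docutils installed, and handle() and make_directive() are hypothetical names invented for illustration. The workaround assumes it is enough to carry the context on a subclass, so that what gets registered really is a class:

```python
import types


class Directive:
    """Stand-in for docutils' Directive base class (an assumption)."""

    def __init__(self, *args, **kwargs):
        self.args = args

    def run(self):
        return []


def handle(registered, *args):
    """Hypothetical dispatcher that branches on the registered object's type."""
    if isinstance(registered, (types.FunctionType, types.MethodType)):
        # Function branch: the return value is taken to be the node list.
        result = registered(*args)
    else:
        # Class branch: instantiate, then ask the instance for the node list.
        result = registered(*args).run()
    if not isinstance(result, list):
        raise AssertionError("Directive must return a list of nodes.")
    return result


class MyDirective(Directive):
    ctx = None  # filled in by make_directive()


def create(*args):
    # A factory *function*: it lands in the function branch, so the
    # MyDirective instance it returns is mistaken for the output.
    return MyDirective(*args)


def make_directive(ctx):
    # Workaround: bake the context into a subclass and register the class.
    return type("MyDirectiveWithCtx", (MyDirective,), {"ctx": ctx})


try:
    handle(create, "mydir")
    factory_failed = False
except AssertionError:
    factory_failed = True

nodes_from_class = handle(make_directive({"lang": "python"}), "mydir")
```

With the real library, the equivalent move would be to pass the class returned by something like make_directive(ctx) to register_directive() instead of passing create.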

Now, nothing against docutils, because it's actually a very nice library and for the most part well designed, but checking the type of an object and then branching based on the result is decidedly unpythonic.

And that's how I learned that "pythonic" is something that actually matters and isn't just something people say.

(It's not pythonic because it doesn't respect duck typing -- it's not supposed to matter what the type of the object is so long as it has the right properties.)

But then it was later in the same project that I... I found myself wanting to commit the same error! You see I had an "interface" like

     weaver.py 

    class WeaverLanguage:

        def run(self, code, args):
            '''
            Returns content to be added to the document.
            '''
            raise NotImplementedError

I could always make run() return a docutils node, because that would cover all cases. You want just plain text? Stick the text in a node. HTML? Make a raw HTML node. So that would solve all my problems.

But I just didn't want to do that. I wanted to make the most common case of returning raw text easy, and not require looking at docutils (sorry again docutils, there's nothing wrong with you, really). I could break it up into two stages:

     weaver-stages.py 

    from docutils import nodes

    class WeaverLanguage:

        def run_text(self, code, args):
            '''
            Returns content to be added to the document.
            '''
            raise NotImplementedError

        def run_node(self, code, args):
            # Wrap the plain text in an inline node.
            text = self.run_text(code, args)
            return nodes.inline(text, text)

Except that I don't want to do that either, because it puts the output type in the name of the function, making it look way more important than it actually is (solution: type inference and typeclasses -- also not pythonic).

So I went with the unpythonic hack. And maybe some day someone'll hate me for it.
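
For the curious, a sketch of the general shape such a hack can take. This is not the actual rstweaver code: Node is a stand-in for a docutils node so the sketch runs without docutils, and Echo, Fancy and to_node() are hypothetical names. The idea is that run() may return either plain text or a node, and the caller checks the type and wraps the text case:

```python
class Node:
    """Stand-in for a docutils node (an assumption, for illustration)."""

    def __init__(self, text):
        self.text = text


class WeaverLanguage:

    def run(self, code, args):
        """Returns content to be added to the document: text or a node."""
        raise NotImplementedError


class Echo(WeaverLanguage):
    # The common case: just hand back plain text.
    def run(self, code, args):
        return code


class Fancy(WeaverLanguage):
    # The uncommon case: build the node yourself.
    def run(self, code, args):
        return Node(code.upper())


def to_node(result):
    # The unpythonic part: branch on the type of what came back.
    if isinstance(result, str):
        return Node(result)
    return result


plain = to_node(Echo().run("hi", []))
fancy = to_node(Fancy().run("hi", []))
```

The type check buys the easy common case at the cost of exactly the kind of branching complained about above.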
