Tuesday, August 16, 2011

New best thing ever: pyinotify

What could be better than pyinotify?

You can track accesses. Accesses!

Is that not the coolest?

Say we're building a program like

     a.c

     #include "a.h"

     a.h

     #include "b.h"

     b.h

     (empty)

     c.h

     (empty)

So in total we have

$ ls *.[ch]
a.c
a.h
b.h
c.h

So the dependencies we have are

     a.o: a.c a.h b.h
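For comparison, here is a minimal sketch of deriving that rule the language-aware way, by scanning for quoted #include lines and following them transitively. The name scan_includes and the regular expression are illustrative assumptions, not part of gcc or pyinotify, and the sketch ignores angle-bracket includes, include paths and conditional compilation.

```python
import os
import re
import tempfile

# Matches only the quoted form: #include "name.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)

def scan_includes(source, directory='.'):
    """Return source plus every existing quoted header it pulls in, transitively."""
    seen = []
    pending = [source]
    while pending:
        name = pending.pop()
        path = os.path.join(directory, name)
        if name in seen or not os.path.exists(path):
            continue  # already handled, or nothing there to scan
        seen.append(name)
        with open(path) as f:
            pending.extend(INCLUDE_RE.findall(f.read()))
    return seen

if __name__ == '__main__':
    # Recreate the four files from the example in a scratch directory.
    files = {'a.c': '#include "a.h"\n', 'a.h': '#include "b.h"\n',
             'b.h': '', 'c.h': ''}
    with tempfile.TemporaryDirectory() as d:
        for name, body in files.items():
            with open(os.path.join(d, name), 'w') as f:
                f.write(body)
        print('a.o: ' + ' '.join(scan_includes('a.c', d)))
```

Note that c.h never shows up, since nothing includes it. The cost of this approach is that it only works because the scanner knows what a C include looks like.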

If we compile while watching the directory with pyinotify, we'll see exactly that:

     main5e2d.py 

     from treewatcher import run_watch_files
     from subprocess import Popen
     import os

     # Named compile_a rather than compile to avoid shadowing the builtin.
     def compile_a():
         # Build a.o from a.c and wait for gcc to exit.
         Popen(['gcc', '-c', 'a.c', '-o', 'a.o']).wait()

     # Run the build while watching the current directory.
     _, accesses = run_watch_files(compile_a, '.')
     print('Accessed:')
     for path in accesses.accessed:
         print('    %s' % os.path.relpath(path))
     print('Modified:')
     for path in accesses.modified:
         print('    %s' % os.path.relpath(path))

Accessed:
    a.h
    a.c
    a.o
    b.h
Modified:
    a.o

(treewatcher)

No need for any language-dependent tool such as gcc -M. No need to even have a clue what kind of build is taking place -- you know the compiler looked at b.h so it probably made a decision based on it.
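As a rough sketch of what a build tool could do with such a list, the check below treats every accessed path as an input and calls the target stale if any input is newer than it. The function name is_stale is a hypothetical helper, not something treewatcher or pyinotify provides, and comparing mtimes is just one possible policy. The target is filtered out of its own inputs because, as the output above shows, a.o appears in the accessed list too.

```python
import os
import tempfile

def is_stale(target, accessed):
    """True if target is missing, or any recorded input is missing or newer."""
    if not os.path.exists(target):
        return True
    built = os.path.getmtime(target)
    # The output file is usually in the accessed list as well; skip it.
    inputs = [p for p in accessed
              if os.path.abspath(p) != os.path.abspath(target)]
    return any(not os.path.exists(p) or os.path.getmtime(p) > built
               for p in inputs)

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as d:
        paths = {n: os.path.join(d, n) for n in ('a.c', 'a.h', 'b.h', 'a.o')}
        for p in paths.values():
            open(p, 'w').close()
        # Pin the timestamps so the demo does not depend on clock resolution.
        for n in ('a.c', 'a.h', 'b.h'):
            os.utime(paths[n], (100, 100))
        os.utime(paths['a.o'], (200, 200))
        accessed = list(paths.values())
        print(is_stale(paths['a.o'], accessed))  # False: a.o is newest
        os.utime(paths['b.h'], (300, 300))       # pretend b.h was edited
        print(is_stale(paths['a.o'], accessed))  # True: b.h is now newer
```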

But it's tracking the wrong thing

What if we had

     d.c 

  1  #include "not-exist.h"

And build

     main3b70.py 

     from treewatcher import run_watch_files
     from subprocess import Popen
     import os

     # Named compile_d rather than compile to avoid shadowing the builtin.
     def compile_d():
         # Try to build d.o from d.c; gcc fails on the missing header.
         Popen(['gcc', '-c', 'd.c', '-o', 'd.o']).wait()

     # Run the build while watching the current directory.
     _, accesses = run_watch_files(compile_d, '.')
     print('Accessed:')
     for path in accesses.accessed:
         print('    %s' % os.path.relpath(path))
     print('Modified:')
     for path in accesses.modified:
         print('    %s' % os.path.relpath(path))

d.c:1:23: fatal error: not-exist.h: No such file or directory
compilation terminated.
Accessed:
    d.c
Modified:

The compile of course fails, and a build tool would report the failure and give up at this point. But when I was writing rstweaver I didn't think that would be appropriate for literate programming -- error messages are part of the product, and you want those to show up in the output just like anything else.

With this mindset, the output of the process is the error message that was produced -- that's the part you want to see. And the input to the process is the files in the directory.

The problem is that pyinotify sees this operation as depending on only one file, d.c. It doesn't see the dependence on the existence (in this case nonexistence) of not-exist.h. In reality, changing that existence will change the output of the process, from an error message to success.

So are we back to needing an understanding of the language?

What did gcc do that might have clued us in to the missing dependency? How did it know not-exist.h wasn't there?

  1. It may have opendir()d the parent directory and stepped through the contents, looking for not-exist.h.

    If this is the case, then there's nothing we can do to spot not-exist.h without understanding something about how C works. We could see that the operation "depends on the contents of the directory", which would admit some extraneous dependencies but at least prevent us from getting stuck on a bad result.

  2. It may have attempted to open() not-exist.h and failed. In this case, you'd think that some information might show up, but I never got anything like this from pyinotify.
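The fallback from the first case, treating the operation as depending on the contents of the directory, could be sketched roughly as follows. The function name fingerprint and the choice of SHA-256 are assumptions made for illustration, not anything pyinotify offers. The idea is to hash the accessed files together with the list of names in each watched directory, so that a file appearing or disappearing changes the fingerprint even if it was never opened.

```python
import hashlib
import os
import tempfile

def fingerprint(accessed, directories):
    """Hash the accessed files' contents plus the names in each directory."""
    h = hashlib.sha256()
    for path in sorted(accessed):
        h.update(path.encode())
        h.update(b'\0')
        with open(path, 'rb') as f:
            h.update(f.read())
        h.update(b'\0')
    for directory in sorted(directories):
        # Only names are hashed here, so unrelated edits to other files
        # in the directory do not invalidate the result.
        for name in sorted(os.listdir(directory)):
            h.update(name.encode())
            h.update(b'\0')
    return h.hexdigest()

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, 'd.c')
        with open(src, 'w') as f:
            f.write('#include "not-exist.h"\n')
        before = fingerprint([src], [d])
        # Creating the missing header changes the directory listing.
        open(os.path.join(d, 'not-exist.h'), 'w').close()
        after = fingerprint([src], [d])
        print(before != after)  # True
```

This admits extraneous dependencies: any new file in the directory forces a rerun, including files the build itself writes, so a real tool would probably want to exclude its own outputs from the listing.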

The fact that gcc returned a nonzero exit code is some clue that adding a new file might change the result, but that fact is specific knowledge of gcc.

So I'd say yes, to be really thorough we still need an understanding of the language. But just barely.
