Sunday, June 26, 2011

RObjectTables are AWESOME

Why isn't everyone using the RObjectTables package? This is the best thing ever!

Here's the basic idea of RObjectTables: An environment is an object where you can lookup names and associate them with values. And in particular its where you look up variables used in an expression. But there's no reason you can't take any other object that associates names with values (data.frame, list, SQL database, CSV file, filesystem, ...) and use that to lookup your variables.

R already has sortof this because you can use eval() and friends on lists and data.frames. But it's not extensible. The purpose of RObjectTables is to make it extensible.

In its current version RObjectTables is somewhat limited, because you can only attach() the created environments, not pass them to with() and such. But with extremely minor modifications this becomes possible. You can find a version with these changes on my GitHub site.

Now let me demonstrate how unbelievably useful this is. So naturally I will start with a useless example: an environment of only strings!

> library(RObjectTables)

> db = newRClosureTable(list(

> assign = function(name, value) {

> # Not used

> },

>

> get = function(name) {

> name

> },

>

> exists = function(name) {

> T

> },

>

> remove = function(name) {

> # Not used

> },

>

> objects = function() {

> # Not used

> }

> ))

> with(db, x)

[1] "x"

Muahahaha.

This environment, as you can see, is not terribly useful:

> try(

> with(db, x + y),

> silent=T

> )

> geterrmessage()

[1] "Error in x + y : non-numeric argument to binary operator\n"

Yes... using "+" (character) as a function... not so good. Also we don't have to feel so bad about not implementing assign() because with `<-` turning into "<-" there's not much we could assign.

But now let's do something really useful.

> reality = function() {

> parent = parent.frame()

> outer = new.env(parent = parent)

>

> formulas = list()

> valid = list()

> values = list()

>

> make.ref = function(name) {

> ref = list(

> db = self,

> name = name

> )

> class(ref) = 'ref'

> ref

> }

>

> dep.table = list()

> rdep.table = list()

>

> propagate = function(name) {

> if (is.null(valid[[name]]) || !valid[[name]]) {

> }

> else {

> for (ref in rdep.table[[name]]) {

> ref$db$reset(ref$name)

> }

> }

> }

>

> reset = function(name) {

> propagate(name)

> valid[[name]] <<- F

> }

>

> add.dep = function(name, ref) {

> dep.table[[name]] <<- c(list(ref), dep.table[[name]])

> }

>

> add.rdep = function(name, ref) {

> rdep.table[[name]] <<- c(list(ref), rdep.table[[name]])

> }

>

> del.rdep = function(name, ref) {

> rdep.table[[name]] <<- setdiff(rdep.table[[name]], list(ref))

> }

>

> ptr = newRClosureTable(list(

> assign = function(name, value) {

> force(value)

> formulas[[name]] <<- function() value

> reset(name)

> },

> get = function(name) {

> if (is.null(formulas[[name]])) {

> return(tryCatch(

> get(name, outer),

> error = function(e) getUnbound()

> ))

> }

>

> this.ref = make.ref(name)

> for (ref in working.refs) {

> ref$db$add.dep(ref$name, this.ref)

> }

>

> if (is.null(valid[[name]]) || !valid[[name]]) {

> old.deps <<- dep.table[[name]]

> dep.table[[name]] <<- list()

>

> working.refs <<- c(list(this.ref), working.refs)

>

> values[[name]] <<- formulas[[name]]()

>

> working.refs <<- working.refs[-1L]

>

> lost.deps = setdiff(old.deps, dep.table[[name]])

> for (ref in lost.deps) {

> ref$db$del.rdep(ref$name, this.ref)

> }

>

> gained.deps = setdiff(dep.table[[name]], old.deps)

> for (ref in gained.deps) {

> ref$db$add.rdep(ref$name, this.ref)

> }

>

> valid[[name]] <<- T

> }

>

> values[[name]]

> },

> exists = function(name) {

> !is.null(valid[[name]]) || exists(name, parent)

> },

> remove = function(name) {

> # TODO: this

> },

> objects = function(name) {

> names(valid)

> }

> ))

> class(ptr) = c('reality', class(ptr))

>

> attr(ptr, 'delayedAssign') = (

> function(name, promise) {

> formulas[[name]] <<- promise

> reset(name)

> promise

> }

> )

>

> self = list(

> ptr = ptr,

> reset = reset,

> add.dep = add.dep,

> add.rdep = add.rdep,

> del.rdep = del.rdep

> )

> class(self) = 'realptr'

>

> ptr

> }

> working.refs = list()

Why did I call it a "reality"? Well... uh... all the good names are taken. So think of it as storing the current version of... reality. What it does is keep track of a set of variables that depend on each other. Oh yeah and we need some way to assign into it:

> `$.reality` = function(r, name) {

> get(name, envir=r)

> }

> `$<-.reality` = function(r, name, value) {

> expr = substitute(value)

> env = parent.frame()

>

> attr(r, 'delayedAssign')(

> name, function() eval(expr, env)

> )

>

> r

> }

Now we can write:

> r = reality()

> r$x = 10

> r$y = sqrt(r$x)

> r$y

[1] 3.162278

> r$x = 20

> r$y

[1] 4.472136

It is the feature I have always been wishing for. Now if only we could get .GlobalEnv to do this... You might think of attach()ing it: don't do that. Trust me. I can't show you because it will crash Sweave and not produce any output, but that doesn't work yet (it would be nice if it did).

Luckily R has a useful feature that was never intended for this purpose (I find it is this way with most of R's finest features): the debugging browser. So let us...

> with.db = function(env, func1, func2) {

> if ('reality' %in% class(env)) {

> func1(env)

> }

> else {

> func2()

> }

> }

> reality.assign = function(name, rvalue) {

> env <- parent.frame()

> name <- substitute(name)

> srvalue <- substitute(rvalue)

>

> with.db(env,

> function(db) {

> name <- as.character(deparse(name))

> promise <- function() eval(srvalue, env)

>

> attr(db, 'delayedAssign')(name, promise)

> alist(x=)$x

> },

> function() {

> do.call(`<-`, list(name, rvalue), envir=env)

> }

> )

> }

> with(r, `=` <- reality.assign)

> run.interpreter = function(base.env) {

> options(browserNLdisabled = T)

> with(base.env, browser())

> }

Now I don't know how to feed an interactive debugging session into Sweave, so here is copy-pasted this feature in use:

Browse[1]> y = sqrt(x)

Browse[1]> for (i in 1:10) {
+ x = i
+ print(y)
+ }
[1] 1
[1] 1.414214
[1] 1.732051
[1] 2
[1] 2.236068
[1] 2.449490
[1] 2.645751
[1] 2.828427
[1] 3
[1] 3.162278
Browse[1]> b = a

Browse[1]> c = b

Browse[1]> a = 1

Browse[1]> c
> run.interpreter(r)
Called from: eval(expr, envir, enclos)
Browse[1]> y = sqrt(x)

Browse[1]> for (i in 1:10) {
+ x = i
+ print(y)
+ }
[1] 1
[1] 1.414214
[1] 1.732051
[1] 2
[1] 2.236068
[1] 2.449490
[1] 2.645751
[1] 2.828427
[1] 3
[1] 3.162278
Browse[1]> z = y**2

Browse[1]> z
[1] 10
Browse[1]> x = 1:3

Browse[1]> z
[1] 1 2 3
Browse[1]> b = a

Browse[1]> c = b

Browse[1]> a = 1

Browse[1]> c
> 

Haha "c" closed the debugger ;)

So far this actually seems stable enough for general use. I wouldn't trust it with data you care about, but I intend to make great use of it.

2 comments:

  1. The datamap package also provides an interface to the RobjectTables feature. Ter next release (though I don't know when I'll have the time) will allow reference classes to be attached.

    ReplyDelete
  2. @Jeffrey: Excellent! It's such a useful feature.

    ReplyDelete