Why isn't everyone using the RObjectTables package? This is the best thing ever!
Here's the basic idea of RObjectTables: An environment is an object where you can lookup names and associate them with values. And in particular its where you look up variables used in an expression. But there's no reason you can't take any other object that associates names with values (data.frame, list, SQL database, CSV file, filesystem, ...) and use that to lookup your variables.
R already has sortof this because you can use eval()
and
friends on lists and data.frames. But it's not extensible. The purpose of
RObjectTables is to make it extensible.
In its current version RObjectTables is somewhat limited, because you can
only attach()
the created environments, not pass them to with()
and such. But with extremely minor modifications this becomes possible. You can
find a version
with these changes on my GitHub site.
Now let me demonstrate how unbelievably useful this is. So naturally I will start with a useless example: an environment of only strings!
> library(RObjectTables)
> db = newRClosureTable(list(
> assign = function(name, value) {
> # Not used
> },
>
> get = function(name) {
> name
> },
>
> exists = function(name) {
> T
> },
>
> remove = function(name) {
> # Not used
> },
>
> objects = function() {
> # Not used
> }
> ))
> with(db, x)
[1] "x"
Muahahaha.
This environment, as you can see, is not terribly useful:
> try(
> with(db, x + y),
> silent=T
> )
> geterrmessage()
[1] "Error in x + y : non-numeric argument to binary operator\n"
Yes... using "+" (character) as a function... not so good. Also we don't have to feel so bad about not implementing assign() because with `<-` turning into "<-" there's not much we could assign.
But now let's do something really useful.
> reality = function() {
> parent = parent.frame()
> outer = new.env(parent = parent)
>
> formulas = list()
> valid = list()
> values = list()
>
> make.ref = function(name) {
> ref = list(
> db = self,
> name = name
> )
> class(ref) = 'ref'
> ref
> }
>
> dep.table = list()
> rdep.table = list()
>
> propagate = function(name) {
> if (is.null(valid[[name]]) || !valid[[name]]) {
> }
> else {
> for (ref in rdep.table[[name]]) {
> ref$db$reset(ref$name)
> }
> }
> }
>
> reset = function(name) {
> propagate(name)
> valid[[name]] <<- F
> }
>
> add.dep = function(name, ref) {
> dep.table[[name]] <<- c(list(ref), dep.table[[name]])
> }
>
> add.rdep = function(name, ref) {
> rdep.table[[name]] <<- c(list(ref), rdep.table[[name]])
> }
>
> del.rdep = function(name, ref) {
> rdep.table[[name]] <<- setdiff(rdep.table[[name]], list(ref))
> }
>
> ptr = newRClosureTable(list(
> assign = function(name, value) {
> force(value)
> formulas[[name]] <<- function() value
> reset(name)
> },
> get = function(name) {
> if (is.null(formulas[[name]])) {
> return(tryCatch(
> get(name, outer),
> error = function(e) getUnbound()
> ))
> }
>
> this.ref = make.ref(name)
> for (ref in working.refs) {
> ref$db$add.dep(ref$name, this.ref)
> }
>
> if (is.null(valid[[name]]) || !valid[[name]]) {
> old.deps <<- dep.table[[name]]
> dep.table[[name]] <<- list()
>
> working.refs <<- c(list(this.ref), working.refs)
>
> values[[name]] <<- formulas[[name]]()
>
> working.refs <<- working.refs[-1L]
>
> lost.deps = setdiff(old.deps, dep.table[[name]])
> for (ref in lost.deps) {
> ref$db$del.rdep(ref$name, this.ref)
> }
>
> gained.deps = setdiff(dep.table[[name]], old.deps)
> for (ref in gained.deps) {
> ref$db$add.rdep(ref$name, this.ref)
> }
>
> valid[[name]] <<- T
> }
>
> values[[name]]
> },
> exists = function(name) {
> !is.null(valid[[name]]) || exists(name, parent)
> },
> remove = function(name) {
> # TODO: this
> },
> objects = function(name) {
> names(valid)
> }
> ))
> class(ptr) = c('reality', class(ptr))
>
> attr(ptr, 'delayedAssign') = (
> function(name, promise) {
> formulas[[name]] <<- promise
> reset(name)
> promise
> }
> )
>
> self = list(
> ptr = ptr,
> reset = reset,
> add.dep = add.dep,
> add.rdep = add.rdep,
> del.rdep = del.rdep
> )
> class(self) = 'realptr'
>
> ptr
> }
> working.refs = list()
Why did I call it a "reality"? Well... uh... all the good names are taken. So think of it as storing the current version of... reality. What it does is keep track of a set of variables that depend on each other. Oh yeah and we need some way to assign into it:
> `$.reality` = function(r, name) {
> get(name, envir=r)
> }
> `$<-.reality` = function(r, name, value) {
> expr = substitute(value)
> env = parent.frame()
>
> attr(r, 'delayedAssign')(
> name, function() eval(expr, env)
> )
>
> r
> }
Now we can write:
> r = reality()
> r$x = 10
> r$y = sqrt(r$x)
> r$y
[1] 3.162278
> r$x = 20
> r$y
[1] 4.472136
It is the feature I have always been wishing for. Now if only we could get
.GlobalEnv
to do this... You might think of
attach()ing
it: don't do that. Trust me. I can't show you because
it will crash Sweave and not produce any output, but that doesn't work yet (it
would be nice if it did).
Luckily R has a useful feature that was never intended for this purpose (I find it is this way with most of R's finest features): the debugging browser. So let us...
> with.db = function(env, func1, func2) {
> if ('reality' %in% class(env)) {
> func1(env)
> }
> else {
> func2()
> }
> }
> reality.assign = function(name, rvalue) {
> env <- parent.frame()
> name <- substitute(name)
> srvalue <- substitute(rvalue)
>
> with.db(env,
> function(db) {
> name <- as.character(deparse(name))
> promise <- function() eval(srvalue, env)
>
> attr(db, 'delayedAssign')(name, promise)
> alist(x=)$x
> },
> function() {
> do.call(`<-`, list(name, rvalue), envir=env)
> }
> )
> }
> with(r, `=` <- reality.assign)
> run.interpreter = function(base.env) {
> options(browserNLdisabled = T)
> with(base.env, browser())
> }
Now I don't know how to feed an interactive debugging session into Sweave, so here is copy-pasted this feature in use:
Browse[1]> y = sqrt(x) Browse[1]> for (i in 1:10) { + x = i + print(y) + } [1] 1 [1] 1.414214 [1] 1.732051 [1] 2 [1] 2.236068 [1] 2.449490 [1] 2.645751 [1] 2.828427 [1] 3 [1] 3.162278 Browse[1]> b = a Browse[1]> c = b Browse[1]> a = 1 Browse[1]> c > run.interpreter(r) Called from: eval(expr, envir, enclos) Browse[1]> y = sqrt(x) Browse[1]> for (i in 1:10) { + x = i + print(y) + } [1] 1 [1] 1.414214 [1] 1.732051 [1] 2 [1] 2.236068 [1] 2.449490 [1] 2.645751 [1] 2.828427 [1] 3 [1] 3.162278 Browse[1]> z = y**2 Browse[1]> z [1] 10 Browse[1]> x = 1:3 Browse[1]> z [1] 1 2 3 Browse[1]> b = a Browse[1]> c = b Browse[1]> a = 1 Browse[1]> c >
Haha "c" closed the debugger ;)
So far this actually seems stable enough for general use. I wouldn't trust it with data you care about, but I intend to make great use of it.
The datamap package also provides an interface to the RobjectTables feature. Ter next release (though I don't know when I'll have the time) will allow reference classes to be attached.
ReplyDelete@Jeffrey: Excellent! It's such a useful feature.
ReplyDelete