Struggling Through Problems: April 2011

Wednesday, April 27, 2011

"Inside" Functors -- Multiple Arguments

(The examples here work with the version of insidefunctor tagged as "v1")

Say we want to support something like

> each(x) + each(y)

If we're going to call a function on multiple arguments, each one of which might specify some new behavior, we have to resolve the conflict somehow. For a start, let's say we give each argument a "level" and call one of them the "winner".

> apply.check.functor = function(func, args) {

> if (length(args) == 0) {

> return(func())

> }

> functor.levels = lapply(args, function(x) {

> if (is.inside.functor(x)) {

> level(x)

> }

> else {

> 0

> }

> })

> winner.i = which.max(functor.levels)

> winner.arg = args[[winner.i]]

> if (!is.inside.functor(winner.arg)) {

> do.call(func, args)

> }

> else {

> apply.functor(winner.arg, func, args)

> }

> }

This means we must also modify fmap to pass on multiple arguments:

> fmap = function(func) {

> params = formals(args(func))

> new.func = function() {

> .args = as.list(environment())

> apply.check.functor(func, .args)

> }

> formals(new.func) = params

> new.func

> }
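To see in isolation why fmap copies the formals, here is a small sketch of the trick it relies on. The names `template`, `wrapper` and `captured` are made up for this illustration; only `formals` and `as.list(environment())` come from the code above.

```r
# A template function whose parameter list we want to borrow.
template = function(a, b) NULL

# A wrapper with no parameters of its own; it just reports what it received.
wrapper = function() {
    as.list(environment())
}

# Give the wrapper the template's parameter list, as fmap does.
formals(wrapper) = formals(template)

# The wrapper now accepts a and b, and as.list(environment()) collects them
# into a named list that can be handed on to apply.check.functor.
captured = wrapper(1, b = 2)
str(captured)
```

This is also why a function with no formals, or only '...', gives fmap nothing to collect.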

And now apply.functor.each is going to have to do the work of reconciling the possibly competing messages:

> apply.functor.each = function(inside, func, args, caller) {

> our.level = level(inside)

> args.boxed = args

> for (i in seq_along(args.boxed)) {

> arg = args.boxed[[i]]

> if (is.inside.functor(arg) && level(arg) >= our.level) {

> if (length(inside$items) != length(arg$items)) {

> stop("Axis mismatch: ", inside, " and ", arg)

> }

> }

> else {

> args.boxed[[i]] = insert.each(inside, arg)

> }

> }

> items = list()

> for (i in seq_along(inside$items)) {

> piece.args = lapply(args.boxed, function(arg) {

> arg$items[[i]]

> })

> res = apply.check.functor(func, piece.args)

> items[[i]] = res

> }

> each(items)

> }

This insert.each is new: it pulls an argument into the functor by broadcasting it along the axis being iterated over:

> insert.each = function(inside, obj) {

> each(lapply(inside$items, function(.) obj))

> }
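A quick standalone check of what that broadcasting looks like, repeating the two definitions so the snippet runs on its own (the variable names `x` and `boxed` are only for this example):

```r
each = function(arg) {
    inside = list(items = arg)
    class(inside) = "each"
    inside
}

insert.each = function(inside, obj) {
    # One copy of obj for every item along the axis of inside.
    each(lapply(inside$items, function(.) obj))
}

x = each(list(1, 2, 3))
boxed = insert.each(x, 10)

# boxed has the same length as x, with 10 in every position.
unlist(boxed$items)
```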

We need to define that level method. For now just make all levels 1 until we think of a good reason to make them otherwise.

> level = function(...) {

> UseMethod("level")

> }

> level.each = function(...) {

> 1

> }

Then retrieve those functions from the last post:

> is.inside.functor = function(...) {

> UseMethod("is.inside.functor")

> }

> is.inside.functor.default = function(...) {

> F

> }

> is.inside.functor.each = function(inside) {

> T

> }

> apply.functor = function(...) {

> UseMethod("apply.functor")

> }

> each = function(arg) {

> inside = list(items = arg)

> class(inside) = "each"

> inside

> }

And see if this gives something reasonable:

> x = list(1, 2, 3)

> y = list(4, 5, 6)

> `%+%` = fmap(`+`)

> each(x) %+% each(y)

$items
$items[[1]]
[1] 5

$items[[2]]
[1] 7

$items[[3]]
[1] 9

attr(,"class")
[1] "each"

> each(x) %+% 1

$items
$items[[1]]
[1] 2

$items[[2]]
[1] 3

$items[[3]]
[1] 4

attr(,"class")
[1] "each"

Now we can almost run that code from the beginning, but fmap still has a problem:

> `%:%` = fmap(`:`)

> print(`%:%`)

function () 
{
    .args = as.list(environment())
    apply.check.functor(func, .args)
}
<environment: 0x2fdd930>

The problem is that `:` does not have any formal parameters, so the wrapper ends up with none either. seq will fail too, because its parameters are '...'. These can be solved, but for now define new functions:

> seq. = fmap(function(a, b) {

> seq(a, b)

> })

> sum. = fmap(function(x) {

> sum(x)

> })

> sum.(seq.(1, each(x)))

$items
$items[[1]]
[1] 1

$items[[2]]
[1] 3

$items[[3]]
[1] 6

attr(,"class")
[1] "each"
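One possible way around the missing-formals problem, sketched here as an assumption rather than as what the insidefunctor package actually does, is to have the wrapper take '...' itself and forward `list(...)`. The names `fmap.dots`, `is.each` and `fseq` are hypothetical, and `apply.check.functor` below is a simplified stand-in with no levels and no length check, included only so the snippet runs on its own.

```r
each = function(arg) structure(list(items = arg), class = "each")
is.each = function(x) inherits(x, "each")

# Simplified stand-in: iterate over the first each() argument found and
# pass every other argument through unchanged.
apply.check.functor = function(func, args) {
    is.functor = vapply(args, is.each, logical(1))
    if (!any(is.functor)) {
        return(do.call(func, args))
    }
    n = length(args[[which(is.functor)[1]]]$items)
    items = lapply(seq_len(n), function(i) {
        piece.args = lapply(args, function(arg) {
            if (is.each(arg)) arg$items[[i]] else arg
        })
        apply.check.functor(func, piece.args)
    })
    each(items)
}

# Hypothetical fmap variant: no formals are copied, so primitives such as
# `:` and functions whose only parameter is '...' both work.
fmap.dots = function(func) {
    function(...) {
        apply.check.functor(func, list(...))
    }
}

`%:%` = fmap.dots(`:`)
fseq = fmap.dots(seq)

r = 1 %:% each(list(1, 2, 3))
s = fseq(1, each(list(2, 4)))
```

The trade-off in this sketch is that the wrapper no longer advertises the original parameter names, and every argument is evaluated up front.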

This opens up a real opportunity. Languages like R and Matlab already support something very similar to each(): for numeric vectors, x + y means add up the corresponding elements. And "corresponding" means having the same sequential position.

But just because two vectors have the same length does not mean they correspond. And normally R will not check that for you. But using inside functors we can check.
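A small base R illustration of the problem, with made-up data (the variable names are only for this example):

```r
# Two vectors that happen to have the same length but index different things.
apples.by.store = c(3, 5, 2)
rain.by.month = c(10, 20, 30)

# R adds them position by position without complaint.
total = apples.by.store + rain.by.month
total

# Recycling is just as quiet when one length divides the other.
recycled = c(1, 2, 3, 4) + c(10, 20)
recycled
```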

Since the above functions are still rather incomplete and this is getting to be a lot of code sitting around in one place, for what follows I am going to use the package insidefunctor from https://github.com/ellbur/r-inside-functor. So let's load the package:

> rm(list = ls())

> library(insidefunctor)

In the insidefunctor package, each is slightly more generalized. Anything can be eached if it supports the methods

unpack(object)
pack(object, items)
make.axis(object)

The package already defines these functions for vectors and lists. Let's make a new kind of object that remembers the dimension it runs along.

> as.dimension = function(items) {

> dimension = list(items = items, id = next.dimension.id())

> class(dimension) = "dimension"

> dimension

> }

> unpack.dimension = function(dimension) {

> dimension$items

> }

> pack.dimension = function(dimension, items) {

> dimension$items = items

> dimension

> }

> make.axis.dimension = function(dimension) {

> seq = seq_along(dimension$items)

> attr(seq, "id") = dimension$id

> seq

> }

> dimension.id.counter = 0

> next.dimension.id = function() {

> dimension.id.counter <<- dimension.id.counter + 1

> dimension.id.counter

> }

Setting the 'id' attribute of the returned axis ensures that each will not let you line up two dimensions whose ids differ.
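I have not reproduced the package's own comparison here. As a sketch of why an attribute is enough, assume the two axes are compared with something like `identical()`, which takes attributes into account. The helper `axis.sketch` is hypothetical.

```r
# Hypothetical helper: an axis is a sequence of positions tagged with an id.
axis.sketch = function(n, id) {
    axis = seq_len(n)
    attr(axis, "id") = id
    axis
}

# Same positions and same id: the axes match.
same = identical(axis.sketch(3, 1), axis.sketch(3, 1))

# Same positions but different ids: the axes no longer match,
# even though the lengths agree.
different = identical(axis.sketch(3, 1), axis.sketch(3, 2))

c(same, different)
```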

Let's check that code:

> x = as.dimension(c(1, 2, 3))

> y = as.dimension(c(4, 5, 6))

> `%+.%` = fmap(`+`)

> sq. = fmap(function(z) z^2)

> try(collect(each(x) %+.% each(x)), silent = T)

$items
$items[[1]]
[1] 2

$items[[2]]
[1] 4

$items[[3]]
[1] 6

$id
[1] 1

attr(,"class")
[1] "dimension"

> try(collect(each(x) %+.% sq.(each(x))), silent = T)

$items
$items[[1]]
[1] 2

$items[[2]]
[1] 6

$items[[3]]
[1] 12

$id
[1] 1

attr(,"class")
[1] "dimension"

> try(collect(each(x) %+.% each(y)), silent = T)

> geterrmessage()

Error in apply.functor.each(winner.arg, func, args, apply.check.functor) : Axis mismatch: 11list(items = c(1, 2, 3), id = 1)c(1, 2, 3)1:3list() and 11list(items = c(4, 5, 6), id = 2)c(4, 5, 6)1:3list()

Excellent. We can add x to itself or something calculated from itself, but we can't add x to y because we haven't told each that those variables lie along the same axis -- maybe they don't.

If we want them to correspond, we can say so explicitly.

> align = function(dim1, dim2) {

> if (length(dim1$items) != length(dim2$items)) {

> stop("Cannot align; lengths differ")

> }

> dim1$id = dim2$id

> dim1

> }

> y = align(y, x)

> try(collect(each(x) %+.% each(y)))

$items
$items[[1]]
[1] 5

$items[[2]]
[1] 7

$items[[3]]
[1] 9

$id
[1] 1

attr(,"class")
[1] "dimension"

Now at least it can't happen by accident.
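align still refuses when the lengths differ. A standalone sketch of that path, using plain lists in place of the "dimension" objects (the helper `dim.sketch` is made up for this example; `align` is copied from above):

```r
# Hypothetical stand-in for as.dimension with an explicit id.
dim.sketch = function(items, id) list(items = items, id = id)

align = function(dim1, dim2) {
    if (length(dim1$items) != length(dim2$items)) {
        stop("Cannot align; lengths differ")
    }
    dim1$id = dim2$id
    dim1
}

x = dim.sketch(c(1, 2, 3), 1)

# Different lengths: align stops with an error.
msg = tryCatch(align(dim.sketch(c(4, 5), 2), x),
               error = function(e) conditionMessage(e))

# Same length: the result takes over the id of x.
ok = align(dim.sketch(c(4, 5, 6), 2), x)
```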

Saturday, April 23, 2011

"Inside" Functors

By "inside" I mean inside the parentheses, unlike normal functors which are written outside the parentheses.

This really starts with a grammatical detail. Sentences like

> sapply(ns, function(n) {

> sum(1:n)

> })

translate into English as "Take the list of, for every element in ns, the sum of 1 to n." This is fine for coding, but it's not at all how you'd actually talk.

What if the code were written like

> sum(1:each(ns))

Which sounds like "Take the sum from 1 to each of the ns", which I think sounds more natural. The challenge: make that code run.

Let's make things easier for a moment and consider just functions of 1 variable. Clearly we are going to have to change what is meant by "calling" a function. So forget nice syntax for a moment and define

> apply.check.functor = function(func, arg) {

> if (is.inside.functor(arg)) {

> apply.functor(arg, func, arg)

> }

> else {

> func(arg)

> }

> }

Here we assume apply.functor is a method that individual functor classes will define.

Then we can define each like

> each = function(arg) {

> inside = list(items = arg)

> class(inside) = "each"

> inside

> }

> apply.functor.each = function(inside, func, arg) {

> each(lapply(inside$items, function(x) {

> apply.check.functor(func, x)

> }))

> }

> is.inside.functor.each = function(inside) {

> T

> }

And you can see this works much like fmap from, say, Haskell.

Then add those methods we needed,

> is.inside.functor = function(...) {

> UseMethod("is.inside.functor")

> }

> is.inside.functor.default = function(...) {

> F

> }

> apply.functor = function(...) {

> UseMethod("apply.functor")

> }

And test:

> sum.until = function(n) {

> sum(1:n)

> }

> x = c(1, 2, 3)

> x

[1] 1 2 3

> y = apply.check.functor(sum.until, each(x))$items

> y

And they can be nested:

> z = apply.check.functor(round, apply.check.functor(sin, each(x)))$items

> z
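For reference, the single-argument definitions above can be collected into one script and checked. This is a condensed restatement rather than new behaviour, with T and F written out as TRUE and FALSE.

```r
each = function(arg) {
    inside = list(items = arg)
    class(inside) = "each"
    inside
}

is.inside.functor = function(...) UseMethod("is.inside.functor")
is.inside.functor.default = function(...) FALSE
is.inside.functor.each = function(inside) TRUE

apply.functor = function(...) UseMethod("apply.functor")
apply.functor.each = function(inside, func, arg) {
    each(lapply(inside$items, function(x) {
        apply.check.functor(func, x)
    }))
}

apply.check.functor = function(func, arg) {
    if (is.inside.functor(arg)) {
        apply.functor(arg, func, arg)
    }
    else {
        func(arg)
    }
}

sum.until = function(n) {
    sum(1:n)
}

x = c(1, 2, 3)

# One item per element of x: sum(1:1), sum(1:2), sum(1:3).
y = apply.check.functor(sum.until, each(x))$items
unlist(y)

# Nested calls: round(sin(x)) applied item by item.
z = apply.check.functor(round, apply.check.functor(sin, each(x)))$items
unlist(z)
```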

And now to make the syntax nicer. Rather than calling apply.check.functor each time, the function being called can do that itself:

> fmap = function(func) {

> params = formals(args(func))

> new.func = function() {

> .args = as.list(environment())

> apply.check.functor(func, .args[[1]])

> }

> formals(new.func) = params

> new.func

> }

> sum.until. = fmap(sum.until)

> y = sum.until.(each(x))$items

> y

(This version of fmap has several technical problems which I'll point out later).

BUT our original example had a function of 2 arguments. If we're going to start handling multiple arguments, we have to answer a few questions:

What do we do if some arguments are inside.functors and others are not?
How do we handle multiple disagreeing inside.functors?

Take the first one. If you really want to imitate English (which I do), it's inevitable that

> each(x) + 1

should add 1 to each x. So the argument 1 should be "brought in" to the functor.

For the second question, take the example

> each(x) + each(y)

where clearly the intent is to "line up" the corresponding elements of x and y. Actually, to follow English more closely you would say

> each(x) + corresponding(y)

Which might be preferable because it is less ambiguous.
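In plain base R terms, the intended meaning of the two examples is roughly the following. This is only a statement of the target behaviour using `lapply` and `Map`, not the inside-functor mechanism itself, and the variable names are for illustration.

```r
x = list(1, 2, 3)
y = list(4, 5, 6)

# each(x) + 1: the constant 1 is "brought in" and added to every element.
plus.one = lapply(x, function(xi) xi + 1)

# each(x) + each(y): corresponding elements are lined up and added.
lined.up = Map(`+`, x, y)

unlist(plus.one)
unlist(lined.up)
```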

More will follow.