January 15, 2016

When working with lazy sequences in Clojure, one bit of advice for avoiding excessive memory use is to avoid unintentionally holding onto the head of a lazy sequence.

Let's mess around with this idea in ClojureScript. In particular, let's work with a gigantic sequence of numbers: `(repeatedly 100000000 rand)`.

This sequence is lazy, and just calling `repeatedly` will not cause it to be generated. With the latest release of ClojureScript, you can verify this. By evaluating

```clojure
(realized? (repeatedly 100000000 rand))
```

you'll get `false`.
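To see the laziness more directly, you can force the first element and ask again. This small sketch uses a 5-element sequence (the size is arbitrary). Note that `realized?` reports only on the outermost lazy seq, so it flips to `true` as soon as the sequence has been asked for its first element, not when every element exists:

```clojure
;; A small lazy seq: nothing has been generated yet.
(def xs (repeatedly 5 rand))

(realized? xs) ; false

;; Asking for the first element forces the outermost lazy seq.
(first xs)

(realized? xs) ; true: the head is realized, not necessarily the rest
```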

Let's say we want to sum up the numbers in this sequence, and do it in an imperative fashion. Of course, this is not the recommended approach, but it works for the purposes of this post.

Let's define an atom to hold the sum

```clojure
(def sum (atom 0))
```

and an accumulation function that will add a number to our sum

```clojure
(defn accumulate [x]
  (swap! sum + x))
```

And to sum things up, we are going to do precisely what Stuart Sierra advises against: map our `accumulate` function over the sequence. To force the side effects to actually occur, we will employ `dorun`:

```clojure
(dorun
  (map accumulate
       (repeatedly 100000000 rand)))
```
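Before running the full-size version, it can be reassuring to check the logic on a small input whose sum is known. In this sketch, `(range 10)` stands in for the random numbers (an assumption made only so the result is predictable), and `sum` and `accumulate` are repeated so the block runs on its own. It also shows `run!`, which calls a function on each element purely for its side effects and is the more conventional way to write this kind of loop. The sketch checks only the arithmetic, not memory use:

```clojure
;; Repeated from above so this block is self-contained.
(def sum (atom 0))

(defn accumulate [x]
  (swap! sum + x))

;; The same shape as above (map for side effects, forced by dorun),
;; on a small deterministic input: 0 + 1 + ... + 9 = 45.
(dorun (map accumulate (range 10)))
@sum ; 45

;; The same loop written with run!, which calls accumulate on each
;; element for its side effects and returns nil.
(reset! sum 0)
(run! accumulate (range 10))
@sum ; 45
```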

If you evaluate this full 100-million-element version, you will notice that it uses a lot of memory. What's going on? After all, `(doc dorun)` indicates that it “does not retain the head and returns nil.”

If you try the same thing in Clojure, it works fine, without the memory blowup.

The underlying problem is that ClojureScript doesn't have locals clearing, the Clojure compiler's practice of nulling out a local after its last use so that whatever it referred to can be garbage collected. See CLJS-705. In short, ClojureScript simply can't honor the docstring for `dorun`, because the `coll` argument itself holds the head. The same problem occurs with `map`. Sorry. But what if we really, really needed to do something like this? Can we? Even without locals clearing?

For the sake of simplicity, let's consider the following alternate definitions of `dorun` and `map` that support only the arities we need:

```clojure
(defn dorun*
  [coll]
  (when (seq coll)
    (recur (next coll))))

(defn map*
  [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (cons (f (first s))
            (map* f (rest s))))))
```

You can verify that, with these, things still behave the same way, with memory use still climbing:

```clojure
(dorun*
  (map* accumulate
        (repeatedly 100000000 rand)))
```
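The memory growth is easiest to observe by watching the process, but it is also worth confirming at small scale that these stripped-down definitions keep the lazy semantics of `map` and `dorun`: nothing happens until `dorun*` walks the sequence. This sketch repeats the two definitions so it runs on its own; `seen` and `pipeline` are illustrative names, not part of the original code:

```clojure
;; dorun* and map* as defined above, repeated so this block is self-contained.
(defn dorun* [coll]
  (when (seq coll)
    (recur (next coll))))

(defn map* [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (cons (f (first s))
            (map* f (rest s))))))

;; Record every element the mapped function is called on.
(def seen (atom []))

(def pipeline (map* #(swap! seen conj %) (range 5)))

@seen             ; [] : map* is lazy, nothing has been called yet
(dorun* pipeline) ; walks the whole seq, returns nil
@seen             ; [0 1 2 3 4]
```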

## Letting Go

What we can do is essentially create a means of passing arguments to functions indirectly, so that we can clear them ourselves. Let's approach that by defining a type that can hold a value, but clears the value when dereferenced:

```clojure
(deftype Transfer [^:mutable v]
  IDeref
  (-deref [o]
    (let [r v]
      (set! v nil)
      r)))
```

Note the use of `^:mutable` metadata: this lets us call `set!` when it comes time to clear the value held by `Transfer`.

Here is an example of it working in a REPL:

```
cljs.user=> (def x (Transfer. 7))
#'cljs.user/x
cljs.user=> @x
7
cljs.user=> @x
nil
```

So you get one shot: once you've dereferenced it, grab the value and use it, because it will have been cleared from the `Transfer` instance.

With this, we can define a `dorun'` that doesn't hold the head of the lazy sequence passed to it, so long as that sequence is wrapped in a `Transfer`. It is just like `dorun*` above, but with `Transfer` woven into it:

```clojure
(defn dorun'
  [transfer]
  (let [coll @transfer]
    (when (seq coll)
      (recur (Transfer.
               (next coll))))))
```

Likewise, we need to define a `map'` variant of `map*` that employs the same trick.

```clojure
(defn map'
  [f transfer]
  (lazy-seq
    (when-let [s (seq @transfer)]
      (cons (f (first s))
            (map' f (Transfer.
                      (rest s)))))))
```

Now, with these in place, you can do the following

```clojure
(dorun'
  (Transfer.
    (map' accumulate
          (Transfer.
            (repeatedly 100000000 rand)))))
```

and if you watch memory consumption, you'll see that things are OK.

Now, of course, the above is tedious and nonstandard, and I wouldn't recommend doing it. I think the value of the above exercise is largely in understanding ClojureScript and locals clearing, and perhaps, as I said, if you really, really, really need something like this, the `deftype` and `^:mutable` machinery gives you one way to accomplish it.

Tags: ClojureScript