ClojureScript Head Holding

January 15, 2016

When working with lazy sequences in Clojure, one bit of advice for avoiding excessive memory use is to avoid unintentionally holding onto the head of a lazy sequence.

Let's mess around with this idea in ClojureScript. In particular, let's work with a gigantic sequence of numbers: (repeatedly 100000000 rand).

This sequence is lazy, and just calling repeatedly will not cause it to be generated. With the latest release of ClojureScript, you can verify this. By evaluating

(realized? (repeatedly 100000000 rand))

you'll get false.
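To see the flip side, here is a small sketch of the same check before and after forcing a sequence. The name xs and the tiny count are assumptions made only to keep the example cheap to run:

```clojure
;; `xs` is a hypothetical name. Note that def'ing the sequence holds its
;; head, which is harmless for five elements.
(def xs (repeatedly 5 rand))

(realized? xs)  ; false: nothing has been generated yet
(dorun xs)      ; walk the sequence, forcing every element
(realized? xs)  ; true: the sequence has now been realized
(count xs)      ; 5
```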

Let's say we want to sum up the numbers in this sequence, and do it in an imperative fashion. Of course, this is not the recommended approach, but it works for the purposes of this post.

Let's define an atom to hold the sum

(def sum (atom 0))

and an accumulation function that will add a number to our sum

(defn accumulate [x]
  (swap! sum + x))
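As a quick sanity check, here is a minimal sketch of accumulate at work on two hand-picked values (the values are arbitrary), next to the reduce form you would normally reach for:

```clojure
(def sum (atom 0))

(defn accumulate [x]
  (swap! sum + x))

;; Each call adds its argument to the running total held in the atom.
(accumulate 1)
(accumulate 2)
@sum  ; 3

;; The conventional functional form computes the same total with no atom.
(reduce + [1 2])  ; 3
```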

And to sum things up, we are going to do precisely what Stuart Sierra advises against and map our accumulate function over the sequence. To force the side effects to actually occur, we will employ dorun:

(dorun 
  (map accumulate 
    (repeatedly 100000000 rand)))

If you try this, you will notice that it uses a lot of memory. What's going on? I mean, after all, (doc dorun) indicates that it “does not retain the head and returns nil.”

If you try this in Clojure, it will work fine.

The underlying problem is that ClojureScript doesn't have locals clearing. See CLJS-705. In short, ClojureScript simply can't honor the docstring for dorun because the coll argument itself will hold the head. The same problem occurs with map. Sorry.

What if we actually really, really needed to do something like this? Can we? Even without locals clearing?

For sake of simplicity, let's consider the following alternate definitions of dorun and map that support only the arities we need:

(defn dorun*
  [coll]
  (when (seq coll)
    (recur (next coll))))

(defn map*
  [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (cons (f (first s)) 
        (map* f (rest s))))))

And, you can verify that, with these, things will still behave the same way via:

(dorun* 
  (map* accumulate 
    (repeatedly 100000000 rand)))

Letting Go

What we can do is to essentially create a means to pass arguments to functions indirectly, and clear them ourselves. Let's approach that by defining a type that can hold a value, but when dereferenced, clears the value:

(deftype Transfer [^:mutable v]
  IDeref
  (-deref [o]
    (let [r v]
      (set! v nil)
      r)))

Note the use of ^:mutable meta—this lets us call set! when it comes time to clear the value being held by Transfer.

Here is an example of it working in a REPL:

cljs.user=> (def x (Transfer. 7))
#'cljs.user/x
cljs.user=> @x
7
cljs.user=> @x
nil

So, you get one shot—once you've dereferenced it, grab the value and use it, because it will be cleared from the deftype value.
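The same one-shot behavior applies when a Transfer is handed to a function. The following is only a sketch: first-of and t are hypothetical names introduced for illustration, and the code is ClojureScript-specific because it relies on IDeref and ^:mutable:

```clojure
(deftype Transfer [^:mutable v]
  IDeref
  (-deref [_]
    (let [r v]
      (set! v nil)
      r)))

;; `first-of` is a hypothetical helper: it dereferences its argument once.
(defn first-of [transfer]
  (first @transfer))

(def t (Transfer. [1 2 3]))

(first-of t)  ; 1
(first-of t)  ; nil, the vector was handed over on the first call
```

In other words, a function that receives a Transfer should dereference it exactly once and keep working with the resulting local.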

With this, we can define dorun' that doesn't hold the head of the lazy sequence passed to it, so long as the lazy sequence argument is wrapped using Transfer. It is just like dorun* above, but with Transfer woven into it:

(defn dorun'
  [transfer]
  (let [coll @transfer]
    (when (seq coll)
      (recur (Transfer. 
               (next coll))))))

Likewise, we need to define a map' variant of map* that employs the same trick.

(defn map'
  [f transfer]
  (lazy-seq
    (when-let [s (seq @transfer)]
      (cons (f (first s)) 
        (map' f (Transfer. 
                  (rest s)))))))

Now, with these in place, you can do the following

(dorun' 
  (Transfer. 
    (map' accumulate 
      (Transfer. 
        (repeatedly 100000000 rand)))))

and if you watch memory consumption, you'll see that things are OK.
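Memory behavior is hard to show in a snippet, but you can at least check that the Transfer-based variants still compute the right answer. Here is a self-contained ClojureScript sketch that repeats the definitions from this post and runs them over a small input; the input (range 1 11) is an assumption chosen so the total is easy to verify by hand:

```clojure
(deftype Transfer [^:mutable v]
  IDeref
  (-deref [_]
    (let [r v]
      (set! v nil)
      r)))

(defn dorun' [transfer]
  ;; Dereference once; after this, the Transfer no longer holds the seq.
  (let [coll @transfer]
    (when (seq coll)
      (recur (Transfer. (next coll))))))

(defn map' [f transfer]
  (lazy-seq
    (when-let [s (seq @transfer)]
      (cons (f (first s))
            (map' f (Transfer. (rest s)))))))

(def sum (atom 0))

(defn accumulate [x]
  (swap! sum + x))

;; Small stand-in for the huge sequence: the integers 1 through 10.
(dorun'
  (Transfer.
    (map' accumulate
      (Transfer. (range 1 11)))))

@sum  ; 55
```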

Now of course, the above is tedious and nonstandard. I wouldn't recommend doing it. I think the value of the above exercise is largely in understanding ClojureScript and locals clearing, and perhaps, like I said, if you really really really need something like this, the deftype and ^:mutable machinery gives you one way to accomplish this.

Tags: ClojureScript