When I first read about generators in JavaScript, I assumed, “magic.” Reading the spec is interesting but to get to the guts of how they are implemented, I would have to dig into some C/C++ code. When I wanted to learn about how Elixir implements Streams, I found it very easy to get an understanding how one can lazily iterate over a range.
Why did I find it so easy? This is, in part, due to the functional world in which we operate. You change state by calling a function, not by making an assignment. No longer are we thinking in steps but rather in a sequence of operations. This greatly frees our mind to focus on the domain of the problem rather than focusing too much on organizing and taming the domain.
Although the docs provide a good example why composable lazy streams are efficient for large data sets, I figured we could start with a basic example.
stripes = Stream.cycle([:red, :white, :blue])
stripes |> Stream.take(6_000_000) |> Enum.to_list # [:red, :white, :blue, :red, :white, :blue, ...]
Looking at the above code, you may come up with a few ways one can build up a list like this. If you take it at the source code, you will learn pretty quickly how it works. In this article, I will introduce some broader functional programming concepts as well as dive into how one iteration of a stream is implemented.
Stream.unfold/2
Stream.unfold/2
is a cool utility function that is based on a functional idea called anamorphism
— a sequence generated by applying a defined function to the previous result until a terminating condition is received. In fact, the term unfold
is related to a standard operator in functional programming called fold
, a family of higher order functions that is the basis for all functional programming. In the most familiar sense, this can also be replaced by the word reduce
. Stated as defined, it is a function that analyzes a recursive data structure by recombining each analyzed part into a return value. Here is a paper providing explanations and proofs of the power of fold
.
With this background, we can simply state Stream.unfold/2
analyzes a recursive data structure and generates successive values until told not to, returning an Enumerable type. unfold/2
also is the basis for other methods in the Stream
module like iterate/2
, interval/1
and cycle/1
. Let’s look at the implementation in parts (as of Elixir 1.8.1) by generating a Fibonacci sequence.
Side note — if you are curious about these functions, iex
is one of the most useful tools in our belt when looking for what a specific function does. If you have Elixir installed on your machine, type iex
on the command line, and you can find all sorts of documentation. For example, by typing h Enum.
, hit tab and see which methods exist on the Enum
module. Once you want to dig a little further, you can see what the docstring outputs with h Enum.dedup
. Another fun note is to just hit tab
when you enter the iex
shell, and you can explore all of the Kernel methods that exist. Alright, back to the code.
Setup of a Stream with a fibonacci sequence
# generates a fibonacci sequence
# the details of this aren't super helpful. Just here for some setup
Stream.unfold({0, 1}, fn
{a, b} -> {a, {b, a + b}} # return a tuple, with `a` being returned to the consumer and {b, a + b} is the state for the next iteration
end)
- From
unfold/2
below, we return an anonymous function after passing in an initial accumulator and a function. We will get back to the anonymous function later but take note of the value placeholders&1
and&2
. As we will see, this will eventually be our new accumulator and a function to generate each successive value.
@spec unfold(acc, (acc -> {element, acc} | nil)) :: Enumerable.t()
def unfold(next_acc, next_fun) do
&do_unfold(next_acc, next_fun, &1, &2)
end
This anonymous function can be rewritten as
fn one, two ->
do_unfold(next_acc, next_fun, one, two)
end
- Jumping into
do_unfold/4
, you can see our end state isnil
, so we stop execution with a:done
instruction. If I told you these instructions have no special meaning outside of your program other than to inform state for the next operation, then I would be lying. In fact, no longer are we limited to just returning an accumulator. In Elixir, we can return:cont
,:halt
and:suspend
. End state instructions include:done
,:halted
and:suspended
and are utilized in Elixir. This is important knowledge for other scenariosunfold/2
can cover as noted at the end.
defp do_unfold(next_acc, next_fun, {:cont, acc}, fun) do
case next_fun.(next_acc) do
nil -> {:done, acc}
{v, next_acc} -> do_unfold(next_acc, next_fun, fun.(v, acc), fun)
end
end
- Also notice we recursively apply
do_unfold/4
above. This is where things get interesting. If we were to follow the path of generating a Fibonacci sequence to this point, we should remember that we haven’t actually executed anything. Computations are only performed when you call a method from theEnum
module. Streams are efficient because we delay executing work until the last possible step, side stepping multiple iterations over a list as found in similarEnum
implementations. In this case, we will callEnum.to_list/1
.
Stream.unfold({0, 1}, fn
{a, b} -> {a, {b, a + b}}
end)
|> Enum.to_list()
Execution of a Stream with the Enum module
- Now onto how streams actually execute. It isn’t magic but some good ol’ functional recursion.
Let’s evaluate the anonymous function returned from Stream.unfold/2
that will be passed to Enum.to_list/1
. The Stream
documentation states it like so.
Note the functions in this module are guaranteed to return enumerables.
Since enumerables can have different shapes (structs, anonymous functions,
and so on), the functions in this module may return any of those shapes
and this may change at any time. For example, a function that today
returns an anonymous function may return a struct in future releases.
An anonymous function is technically an enumerable. After we perform the initial operation, we get a result that we can pass to to_list/1
.
result = #Function<64.66371359/2 in Stream.unfold/2>
enum = Enum.to_list(result)
So we got back a function in Stream.unfold/2
with an arity of 2. Only when we actually call Enum.to_list/1
do we calculate the next Fibonacci number from the previous number. As we noted before, fold
is a general functional programming term and reduce
is the basis for all this work. Coincidentally, by tracking to_list(stream)
, we eventually come across a method reduce
. You can see the stream’s final resting place below.
Let’s work our way backwards starting at the end of the Enum.to_list/1
chain with Enumerable.reduce/3
. Our anonymous function returned earlier from Stream.unfold/2
is function
in this context.
defimpl Enumerable, for: Function do
...
def reduce(function, acc, fun) when is_function(function, 2), do: function.(acc, fun) # fun = `fn x, acc -> {:cont, build_acc_function.(x, acc)}
...
end
What is fun
? Let’s jump back to what called Enumerable.reduce/3
. fun
is a function that returns {:cont, build_acc_function.(x, acc)}
when called. Since functions in Elixir are first class citizens, it is simply passed through via function.(acc, fun)
above.
defp reduce_enumerable(enumerable, acc \\ [], build_acc_function \\ &[&1 | &2]) do
Enumerable.reduce(enumerable, {:cont, acc}, fn x, acc -> {:cont, build_acc_function.(x, acc)} end) |> elem(1)
end
- Note I changed the 3rd argument to
build_acc_function
and added default values to reduce confusion and show what initial state this method is populated with.
When do we actually call build_acc_function
from the reduce method above? Let’s go back to do_unfold/4
.
defp do_unfold(next_acc, next_fun, {:cont, acc}, fun) do
case next_fun.(next_acc) do
nil -> {:done, acc}
{v, next_acc} -> do_unfold(next_acc, next_fun, fun.(v, acc), fun)
end
end
In this scope, the 3rd and 4th arguments were the value placeholders &1
and &2
passed from Enumerable.reduce/3
. {:cont, acc}
is &1
, the accumulator that is holding our state, while fun
(&2
) is our passed in function from reduce_enumerable/3
and returns {:cont, build_acc_function.(x, acc)}
with build_acc_function
being &[&1 | &2]
. This is our general way of adding new items to the list with v
being the new item.
fun = &[&1 | &2]
fun.(1, [2])
# [1, 2]
If we are lucky, we add the new value to the accumulator and return {:cont, new_acc}
, calling do_unfold/4
again. This is how the list is built up from Stream.unfold/2
. To state another way, by passing our initial anonymous do_unfold/4
function to Enum.to_list/1
, we eventually just jump back to the Stream
module and and recursively execute do_unfold/3
until we return nil
, our :done
state. In this case, we should use Enum.take(10)
or else it will infinitely recurse. Cool, right?
Stream.unfold({0, 1}, fn
{a, b} -> {a, {b, a + b}}
end)
|> Stream.map(&IO.inspect(&1))
|> Enum.take(10)
We only generate the first 10 results we need.
- Some alternative cases are provided via instructions if the function you provide returns
:suspend
or:halt
.
defp do_unfold(next_acc, next_fun, {:suspend, acc}, fun) do
{:suspended, acc, &do_unfold(next_acc, next_fun, &1, fun)}
end
defp do_unfold(_next_acc, _next_fun, {:halt, acc}, _fun) do
{:halted, acc}
end
Recap
So understanding Elixir streams isn’t hard; it just needs a bit of function logic. Of course, the inner workings may change, but in the meantime, I find it useful to understand these patterns and recognize that it isn’t anything I can’t learn.
To recap, fold
(aka reduce
) is the building block of many higher order functions in functional programming. Stream.unfold/2
is an interesting function that allows us to solve problems that require generating sequential values based on logic we give it. We learned about streams and how they work — via functions and recursion. Surprised, right? However, Stream.unfold/2
is implemented differently than Stream.map/2
. In follow up posts, I will dig into these differences, as well as Enum.sort/2
and Enum.sort_by
, both of which bring lots of useful knowledge. Exciting Elixir posts ahead so keep an eye out.
DockYard is a digital product agency offering exceptional strategy, design, full stack engineering, web app development, custom software, Ember, Elixir, and Phoenix services, consulting, and training.