Introduction
Elixir has quite a few different ways of representing collections of data: lists for ordered collections, MapSets for set manipulation, Maps and Keyword Lists for key-value pair association, and so on. However, there’s another important data structure: BEAM binaries.
A binary is a sequence of bytes. A bitstring is its more general form: a sequence of bits whose length doesn't have to be a multiple of eight. Much like lists, both are ordered. As a bonus, binaries also have known sizes and provide some powerful tools for writing and reading data from them.
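As a quick sketch of that distinction (the variable names here are illustrative only), the built-in guards and size functions can tell the two apart:

```elixir
# A binary: its bit count is a multiple of 8.
whole = <<1, 2, 3>>
# A bitstring that is not a binary: only 3 bits long.
partial = <<5::3>>

IO.inspect(is_binary(whole))       # true
IO.inspect(is_bitstring(whole))    # true, every binary is also a bitstring
IO.inspect(is_binary(partial))     # false
IO.inspect(is_bitstring(partial))  # true
IO.inspect(bit_size(partial))      # 3
IO.inspect(byte_size(whole))       # 3
```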
How to Create a Binary
In short, Elixir provides the <<>> special form for creating a binary. There are a few ways of using it, such as:
iex(1)> x = 123
123
iex(2)> <<x::8>>
"{"
iex(3)> <<x::7>>
<<123::size(7)>>
iex(4)> <<0::1, x::7>>
"{"
In this first example, we already notice something strange: the command in iex(2) returned a string! Elixir strings are actually binaries under the hood. x::8 means that we’re writing 123 as an eight-bit number into the bitstring, and the byte 123 happens to be the character {.
Likewise, <<x::7>> means that we’re representing 123 as a seven-bit bitstring, which isn’t a string because it doesn’t fill a whole byte.
The last line, however, adds the missing leading zero bit, which produces the same result as iex(2).
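To make the string/binary relationship concrete, here is a small sketch (the variable name seven is ours). It relies only on IO.inspect's :binaries option and the bit_size function from Kernel:

```elixir
x = 123

# Force IO.inspect to show the raw bytes instead of the printable string.
IO.inspect("{", binaries: :as_binaries)  # <<123>>
IO.inspect(<<x::8>> == "{")               # true

# Seven bits don't fill a byte, so this value can't be printed as a string.
seven = <<x::7>>
IO.inspect(bit_size(seven))               # 7

# Prepending one zero bit pads it back to a full byte.
IO.inspect(<<0::1, seven::bitstring>> == "{")  # true
```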
There are a few different ways of embedding values in a binary, as the example below shows. For a more in-depth explanation of valid types, the official documentation for <<...>> is pretty extensive.
The example also introduces a richer syntax for matching binary elements, which lets us extract data from binaries in a more structured way.
iex(1)> <<?J::utf8, ?o::utf8, ?s::utf8, 50089::utf16>>
"José"
iex(2)> embed = <<1024.25::float-32-little>>
<<0, 8, 128, 68>>
iex(3)> <<extracted::float-32-little>> = embed
<<0, 8, 128, 68>>
iex(4)> extracted
1024.25
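The 50089::utf16 segment in the first line can look puzzling, since é is code point 233. As far as we can tell, it works because of how the bytes line up: 50089 is 0xC3A9, and encoding that code point as UTF-16 emits the two bytes 0xC3 and 0xA9, which are exactly the UTF-8 encoding of é. The sketch below checks this claim. Writing 233::utf8 is the more direct way to get the same bytes.

```elixir
# 50089 in hexadecimal is 0xC3A9.
IO.inspect(0xC3A9)            # 50089

# UTF-16 (big endian by default) stores code point 0xC3A9 as two bytes.
IO.inspect(<<50089::utf16>>, binaries: :as_binaries)  # <<195, 169>>

# UTF-8 stores code point 233 (é) as the same two bytes.
IO.inspect(<<233::utf8>>, binaries: :as_binaries)     # <<195, 169>>

IO.inspect(<<50089::utf16>> == <<233::utf8>>)         # true
```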
The Basics of Binary Pattern Matching
Let’s take a look at the following example:
iex> binary = <<1::1, 2::3, 3::4, 4::8>>
<<163, 4>>
The statement encodes four numbers into a bitstring so that together they occupy only two bytes (1 + 3 + 4 + 8 = 16 bits). We can extract them again by pattern matching with the same size specifications:
iex> <<a::1, b::3, c::4, d::8>> = binary
<<163, 4>>
iex> a
1
iex> b
2
iex> c
3
iex> d
4
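The same idea extends to splitting a fixed header from a variable-length tail. The sketch below uses a made-up packet layout (a 4-bit version, 4-bit flags, and a 16-bit length, followed by a payload). The layout and all names are hypothetical and only show the ::binary "rest" match:

```elixir
# Hypothetical layout: 4-bit version, 4-bit flags, 16-bit length, then payload.
packet = <<2::4, 5::4, 3::16, "abc">>

# A segment without a size, typed as binary, must come last and takes the rest.
<<version::4, flags::4, len::16, payload::binary>> = packet

IO.inspect({version, flags, len, payload})  # {2, 5, 3, "abc"}

# A pattern that doesn't account for every bit does not match.
IO.inspect(match?(<<_::4, _::4, _::16>>, packet))  # false
```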
Now for a more complete example, let’s encode a float:
<<1024.25::float-32-little>>
This statement says that we want to write a number (1024.25) with the float type encoding, using 32 bits as the size and little-endian byte order. For comparison, here’s the same number written in both little and big endian:
iex> <<1024.25::float-32-little>>
<<0, 8, 128, 68>>
iex> <<1024.25::float-32-big>>
<<68, 128, 8, 0>>
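Endianness applies to integers in the same way, which may be easier to follow than the float bytes. A minimal sketch, using 258 (0x0102) as an arbitrary value of our choosing:

```elixir
# 258 is 0x0102: big endian stores the most significant byte first.
big = <<258::16-big>>
little = <<258::16-little>>
# Big endian is the default when no endianness is given.
default = <<258::16>>
IO.inspect({big, little, default})  # {<<1, 2>>, <<2, 1>>, <<1, 2>>}

# Reading the same two bytes with a different byte order gives a different number.
<<as_big::16-big>> = <<1, 2>>
<<as_little::16-little>> = <<1, 2>>
IO.inspect({as_big, as_little})     # {258, 513}

# The float from the example above decodes from its big-endian bytes.
<<from_big::float-32-big>> = <<68, 128, 8, 0>>
IO.inspect(from_big)                # 1024.25
```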
Finally, let’s look into binary comprehension and recursion. Binary comprehension is a way for us to iterate over a binary, extracting and manipulating data.
binary = "Elixir"

for <<char::utf8 <- binary>>, into: <<>> do
  case_changed =
    cond do
      char in ?a..?z -> char + ?A - ?a
      char in ?A..?Z -> char + ?a - ?A
      true -> char
    end

  <<case_changed::utf8>>
end
# returns "eLIXIR"
In this more complex example, we’re swapping the case of every ASCII letter in the string and splicing the string back together.
All features of the regular for comprehension, such as filters and the :into option, work here.
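As a sketch of a few of those features (the input values are arbitrary), a binary generator can collect into a list, take a filter, or read fixed-size chunks that are smaller than a byte:

```elixir
binary = "Elixir 1.16"

# Without :into, the comprehension collects into a list of code points.
codepoints = for <<char::utf8 <- binary>>, do: char
IO.inspect(codepoints == String.to_charlist(binary))  # true

# A filter keeps only the ASCII digits; into: <<>> builds a binary again.
digits = for <<char::utf8 <- binary>>, char in ?0..?9, into: <<>>, do: <<char::utf8>>
IO.inspect(digits)   # "116"

# The generator pattern can also split every byte into two 4-bit halves.
bytes = <<0x12, 0x34>>
nibbles = for <<hi::4, lo::4 <- bytes>>, do: {hi, lo}
IO.inspect(nibbles)  # [{1, 2}, {3, 4}]
```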
For recursion, we just need to treat pattern matching as we normally do. The following example encodes and then decodes a key-length-value encoded binary.
defmodule KLVCodec do
  @key_bits 8
  @length_bits 32
  @keys [:first_name, :last_name]

  # Encodes each entry as <<key code, value length, value bytes>>.
  def encode(data) do
    for %{key: key, value: value} <- data, into: <<>> do
      encoded = :erlang.term_to_binary(value)
      encoded_size = byte_size(encoded)
      <<to_key_code(key)::@key_bits, encoded_size::@length_bits, encoded::bytes-size(encoded_size)>>
    end
  end

  # A key is encoded as its index in @keys.
  defp to_key_code(key) do
    Enum.find_index(@keys, &(&1 == key))
  end

  def decode(encoded, acc \\ [])
  def decode(<<>>, acc), do: Enum.reverse(acc)

  # Reads one entry off the front of the binary and recurses on the rest.
  def decode(<<key_code::@key_bits, length::@length_bits, encoded::bytes-size(length), rest::bitstring>>, acc) do
    key = from_key_code(key_code)
    data = :erlang.binary_to_term(encoded)
    decode(rest, [%{key: key, value: data} | acc])
  end

  defp from_key_code(idx), do: Enum.at(@keys, idx)
end
iex(1)> encoded = KLVCodec.encode([%{key: :first_name, value: "Paulo"}, %{key: :last_name, value: "Valente"}])
<<0, 0, 0, 0, 11, 131, 109, 0, 0, 0, 5, 80, 97, 117, 108, 111, 1, 0, 0, 0, 13, 131, 109, 0, 0, 0, 7, 86, 97, 108, 101, 110, 116, 101>>
iex(2)> KLVCodec.decode(encoded)
[%{value: "Paulo", key: :first_name}, %{value: "Valente", key: :last_name}]
Although this is a more complex example, it really showcases the power of binary matching.
A special highlight is the following statement, which uses one of the matched fields (length) as the size of the next matched field (bytes-size(length)):
<<key_code::@key_bits, length::@length_bits, encoded::bytes-size(length), rest::bitstring>>
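Stripped down to a single field, the same length-prefix technique might look like the sketch below. LengthPrefixed and take/1 are hypothetical names, and the one-byte prefix is an assumption made to keep the example small. The fallback clause shows what happens when the input is shorter than its prefix claims.

```elixir
defmodule LengthPrefixed do
  # `len` is bound by the first segment and reused as the size of the second.
  def take(<<len::8, value::binary-size(len), rest::binary>>), do: {:ok, value, rest}

  # Reached when fewer than `len` bytes follow the prefix (or the input is empty).
  def take(_other), do: :error
end

IO.inspect(LengthPrefixed.take(<<5, "Paulo", "!!">>))  # {:ok, "Paulo", "!!"}
IO.inspect(LengthPrefixed.take(<<5, "Pa">>))           # :error
```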
Conclusion
Elixir’s binary matching is a powerful tool for efficient data processing, offering both simplicity and performance. The same <<>> syntax covers everything from packing a few bits into a byte to encoding and decoding a key-length-value format with comprehensions and recursion. That versatility makes it an essential skill for Elixir developers who work on data-intensive tasks.