This is the second in a series. Read part 1 here.
Reads Only
For our first comparison of the in-memory storage solutions, we’ll be taking a look at the read-only scenario: data is initialized at application startup, and it never changes.
We tested a variety of data sizes, from 10 rows all the way to 10,000,000 rows. For the most part, the results are the same regardless:
Storing Data in a Module is by Far the Fastest Solution
The most complicated solution is also the most performant. If you’re comfortable with metaprogramming, and want the absolute fastest read time possible, look no further.
✅ Maximum read speed
❌ Requires writing implementation boilerplate with metaprogramming
❌ Data initialization can take several seconds for larger datasets (10,000+ rows)
❌ O(log n) read time complexity*
*Note: Any solution which stores data as a Map will have logarithmic read time complexity. However, it doesn’t have a major impact on performance comparisons because the quantity of data we store in memory is usually low enough to be negligible. It would take hundreds of millions of rows to make this approach lose its #1 position in performance, and likely other issues would emerge before reaching that point (as we’ll discuss later).
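To make the Module approach concrete, here is a minimal sketch. The module name `StaticStore` and the data in `@rows` are hypothetical; real implementations typically load the dataset from an external source at compile time via metaprogramming.

```elixir
defmodule StaticStore do
  # Hypothetical dataset; in practice this would be generated at compile
  # time from your real data source.
  @rows %{1 => "alpha", 2 => "beta", 3 => "gamma"}

  # @rows is expanded at compile time, so the map is embedded in the
  # compiled module as a literal and reads never copy it between processes.
  def get(key), do: Map.get(@rows, key)
end

StaticStore.get(2)
# => "beta"
```

Because the map lives in the module's literal area rather than in any process heap, a read is just a local function call plus a `Map.get/2`, which is where the top-of-the-chart speed comes from.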
:persistent_term is a Close Second
Reads are roughly 2x slower than the Module approach, but :persistent_term ships with a ready-made API out of the box, so you don’t have to write your own implementation.
✅ Excellent read speed
✅ Simple get/put API
❌ O(log n) read time complexity
A Public :ets Table is a Solid Choice
Compared to the previous two options, :ets doesn’t boast the same level of performance (reads are ~4x slower than :persistent_term and ~9x slower than Module), but does have some noteworthy advantages:
✅ Comes with some query logic out-of-the-box if you need more than just primary key lookups
✅ O(1) read time complexity
❌ Not quite as fast as other options
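A minimal sketch of the public-table setup. The table name `:rows` is a hypothetical example; `read_concurrency: true` is the option that tunes the table for our read-heavy scenario.

```elixir
# A named, public table: any process can read from it directly,
# without going through an owning process.
:ets.new(:rows, [:set, :named_table, :public, read_concurrency: true])
:ets.insert(:rows, [{1, "alpha"}, {2, "beta"}])

# Lookups return a list of matching tuples (empty if no match).
case :ets.lookup(:rows, 2) do
  [{_key, value}] -> value
  [] -> nil
end
# => "beta"
```

For richer queries than primary-key lookups, `:ets.match/2` and `:ets.select/2` provide the out-of-the-box query logic mentioned above.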
GenServer Slows You Down
Whether you’re storing a Map in the GenServer’s state or using it to manage an :ets table, you’re going to slow things down quite a bit (10x slower than a public :ets table, 50-100x slower than Module).
The main benefit of a GenServer is serialized access, which prevents race conditions when the data is updated, but we don’t need that in a read-only scenario.
❌ Slowest read times
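To show where the slowdown comes from, here is a sketch of the GenServer (Map) variant. The module name `MapServer` is hypothetical; the key point is that every read becomes a `GenServer.call/2`, funneling all lookups through a single process's mailbox.

```elixir
defmodule MapServer do
  use GenServer

  def start_link(rows), do: GenServer.start_link(__MODULE__, rows, name: __MODULE__)

  # Each read is a synchronous round-trip through the server process.
  def get(key), do: GenServer.call(__MODULE__, {:get, key})

  @impl true
  def init(rows), do: {:ok, rows}

  @impl true
  def handle_call({:get, key}, _from, rows) do
    {:reply, Map.get(rows, key), rows}
  end
end

{:ok, _pid} = MapServer.start_link(%{1 => "alpha", 2 => "beta"})
MapServer.get(1)
# => "alpha"
```

The message-passing round-trip, not the `Map.get/2` itself, is what accounts for the 50-100x gap versus the Module approach.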
Working with 1,000,000+ Rows
Generally when storing data in-memory, we’re not dealing with large quantities. Since RAM is much more expensive than disk space, big datasets are usually written to disk, with a small subset kept in-memory as a cache. But if you need to store more than a million rows in-memory, note that any Map-based implementation (Module, :persistent_term, or GenServer) will not only slow down as the dataset grows, but also require the BEAM to allocate more data to a single literal. That allocation has a hard limit, and the BEAM will crash if the limit is exceeded. There are ways around this, such as increasing the limit or splitting the data across multiple stores, but the best approach is generally to go with :ets, which is a perfect fit for large datasets.
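As a sketch of why :ets sidesteps the single-literal problem: each row is an independent table entry, so nothing forces the BEAM to hold the whole dataset as one term. The table name and the generated row contents below are hypothetical.

```elixir
# Load a large dataset into :ets in chunks; no single literal is ever
# allocated, so the literal size limit does not apply.
table = :ets.new(:big_rows, [:set, :public, read_concurrency: true])

1..1_000_000
|> Stream.map(fn id -> {id, "row-#{id}"} end)
|> Stream.chunk_every(10_000)
|> Enum.each(fn chunk -> :ets.insert(table, chunk) end)

:ets.lookup(table, 123_456)
# => [{123456, "row-123456"}]
```

Inserting in chunks also keeps memory pressure on the loading process low, since the `Stream` pipeline never materializes the full row list.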
Benchmarks
Tiny (10 rows)
Module - 129_858_000 / sec
:persistent_term - 47_387_000 / sec (2.5x slower)
:ets (async) - 10_075_000 / sec (13x slower)
GenServer (Map) - 971_000 / sec (133x slower)
:ets (serialized) - 830_000 / sec (156x slower)
Small (100 rows)
Module - 87_038_000 / sec
:persistent_term - 42_949_000 / sec (2x slower)
:ets (async) - 9_554_000 / sec (9x slower)
GenServer (Map) - 969_000 / sec (90x slower)
:ets (serialized) - 812_000 / sec (107x slower)
Medium (1,000 rows)
Module - 82_485_000 / sec
:persistent_term - 41_893_000 / sec (2x slower)
:ets (async) - 8_954_000 / sec (9x slower)
GenServer (Map) - 924_000 / sec (90x slower)
:ets (serialized) - 822_000 / sec (100x slower)
Large (10,000 rows)
Module - 75_431_000 / sec
:persistent_term - 32_719_000 / sec (2x slower)
:ets (async) - 8_488_000 / sec (9x slower)
GenServer (Map) - 947_000 / sec (80x slower)
:ets (serialized) - 816_000 / sec (92x slower)
X-Large (100,000 rows)
Module - 65_319_000 / sec
:persistent_term - 33_936_000 / sec (2x slower)
:ets (async) - 8_326_000 / sec (8x slower)
GenServer (Map) - 934_000 / sec (70x slower)
:ets (serialized) - 818_000 / sec (80x slower)
Huge (1,000,000 rows)
Module - 47_741_000 / sec
:persistent_term - 29_670_000 / sec (1.5x slower)
:ets (async) - 8_308_000 / sec (6x slower)
GenServer (Map) - 909_000 / sec (52x slower)
:ets (serialized) - 820_000 / sec (58x slower)
X-Huge (10,000,000 rows)
:ets (async) - 8_239_000 / sec
:ets (serialized) - 811_000 / sec (10x slower)
Note that the read times with these massive quantities are still roughly the same as when we tested smaller datasets!
Up Next…
Keep an eye out for Part 3, the final segment of our series, where we’ll investigate the performance implications of making updates to your in-memory data.