Thumbelina

Re-re-re-introducing thumbelina ⚗️🧪🔮.

Thumbelina was a project I started thinking about sometime in late 2022, when I was still a wee bit inexperienced with Elixir. The problem I set out to solve was simple: I wanted to manipulate images using Elixir, specifically I wanted to generate thumbnails. Seems straightforward, no?

As it turns out, this is the kind of thing the language isn’t suited for – most practical roads lead to AWS S3 and adding ImageMagick to your Dockerfile. For production that is exactly what I did, but I kept wondering what the “best” solution was.

I stumbled upon a write-up about Discord’s SortedSet, and so began my journey to challenge myself by learning (async) Rust, rustler, some systems concepts, and tinkering with extending and interfacing with BEAM internals.

Time flew by; off and on, whenever I’d get bored in my free time, I’d incrementally build out small parts of the library that does the image processing – but I never finished it…until now.

What was the outcome?

It was/is a bad idea:

Typical compression ratios (based on the benchmark suite) are about 1.5-1.7x for plain text, about 2-4x for HTML, and of course 1.0x for JPEGs, PNGs and other already-compressed data.

As it turns out, popular binary formats like .png and .jpeg and vectorized formats like .svg are already fairly optimized. Who knew? Not me. I felt really dumb reading that.

If you’d like to learn how to do it anyway read on!

Boring High Level Concepts

One approach to using programs written in other languages is opening a Port. mogrify, an Elixir ImageMagick wrapper, for example leverages System.cmd, which uses a unix pipe to communicate with the ImageMagick binary via streams of bytes in a different OS process. There are other high-level libraries that make this a viable, practical approach.
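
For a rough sense of what that looks like without the wrapper, here’s a minimal sketch of shelling out yourself – it assumes ImageMagick’s convert binary is on the PATH, and the file paths are purely illustrative:

# spawn ImageMagick in a separate OS process and wait for it to finish
{output, exit_status} =
  System.cmd("convert", [
    "input.jpg",
    "-thumbnail", "50x50",
    "thumb.jpg"
  ])

# an exit_status of 0 means ImageMagick wrote thumb.jpg to disk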

Another option is to use a similar mechanism and implement the functionality yourself, plugging into the VM as a “linked-in driver”, or running as a “hidden node” over a network pipe such as a TCP socket. The advantage of doing this is that you get fault-tolerant mechanisms like supervisors. The second node can be in Go, Rust, or Elixir itself. The job scheduler Oban has a way to send workers over to other servers too, if you don’t wanna manage the networking yourself or want a db intermediary. You can mix and match. This is a good overview of the solution space.
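
As a rough illustration of the second-node route (the node name and ImageWorker module below are made up), once the nodes can see each other you can push the work over with plain distribution primitives:

# connect to a worker node that has the heavy image dependencies installed
true = Node.connect(:"imageworker@127.0.0.1")

# run the thumbnailing over there instead of locally
{:ok, thumbnail} =
  :rpc.call(:"imageworker@127.0.0.1", ImageWorker, :thumbnail, ["./path_to_image.jpg", 50, 50])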

I used the Native Implemented Function (NIF) C ABI. This is space outside the BEAM, managed here by rustler, which implements this binary interface and provides high-level types in lovely Rust.
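
On the Elixir side, a rustler-backed module is mostly declaration. Roughly like this (the crate name and stub below are simplified, not lifted from the real project):

defmodule Thumbelina.Internal do
  use Rustler, otp_app: :thumbelina, crate: "thumbelina"

  # the stub is swapped for the native implementation when the compiled
  # library loads; it only ever runs if loading failed
  def cast(_op, _pid, _bytes, _ext, _width, _height),
    do: :erlang.nif_error(:nif_not_loaded)
end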

In the Weeds

These subroutines are expected to return within ~1ms so they don’t block the BEAM’s pre-emptive schedulers; they’re appropriate for synchronous operations such as short CPU-burst computations and custom data structures where you don’t want to copy data back and forth across the runtime, à la SortedSet. Image work takes longer than that, so we don’t wait – we return early and have the result sent back as a message. Here’s how making a thumbnail works:

# you can also read the binary from a network
# but let's keep it simple by reading from disk
{:ok, image} = Thumbelina.open("./path_to_image.jpg") 
width = 50.0
height = 50.0
destination = self()
Thumbelina.Internal.cast(:thumbnail, destination, image.bytes, image.extension, width, height)
#=> :ok

This returns near-instantly. Tokio is first lazily initialised with a single worker thread:

use once_cell::sync::Lazy;
use tokio::runtime::{Builder, Runtime};

// one shared runtime, created the first time it is touched
static TOKIO: Lazy<Runtime> = Lazy::new(|| {
    Builder::new_multi_thread()
        .worker_threads(1)
        .build()
        .expect("Thumbelina.Internal - no runtime!")
});

Now we can start scheduling work on the first invocation of this subroutine, since there is no main macro to expand in this binary – it is, after all, an embed that’s dynamically linked:

use std::future::Future;
use tokio::task::JoinHandle;

// Asynchronously spawn a green thread on one physical thread
// that's to be managed on the tokio runtime.
pub fn spawn<T>(task: T) -> JoinHandle<T::Output>
where
    T: Future + Send + 'static,
    T::Output: Send + 'static,
{
    TOKIO.spawn(task)
}

It gets busy in a separate OS thread space, unbeknownst to the BEAM, and sends a message to destination with the result when it’s done. Note that you can also do this without Tokio, using plain operating system threads. The worker looks like:

// move ownership of the smart pointer to the image binary
// outside the lifetime of the NIF sync call
if let Some(buffer) = binary.to_owned() {
    let buffered_lock = Arc::new(RwLock::new(buffer));

    task::spawn(async move {
        let buffer = Arc::clone(&buffered_lock);
        let buffer = buffer.read().unwrap();

        // run the actual image operation off the BEAM's schedulers...
        if let Ok(result) = operation::perform(operation, width, height, extension, &buffer) {
            // ...and message the caller's pid with the encoded result
            env.send_and_clear(&pid, move |env| {
                Success { op: thumbelina::atoms::ok(), result }.encode(env)
            });
        }
    });
}

In our local process, which is self() here, we should eventually get:

receive do
  {:ok, result} -> IO.inspect(result)
end

The magic of the BEAM is really in easy distributed networking. Say we try to send over our destination as the pid of a process on a remote server – this seems possible because Elixir/Erlang has location transparency of pids/processes. However, the NIF ABI is tightly coupled to the internals of the host’s runtime and behaviour, and therefore only allows a LocalPid. It’s entirely possible, though, to rebroadcast the result of this operation from a GenServer registered globally:

GenServer.start_link(__MODULE__, [], name: {:global, __MODULE__})
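
A minimal sketch of that rebroadcast idea (the subscriber bookkeeping is illustrative, not from the library): the GenServer hands its own local pid to the NIF, picks the reply up in handle_info/2, and forwards it to pids that can live anywhere in the cluster.

defmodule Thumbelina.Broadcaster do
  use GenServer

  def start_link(_args),
    do: GenServer.start_link(__MODULE__, [], name: {:global, __MODULE__})

  def init(subscribers), do: {:ok, subscribers}

  def handle_cast({:subscribe, pid}, subscribers),
    do: {:noreply, [pid | subscribers]}

  def handle_cast({:thumbnail, image}, subscribers) do
    # destination is this server's own, local pid - which the NIF accepts
    :ok = Thumbelina.Internal.cast(:thumbnail, self(), image.bytes, image.extension, 50.0, 50.0)
    {:noreply, subscribers}
  end

  # the NIF's reply arrives as a plain message...
  def handle_info({:ok, result}, subscribers) do
    # ...and can be forwarded to processes on any connected node
    Enum.each(subscribers, &send(&1, {:thumbnail_ready, result}))
    {:noreply, subscribers}
  end
end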

This idea can perhaps be expanded on. Maybe you want to model this as a producer-consumer pipeline using GenStage? Something like the sketch below:

[A] -> [B] -> [C]

A: producer - continually ingests data from a data lake
B: producer-consumer - processes the thumbnail via message passing
C: consumer - awaits the result and does stuff with the output
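
A bare-bones GenStage sketch of that shape might look like the following. DataLake.fetch_batch/1 is a stand-in for whatever actually feeds the pipeline, and blocking in handle_events/3 while waiting for each NIF reply is a simplification:

defmodule Pipeline.Source do
  use GenStage

  def init(:ok), do: {:producer, :no_state}

  # A: hand out raw image structs as consumers ask for them
  def handle_demand(demand, state),
    do: {:noreply, DataLake.fetch_batch(demand), state}
end

defmodule Pipeline.Thumbnailer do
  use GenStage

  def init(:ok),
    do: {:producer_consumer, :no_state, subscribe_to: [Pipeline.Source]}

  # B: fire the NIF per image and wait for each reply before emitting
  def handle_events(images, _from, state) do
    thumbnails =
      for image <- images do
        :ok = Thumbelina.Internal.cast(:thumbnail, self(), image.bytes, image.extension, 50.0, 50.0)

        receive do
          {:ok, result} -> result
        end
      end

    {:noreply, thumbnails, state}
  end
end

A consumer C would then subscribe to Pipeline.Thumbnailer and do whatever it wants with the output – upload to S3, write to disk, and so on.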

You may be wondering: doesn’t reading entire image binaries into memory lead to sudden spikes in memory usage? You’re right. I considered providing streaming/yielding between the runtime and the C ABI, but the cost/benefit doesn’t seem worth it.

In theory, by inheriting the complexity of owning such a system end to end, you can really tune performance by mixing and matching cool features: the BEAM for simple concurrency and distributed networking, and Rust for its type system, low-level memory safety and speed. That’s still true, but in practice the idea falls flat. Most of the value in this domain comes from being able to re-use bindings to highly optimised C/Rust libraries, like Vix.

Going forward

I’m not sure I’m going to continue to expand on this idea, and will mostly be moving on. A nod to projects I think are interesting applications/use cases in the wild:

  1. wasmex - which provides a low-level interface to wasm/wasi via wasmtime.
  2. explorer - which brings dataframe processing to elixir via polar-rs.
  3. Tigerbeetlex - a database client; it’s pretty much the same ideas explained here, but written in Zig: it handles interacting with the C binary ABI and implements the TigerBeetle client spec by embedding the Zig client.