rand[om]

med ∩ ml

Replacing FFI with a CLI

I recently read a post from Vercel about porting Turborepo to Rust. In that post, they mention calling a Go binary from the Rust code instead of dealing with FFI and C type compatibility.

I wrote a post called “Use a subprocess instead of a dependency”, which follows a similar philosophy, although applied to something different. Here are some extra notes related to the Vercel post. As an example, I’ll use a Python app that wants to call some Rust code, but the idea could apply to any combination of languages.

Using a subprocess

Let’s say you have a Rust binary that you want to call from Python. You could spend some time with PyO3 and build an integration layer between Rust and Python[1]. Or you can just ship the Rust binary and call it using a subprocess. Some of the advantages I see in this approach are:

Decoupling between the binary and the Python app

You can build a new version of the Rust binary, rsync it to the VM and start using it. As long as the calling arguments don’t change, you can decouple the development of the Rust functions from the Python app.

No integration layers

The cool thing about a CLI is that you can call it from Python, Go, a shell script, etc. You don’t need to worry about building “FFI layers”; you can call your binary from any programming language.
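As a rough sketch of what the calling side can look like in Python, the wrapper below runs a command with plain string arguments and returns its stdout. The name `run_cli` is hypothetical, and because the post does not include an actual Rust binary, a tiny Python one-liner stands in for it here; in practice the command would be the path to your own binary.

```python
import subprocess
import sys


def run_cli(command, *args, timeout=30):
    """Run a CLI with string arguments and return its stdout as text.

    Raises RuntimeError (including the CLI's stderr) on a non-zero exit code.
    subprocess.TimeoutExpired propagates if the process runs too long.
    """
    try:
        result = subprocess.run(
            [*command, *args],
            capture_output=True,  # collect stdout and stderr
            text=True,            # decode bytes to str
            check=True,           # raise CalledProcessError on non-zero exit
            timeout=timeout,
        )
    except subprocess.CalledProcessError as err:
        raise RuntimeError(
            f"{command[0]} exited with {err.returncode}: {err.stderr.strip()}"
        ) from err
    return result.stdout


# Stand-in for the real binary (assumption): upper-cases its first argument.
ECHO_UPPER = [sys.executable, "-c", "import sys; print(sys.argv[1].upper())"]

if __name__ == "__main__":
    print(run_cli(ECHO_UPPER, "hello"), end="")
```

The only contract between the two sides is the argument list and what gets written to stdout, which is what allows the binary to be swapped without touching the Python code.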

Serialization

This is another point mentioned in the Vercel post:

Our first takeaway is that serialization formats are very useful for interoperability. By serializing to JSON, a format with robust support in both Go and Rust, we were able to minimize our FFI surface area, and avoid a whole class of cross-platform, cross-language bugs. When we had to switch from a single, linked binary to two binaries, we were able to do so with relative ease because our FFI surface area was so small.

The tradeoff here is that serialization and deserialization is slow. You can only depend on this technique if either you know your serialized payloads will be small or you don’t care about the performance hit for your use case.

I find this quite cool. Instead of maintaining arguments/types at the FFI layer, the functions only have a single argument, a JSON string. Your Python program can still use regular function arguments and types, same for the Rust binary. But at the “communication layer”, the Python app outputs a single JSON blob, and the Rust binary has an entry point that just receives a single JSON blob as an argument. Here is where it happens in the Turborepo example.
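A minimal sketch of that “communication layer” from the Python side, assuming the binary takes one JSON string as its only argument and prints one JSON document to stdout. The function names and the payload shape (`values`, `total`) are made up for illustration, and a Python one-liner again stands in for the Rust binary.

```python
import json
import subprocess
import sys


def call_with_json(command, payload):
    """Serialize `payload` to one JSON argument, run `command`, parse JSON from stdout."""
    blob = json.dumps(payload)
    result = subprocess.run(
        [*command, blob],     # the JSON blob is the single argument
        capture_output=True,
        text=True,
        check=True,           # raise if the binary exits with a non-zero code
    )
    return json.loads(result.stdout)


# Stand-in for the Rust binary (assumption): reads {"values": [...]} from its
# first argument and prints {"total": <sum>} as JSON.
STAND_IN = [
    sys.executable,
    "-c",
    "import json, sys; req = json.loads(sys.argv[1]); "
    "print(json.dumps({'total': sum(req['values'])}))",
]


def total(values):
    """Regular Python signature; JSON only appears at the process boundary."""
    return call_with_json(STAND_IN, {"values": values})["total"]


if __name__ == "__main__":
    print(total([1, 2, 3]))
```

Callers of `total` never see JSON. One caveat with this design: operating systems limit the size of command-line arguments, so large payloads are usually better sent through stdin.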

Drawbacks

This approach adds serialization overhead. Replacing a function call with a subprocess also comes with the extra cost of starting the process, which can vary a lot depending on the programming language used and the overall performance of the CLI you are calling. Overall, these are just more factors to consider (and measure!) when deciding what to do. This post was mostly a collection of thoughts about the approach.
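One rough way to measure the process startup cost is to time a handful of runs of a command that does almost nothing. This is only a sketch: `mean_spawn_seconds` is a hypothetical helper, and the Python interpreter is used as the command just so the snippet runs anywhere; you would point it at your own CLI.

```python
import subprocess
import sys
import time


def mean_spawn_seconds(command, runs=10):
    """Return the average wall-clock time, in seconds, to run `command` to completion."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(command, check=True, capture_output=True)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Assumption: an interpreter that exits immediately approximates "startup only".
    noop = [sys.executable, "-c", "pass"]
    print(f"mean spawn time: {mean_spawn_seconds(noop) * 1000:.1f} ms")
```

Comparing this number with the time the actual work takes gives a first idea of whether the subprocess overhead matters for your use case.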


  1. This can also be taken as a special case because there are many great tools to make this particular integration a lot easier: maturin, rustimport, etc.