Using mmap to share data between processes

As part of my recent experiments with mmap¹, I have learned how to share data between processes using a memory-mapped file. Here I'll show how to do it between two independent Python processes, but the same principles apply to any programming language.

Create a file to store the data

I will use the tempfile module so that the file is deleted automatically when it is closed or the script finishes. I prefer this when learning or testing new things; otherwise my folders end up full of random files. You can also use a regular file in any other location instead. Either way, the other process needs to know where the file is stored, and the .name attribute of the temporary file gives us its path. Note that this path changes every time you re-run the script, because the file is temporary.

import tempfile
import mmap

fd = tempfile.NamedTemporaryFile()
print(fd.name)

# '/var/folders/v5/jw7h15p16b929g4gkx5bml040000gn/T/tmptdqcitxl'

# To use a custom location:
# fd = open("shared.dat", "wb+")
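
Because NamedTemporaryFile deletes the file when it is closed, the second process can only open it while the first one is still running. If that is inconvenient, one alternative is to pass delete=False and remove the file yourself with os.unlink once every process is done with it. The rest of this post does not use this approach. Here is a minimal sketch:

```python
import os
import tempfile

# delete=False keeps the file on disk after it is closed, so another
# process can still open it once this one has exited.
with tempfile.NamedTemporaryFile(delete=False) as fd:
    path = fd.name
    print(path)

# The `with` block closed the file, but it still exists.
still_there = os.path.exists(path)
print(still_there)  # True

# Clean up manually once no process needs the file anymore.
os.unlink(path)
```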

Now we need to make the file big enough to hold our data. Let's use size = 1000 bytes. We need to:

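Open the file in a read/write binary mode. NamedTemporaryFile already uses w+b by default, and for a custom location use wb+. Plain wb (write-only) is not enough, because mmap needs a file it can read from.
Seek to the 1000th byte. This is position 999 (1000 - 1), because offsets start at zero.
Write a single byte there so that the file is extended to that size.
Call .flush() so the buffered write actually reaches the file. Otherwise the file may still be empty when we map it.
Create an mmap object from the file. We will map the full file contents (again, size = 1000).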

size = 1000

fd.seek(size - 1)
fd.write(b"1")
fd.flush()

m = mmap.mmap(fd.fileno(), size)
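
The seek + write + flush steps above are one way to grow the file. Another option is the truncate(size) method on file objects, which can also extend a file. On POSIX systems, the new bytes read as zeros. The sketch below uses that alternative and checks the resulting size with os.fstat:

```python
import mmap
import os
import tempfile

size = 1000

with tempfile.NamedTemporaryFile() as fd:
    # Grow the empty file to `size` bytes in a single call.
    fd.truncate(size)
    file_size = os.fstat(fd.fileno()).st_size
    print(file_size)  # 1000

    m = mmap.mmap(fd.fileno(), size)
    # The extended region is zero-filled on POSIX systems.
    first_bytes = m[:5]
    print(first_bytes)  # b'\x00\x00\x00\x00\x00'
    m.close()
```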

Now that we have the mmap object, we can write data to it as if it were a bytearray. Here I'm writing arbitrary bytes, but this could be serialized structs or anything else that can be converted to bytes.

m[:10] = b"1" * 10
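
To make "serialized structs" concrete, here is a minimal sketch using the struct module, which can pack values directly into a writable buffer such as an mmap object. The record layout (an unsigned 32-bit counter followed by a double) is a hypothetical example. I use an anonymous mapping (fileno -1) only to keep the snippet self-contained. With the file-backed m above, the calls are the same.

```python
import mmap
import struct

# Hypothetical record layout: little-endian unsigned 32-bit int + 64-bit float.
# Every process that reads or writes the mapping must agree on this format.
RECORD = struct.Struct("<Id")

# Anonymous 1000-byte mapping; with a file-backed mapping the calls are the same.
m = mmap.mmap(-1, 1000)

# Serialize the record straight into the mapped memory at offset 0.
RECORD.pack_into(m, 0, 42, 3.5)

# A reader (for example the other process) decodes it from the same offset.
counter, value = RECORD.unpack_from(m, 0)
print(counter, value)  # 42 3.5

m.close()
```

The fixed-size format ("<" disables padding, so each record is 12 bytes) is what lets two independent processes agree on where each field lives without any extra coordination.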

In a different process/terminal/REPL

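Now we will open a different Python process (a new shell and a new interpreter). We just need the file path we got in the previous step. The first process must still be running, because the temporary file is deleted when it is closed. It's important that we open the file in ab+ mode (append and read, binary). If we open it in wb+ mode, as with the custom location before, the file will be truncated and we will lose the data. Opening it in r+b mode (read and write, binary) would also work, since that mode doesn't truncate either.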

import mmap
import io

# Use "ab+" mode! If the mode is "wb+", the file will be truncated and we'll
# need to seek + write + flush again.
# Replace the path below with the one printed by the first process.

fd = io.open("/var/folders/v5/jw7h15p16b929g4gkx5bml040000gn/T/tmptdqcitxl", "ab+")

# Or if using a custom location:
# fd = open("shared.dat", "ab+")

size = 1000

m = mmap.mmap(fd.fileno(), size)

We now have a new mmap object in a different process, and we can read the bytes that the first process wrote.

print(m[:10])
# b'1111111111'
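
The warning about truncation is easy to verify. This sketch creates a 1000-byte file the same way as before, using delete=False so it can be reopened and then removing it at the end. It then records the file size right after opening it in each mode, including r+b as a non-truncating alternative to ab+:

```python
import os
import tempfile

size = 1000

# Create a 1000-byte file, as in the first process.
with tempfile.NamedTemporaryFile(delete=False) as fd:
    fd.seek(size - 1)
    fd.write(b"1")
    path = fd.name

sizes = {}
try:
    # Open the file in each mode and record its size right after opening.
    # "wb+" goes last because it truncates the file.
    for mode in ("ab+", "r+b", "wb+"):
        with open(path, mode) as f:
            sizes[mode] = os.fstat(f.fileno()).st_size
finally:
    os.unlink(path)

print(sizes)  # {'ab+': 1000, 'r+b': 1000, 'wb+': 0}
```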

Writing to disk

Since we are working with a file, the data we write will eventually be written to disk. By default, the operating system decides when to persist it. We can use the .flush() method to get some control over this. On Unix, .flush() calls the msync function (source), and the Python implementation passes the MS_SYNC flag, so the call waits until the write completes. The method accepts optional offset and size parameters to sync only part of the mapping (the offset must be a multiple of mmap.PAGESIZE, or mmap.ALLOCATIONGRANULARITY on Windows). If they are omitted, the whole mapping is flushed. Take into account that this still doesn't guarantee the data is durably stored, and the call can fail: since Python 3.8 it raises an exception on error, which you need to handle. If you need stronger durability guarantees, it's probably better to just use SQLite to store the data.
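
A minimal sketch of both forms of .flush(), partial and full, with the error handling mentioned above. The file setup mirrors the first process. The final read goes through the regular file API only to show that it returns the same bytes:

```python
import mmap
import tempfile

size = 1000

with tempfile.NamedTemporaryFile() as fd:
    fd.seek(size - 1)
    fd.write(b"1")
    fd.flush()

    m = mmap.mmap(fd.fileno(), size)
    m[:10] = b"1" * 10

    try:
        # Sync only the first 10 bytes. The offset (0 here) must be a multiple
        # of mmap.PAGESIZE (mmap.ALLOCATIONGRANULARITY on Windows).
        m.flush(0, 10)
        # Without arguments, the whole mapping is synced.
        m.flush()
    except OSError as exc:
        # Since Python 3.8, flush() raises on failure instead of returning a status.
        print(f"flush failed: {exc}")
        raise

    # Reading through the regular file API returns the same bytes.
    fd.seek(0)
    file_bytes = fd.read(10)
    m.close()

print(file_bytes)  # b'1111111111'
```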

Use cases

This can be used to share data/memory between processes. Python comes with some utilities that make this easier, such as multiprocessing.shared_memory (Python 3.8+), but this approach works regardless of the programming language each program is written in. It also lets you persist the data to disk after the programs terminate (unless the memory-mapped file is stored in a temporary location, as in this example).
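
One of the utilities Python ships for this is multiprocessing.shared_memory (Python 3.8+), which gives you a named block of shared memory instead of a file on disk. Here is a minimal sketch of it. For brevity, both handles live in one script. In practice, the SharedMemory(name=...) call would run in the other process, which only needs the block's name:

```python
from multiprocessing import shared_memory

# Create a new named shared memory block of 1000 bytes.
shm = shared_memory.SharedMemory(create=True, size=1000)
shm.buf[:10] = b"1" * 10

# Another process only needs this name to attach to the same block.
print(shm.name)

# Attach by name, as the second process would.
other = shared_memory.SharedMemory(name=shm.name)
data = bytes(other.buf[:10])
print(data)  # b'1111111111'

# Every process closes its own handle; the creator unlinks (frees) the block.
other.close()
shm.close()
shm.unlink()
```

Unlike the file-based approach, nothing is persisted here: once the block is unlinked, the data is gone.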

Full code

Process 1:

import tempfile
import mmap

fd = tempfile.NamedTemporaryFile()

print(fd.name)

size = 1000

fd.seek(size - 1)
fd.write(b"1")
fd.flush()

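m = mmap.mmap(fd.fileno(), size)

m[:10] = b"1" * 10

# Keep this process alive: the temporary file is deleted when it exits.
input("Press Enter to exit...")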

Process 2:

import mmap
import io

# Use "ab+" mode! If the mode is "wb+", the file will be truncated and we'll
# need to seek + write + flush again.
# Replace the path below with the one printed by process 1.

fd = io.open("/var/folders/v5/jw7h15p16b929g4gkx5bml040000gn/T/tmptdqcitxl", "ab+")

size = 1000

m = mmap.mmap(fd.fileno(), size)

print(m[:10])
# b'1111111111'
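
If you want to try both halves without juggling two terminals, this sketch drives them from one script. The parent plays the role of process 1 and launches a second Python interpreter with subprocess to play process 2. As a read-only variation on the code above, the reader opens the file with "rb" and maps it with access=mmap.ACCESS_READ and length 0, which maps the whole file. This assumes Linux or macOS: on Windows, a NamedTemporaryFile cannot be reopened by another process while it is still open.

```python
import mmap
import subprocess
import sys
import tempfile

# Code for "process 2": open read-only and map the whole file (length 0).
READER = """
import mmap, sys
with open(sys.argv[1], "rb") as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    sys.stdout.write(m[:10].decode())
    m.close()
"""

size = 1000

# "Process 1": create the file, grow it, map it and write into it.
with tempfile.NamedTemporaryFile() as fd:
    fd.seek(size - 1)
    fd.write(b"1")
    fd.flush()

    m = mmap.mmap(fd.fileno(), size)
    m[:10] = b"1" * 10

    # Run "process 2" while this process (and the temporary file) is alive.
    result = subprocess.run(
        [sys.executable, "-c", READER, fd.name],
        capture_output=True,
        text=True,
        check=True,
    )
    m.close()

print(result.stdout)  # 1111111111
```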

¹ Running regexes on memory-mapped files