Using mmap to share data between processes
As part of my recent experiments with mmap I have learned how to share data between processes using a memory-mapped file. Here I’ll show how to do it between two independent Python processes, but the same principles apply to any programming language.
Create a file to store the data
I will use the tempfile module so that the file gets deleted after the script finishes. I prefer using this when learning or testing new things, otherwise my folders end up filled with random files. You can also use another location instead of the temporary file.
We need to know where this file is stored. We can use the .name attribute of the temporary file to get its path. Note that this will change every time you re-run the script because the files are temporary.
import tempfile
import mmap
fd = tempfile.NamedTemporaryFile()
print(fd.name)
# '/var/folders/v5/jw7h15p16b929g4gkx5bml040000gn/T/tmptdqcitxl'
# To use a custom location:
# fd = open("shared.dat", "wb+")
Now we need to make the file big enough to hold our data. Let’s use size = 1000 bytes. We need to:
- Open the file in wb+ mode (the default mode of NamedTemporaryFile, w+b, is equivalent). The mapping needs both read and write access.
- Seek to the 1000th byte. This is position 999 (1000 - 1) because we start counting at zero.
- Write a single byte so that the file is extended up to that size.
- Call .flush() to ensure the data is written.
- Create a mmap object from the file. We will mmap the full file contents (again, size = 1000).
size = 1000
fd.seek(size - 1)
fd.write(b"1")
fd.flush()
m = mmap.mmap(fd.fileno(), size)
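As an alternative to the seek + write steps above, the file can also be grown with truncate(). This is a minimal sketch of that swapped-in technique, not what the rest of this post uses; the helper name create_zeroed_mapping is hypothetical.

```python
import mmap
import tempfile


def create_zeroed_mapping(size):
    """Create a temporary file of `size` bytes and memory-map all of it."""
    fd = tempfile.NamedTemporaryFile()
    # truncate() can also grow a file; the new bytes read back as zeros.
    fd.truncate(size)
    m = mmap.mmap(fd.fileno(), size)
    return fd, m


fd, m = create_zeroed_mapping(1000)
print(len(m))  # 1000
print(m[:4])   # b'\x00\x00\x00\x00'
```

The result is the same 1000-byte mapping, except that every byte starts as zero instead of the file ending in b"1".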
Now that we have the mmap object, we can write data as if it were a bytes buffer. Here I’m writing arbitrary data, but this could be serialized structs or anything that can be converted to bytes.
m[:10] = b"1"*10
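To illustrate the "serialized structs" case, here is a minimal sketch that uses the standard struct module to write a fixed-layout record into the mapping and read it back. The record layout (an unsigned int followed by a float) and the values are assumptions made up for the example.

```python
import mmap
import struct
import tempfile

# Hypothetical record layout: a 4-byte unsigned int followed by an 8-byte
# float, little-endian, no padding (12 bytes in total).
RECORD = struct.Struct("<Id")

size = 1000
fd = tempfile.NamedTemporaryFile()
fd.seek(size - 1)
fd.write(b"1")
fd.flush()
m = mmap.mmap(fd.fileno(), size)

# pack_into() writes the packed bytes directly into the mapping at offset 0.
RECORD.pack_into(m, 0, 42, 3.5)

# unpack_from() reads them back as a tuple of Python values.
counter, value = RECORD.unpack_from(m, 0)
print(counter, value)  # 42 3.5
```

Another process that maps the same file and knows the same layout could call unpack_from() to read the record.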
In a different process/terminal/REPL
Now we will open a different Python process (new shell, new interpreter). We just need the file path we got in the previous step. It’s important that we open the file in ab+ mode (append bytes). If we open it in write mode as we did before, the file will get truncated and we will lose the data.
import mmap
# Use "ab+" mode! If the mode is "wb+", the file will be truncated and we'll
# need to seek + write + flush again
fd = open("/var/folders/v5/jw7h15p16b929g4gkx5bml040000gn/T/tmptdqcitxl", "ab+")
# Or if using a custom location:
# fd = open("shared.dat", "ab+")
size = 1000
m = mmap.mmap(fd.fileno(), size)
We have a new mmap object in a different process. We can now get the bytes we wrote from the other process.
print(m[:10])
# b'1111111111'
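If you would rather check this from a single script than from two terminals, the sketch below starts the second process with subprocess. It assumes a POSIX system, where a NamedTemporaryFile can be reopened by path while it is still open; the READER snippet and its command-line arguments are made up for this example.

```python
import mmap
import subprocess
import sys
import tempfile

size = 1000

# Code run by the child process: map the same file and print its first 10 bytes.
READER = """
import mmap, sys
with open(sys.argv[1], "ab+") as fd:
    m = mmap.mmap(fd.fileno(), int(sys.argv[2]))
    print(m[:10])
"""

# Parent process: create the file, map it, and write into the mapping.
fd = tempfile.NamedTemporaryFile()
fd.seek(size - 1)
fd.write(b"1")
fd.flush()
m = mmap.mmap(fd.fileno(), size)
m[:10] = b"1" * 10

# Child process: a separate interpreter that only receives the path and size.
result = subprocess.run(
    [sys.executable, "-c", READER, fd.name, str(size)],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # b'1111111111'
```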
Writing to disk
Since we are working with a file, the data we write will be written to disk. By default, the operating system decides when to persist the data. We can use the mmap object's .flush() method to get some control over this. When .flush() is called, the msync function runs (source). The Python implementation syncs the file with the MS_SYNC flag (synchronous). The method accepts offset and size parameters to sync only part of the file; if those are not present, the full file is synced. Take into account that this doesn’t guarantee the data is written, and the call can fail with errors that you need to handle. If you need stronger durability guarantees, it’s probably better to just use SQLite to store the data.
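A minimal sketch of both forms of .flush(), assuming Python 3.8 or later, where a failed flush raises OSError. The file size of two allocation units is an arbitrary choice for the example; the offset passed to .flush() has to be a multiple of mmap.PAGESIZE or mmap.ALLOCATIONGRANULARITY.

```python
import mmap
import tempfile

# Arbitrary size for this example: two allocation units.
size = 2 * mmap.ALLOCATIONGRANULARITY

fd = tempfile.NamedTemporaryFile()
fd.seek(size - 1)
fd.write(b"1")
fd.flush()
m = mmap.mmap(fd.fileno(), size)
m[:10] = b"1" * 10

try:
    # Sync only the first allocation unit of the mapping.
    m.flush(0, mmap.ALLOCATIONGRANULARITY)
    # With no arguments, the whole mapping is synced.
    m.flush()
    synced = True
except OSError as exc:
    # What to do here (retry, abort, log) depends on the application.
    synced = False
    print(f"flush failed: {exc}")
```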
Use cases
This can be used to share data/memory between processes. Python comes with some utilities to do this more easily (for example, the multiprocessing.shared_memory module), but the approach shown here works regardless of the programming language each program uses. It also lets you persist the data to disk after the programs terminate (except if the memory-mapped file is stored in a temporary location, as I did in this example).
Full code
Process 1:
import tempfile
import mmap
fd = tempfile.NamedTemporaryFile()
print(fd.name)
size = 1000
fd.seek(size - 1)
fd.write(b"1")
fd.flush()
m = mmap.mmap(fd.fileno(), size)
m[:10] = b"1" * 10
Process 2:
import mmap
# Use "ab+" mode! If the mode is "wb+", the file will be truncated and we'll
# need to seek + write + flush again
fd = open("/var/folders/v5/jw7h15p16b929g4gkx5bml040000gn/T/tmptdqcitxl", "ab+")
size = 1000
m = mmap.mmap(fd.fileno(), size)
print(m[:10])
# b'1111111111'