Using mmap to share data between processes
As part of my recent experiments with mmap I have learned how to share data between processes using a memory-mapped file. Here I'll show how to do it between two independent Python processes, but the same principles apply to any programming language.
Create a file to store the data
I will use the `tempfile` module so that the file gets deleted after the script finishes. I prefer using this when learning or testing new things, otherwise my folders end up filled with random files. You can also use another location instead of the temporary file.

We need to know where this file is stored. We can use the `.name` attribute of the temporary file to get its path. Note that this will change every time you re-run the script because the files are temporary.
```python
import tempfile
import mmap

fd = tempfile.NamedTemporaryFile()
print(fd.name)
# '/var/folders/v5/jw7h15p16b929g4gkx5bml040000gn/T/tmptdqcitxl'

# To use a custom location:
# fd = open("shared.dat", "wb+")
```
Now we need to make the file big enough to hold our data. Let's use `size = 1000` bytes. We need to:

- Open the file in `wb+` mode.
- Seek to the 1000th byte. This is position 999 (1000 - 1) because we start counting at zero.
- Write a single byte so that the file is extended up to that size.
- Call `.flush()` to ensure the data is written.
- Create a `mmap` object from the file. We will mmap the full file contents (again `size = 1000`).
```python
size = 1000
fd.seek(size - 1)
fd.write(b"1")
fd.flush()

m = mmap.mmap(fd.fileno(), size)
```
Now that we have the mmap object, we can write data as if it were a bytes buffer. Here I'm writing some dummy bytes, but this could be serialized structs or anything that can be converted to bytes.
```python
m[:10] = b"1" * 10
```
In a different process/terminal/REPL
Now we will open a different Python process (new shell, new interpreter). We just need the file path we got in the previous step. It's important that we open the file in `ab+` mode (append binary). If we open it in write mode as we did before, the file will get truncated and we will lose the data.
```python
import mmap
import io

# Use "ab+" mode! If the mode is "wb+" the file will be truncated and we'll
# need to seek + write + flush again
fd = io.open("/var/folders/v5/jw7h15p16b929g4gkx5bml040000gn/T/tmptdqcitxl", "ab+")
# Or if using a custom location:
# fd = open("shared.dat", "ab+")

size = 1000
m = mmap.mmap(fd.fileno(), size)
```
We have a new mmap object in a different process. We can now get the bytes we wrote from the other process.
```python
print(m[:10])
# b'1111111111'
```
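The sharing works in both directions: writes made through the second mapping are visible through the first one, because both are backed by the same pages in the OS page cache. Here's a minimal sketch that maps the same file twice within a single script to stand in for the two processes:

```python
import mmap
import tempfile

size = 1000

# Set up the shared file as before
fd1 = tempfile.NamedTemporaryFile()
fd1.seek(size - 1)
fd1.write(b"\0")
fd1.flush()

# A second file descriptor for the same file, standing in for "process 2"
fd2 = open(fd1.name, "ab+")

m1 = mmap.mmap(fd1.fileno(), size)
m2 = mmap.mmap(fd2.fileno(), size)

m2[:5] = b"hello"   # "process 2" writes...
print(m1[:5])       # ...and "process 1" sees it immediately: b'hello'
```

This works because `mmap.mmap` uses a shared mapping by default, so both views point at the same underlying memory.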
Writing to disk
Since we are working with a file, the data we write will eventually be written to disk. By default, the operating system decides when to persist it. We can use the `.flush()` method to get some control over this. When the `.flush()` method is called, the `msync` function runs (source). The Python implementation syncs the file with the `MS_SYNC` flag (synchronous). The method accepts `offset` and `size` parameters to sync only part of the file; if those are not present, the full file is synced. Take into account that this still doesn't guarantee the data reached the physical disk, and the call can raise errors that you need to handle. If you need stronger durability guarantees, it's probably better that you just use SQLite to store the data.
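As a sketch of the above, `.flush()` can be called with no arguments to sync the whole mapping, or with an offset (which must be a multiple of `mmap.ALLOCATIONGRANULARITY`) and a size in bytes to sync only a slice of it:

```python
import mmap
import tempfile

size = 1000
fd = tempfile.NamedTemporaryFile()
fd.seek(size - 1)
fd.write(b"\0")
fd.flush()

m = mmap.mmap(fd.fileno(), size)
m[:10] = b"1" * 10

# Sync only the first `size` bytes; the offset must be a multiple of
# mmap.ALLOCATIONGRANULARITY (0 qualifies)
m.flush(0, size)

# With no arguments, the whole mapping is synced
m.flush()
```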
This can be used to share data/memory between processes. Python comes with some utilities to do this more easily, but this approach works regardless of the programming language the programs are written in. It also lets you persist the data to disk after the programs terminate (unless the memory-mapped file is stored in a temporary location, as I did in this example).
```python
# Process 1: create the file, map it and write the data
import tempfile
import mmap

fd = tempfile.NamedTemporaryFile()
print(fd.name)

size = 1000
fd.seek(size - 1)
fd.write(b"1")
fd.flush()

m = mmap.mmap(fd.fileno(), size)
m[:10] = b"1" * 10
```
```python
# Process 2: open the same file, map it and read the data
import mmap
import io

# Use "ab+" mode! If the mode is "wb+" the file will be truncated and we'll
# need to seek + write + flush again
fd = io.open("/var/folders/v5/jw7h15p16b929g4gkx5bml040000gn/T/tmptdqcitxl", "ab+")

size = 1000
m = mmap.mmap(fd.fileno(), size)
print(m[:10])
# b'1111111111'
```
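For reference, one of the standard-library utilities mentioned above is `multiprocessing.shared_memory` (Python 3.8+), which creates and names the shared block for you, at the cost of being Python-specific. A small sketch, attaching by name within the same script to stand in for the second process:

```python
from multiprocessing import shared_memory

# "Process 1": create a named shared memory block and write to it
shm = shared_memory.SharedMemory(create=True, size=1000)
shm.buf[:10] = b"1" * 10
print(shm.name)  # pass this name to the other process

# "Process 2": attach to the block by name and read the data
other = shared_memory.SharedMemory(name=shm.name)
data = bytes(other.buf[:10])
print(data)  # b'1111111111'

# Cleanup: close both handles, then unlink the block once
other.close()
shm.close()
shm.unlink()
```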