I should mention: the coolest part is that I never sent over "all" the memory used by the process, because it was difficult to tell what was needed and what wasn't. Instead, I was clever with virtual memory, and when a page of memory was needed that wasn't loaded by the recipient Pi, it would request and lazy-load just that page from the provider Pi, and with some careful bookkeeping mark that the page was owned by the recipient Pi.
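For anyone who wants the idea in concrete form, here is a toy Python model of that lazy-loading scheme. It is only a sketch, not the actual implementation: a dictionary lookup stands in for a real page fault, a method call stands in for the network request, and the names `ProviderPi`, `RecipientPi`, `fetch_page`, and the 4096-byte `PAGE_SIZE` are all made up for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class ProviderPi:
    """Holds the process's original memory, keyed by page number (hypothetical)."""

    def __init__(self, pages):
        self.pages = dict(pages)  # page number -> bytes
        self.owner = {page_no: "provider" for page_no in self.pages}

    def fetch_page(self, page_no):
        # Hand the page over and record that the recipient now owns it.
        data = self.pages.pop(page_no)
        self.owner[page_no] = "recipient"
        return data


class RecipientPi:
    """Starts with no pages and pulls each one in on first access (hypothetical)."""

    def __init__(self, provider):
        self.provider = provider
        self.pages = {}  # only the pages touched so far
        self.faults = 0

    def _ensure_loaded(self, page_no):
        if page_no not in self.pages:  # stands in for a page fault
            self.faults += 1
            self.pages[page_no] = bytearray(self.provider.fetch_page(page_no))
        return self.pages[page_no]

    def read(self, addr):
        page_no, offset = divmod(addr, PAGE_SIZE)
        return self._ensure_loaded(page_no)[offset]

    def write(self, addr, value):
        page_no, offset = divmod(addr, PAGE_SIZE)
        self._ensure_loaded(page_no)[offset] = value


# Three pages on the provider; page n is filled with the byte value n.
provider = ProviderPi({n: bytes([n]) * PAGE_SIZE for n in range(3)})
recipient = RecipientPi(provider)
first = recipient.read(1 * PAGE_SIZE + 10)   # faults page 1 in
second = recipient.read(1 * PAGE_SIZE + 20)  # already local, no second fault
```

Only page 1 ever crosses the "network" here; pages 0 and 2 stay on the provider until something touches them, which is the point of not sending everything up front.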
> Instead, I was clever with virtual memory, and when a page of memory was needed that wasn't loaded by the recipient Pi, it would request and lazy-load just that page from the provider Pi, and with some careful bookkeeping mark that the page was owned by the recipient Pi.
I wonder if that "trick" can be extended to a full implementation of distributed shared memory, i.e. multiple nodes running separate tasks in a single address space, with cache coherence implemented over the network. It would probably need quite a bit of extra compiler/runtime support, so it wouldn't really apply to standard binaries, but it might still be useful.
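As a rough illustration of what the coherence part might involve, here is a toy write-invalidate scheme in Python. Everything in it is an assumption made for the sketch: a single central `Directory` tracks which `Node` holds a copy of each page, writes go straight through to the directory's copy, and method calls stand in for network messages. A real system would hook page faults instead and would likely distribute the directory.

```python
class Node:
    """One machine in the cluster, with a local cache of pages (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.cache = {}  # page number -> locally cached contents


class Directory:
    """Central bookkeeping: authoritative contents plus who holds a copy."""

    def __init__(self, nodes):
        self.nodes = {node.name: node for node in nodes}
        self.memory = {}   # page number -> authoritative contents
        self.sharers = {}  # page number -> names of nodes with a valid copy

    def read(self, node, page_no):
        if page_no not in node.cache:  # miss: fetch over the "network"
            node.cache[page_no] = self.memory.get(page_no, 0)
            self.sharers.setdefault(page_no, set()).add(node.name)
        return node.cache[page_no]

    def write(self, node, page_no, value):
        # Invalidate every other copy so no node keeps reading stale data.
        for name in self.sharers.get(page_no, set()) - {node.name}:
            self.nodes[name].cache.pop(page_no, None)
        self.sharers[page_no] = {node.name}
        self.memory[page_no] = value
        node.cache[page_no] = value


a, b = Node("a"), Node("b")
dsm = Directory([a, b])
dsm.write(a, 7, "v1")
seen_before = dsm.read(b, 7)  # b caches "v1"
dsm.write(a, 7, "v2")         # b's cached copy is invalidated
seen_after = dsm.read(b, 7)   # b misses and refetches the new contents
```

The invalidation round trip on every write to a shared page is where the network cost shows up, which is presumably why compiler/runtime help would matter.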